class Grell::Page

This class contains the logic for working with each page we crawl. It is also the interface we use to access the information of each page. This information comes from the private result classes below.

Constants

WAIT_INTERVAL
WAIT_TIME

Attributes

id[R]
parent_id[R]
rawpage[R]
timestamp[R]
url[R]

Public Class Methods

new(url, id, parent_id)
# File lib/grell/page.rb, line 18
def initialize( url, id, parent_id)
  @rawpage = RawPage.new
  @url = url
  @id = id
  @parent_id = parent_id
  @timestamp = nil
  @times_visited = 0
  @result_page = UnvisitedPage.new
end

Public Instance Methods

current_url()

The current URL. This may differ from the URL we asked for if a redirect was followed.

# File lib/grell/page.rb, line 51
def current_url
  @rawpage.current_url
end
error?()

True if the page responded with an error (a 4xx or 5xx status)

# File lib/grell/page.rb, line 61
def error?
  !!(status.to_s =~ /[45]\d\d/) # inside a character class, | is a literal character, so it is dropped here
end
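
The status check can be tried in isolation. The following is a minimal sketch, not part of Grell: `error_status?` is a hypothetical standalone helper, and it uses the character class `[45]`, which matches the same numeric status codes as the pattern in `error?`.

```ruby
# Hypothetical standalone helper (not part of Grell) that mirrors the
# 4xx/5xx check performed by Page#error?.
ERROR_STATUS = /[45]\d\d/

def error_status?(status)
  # to_s turns nil into "", which never matches, so a page without a
  # status is not reported as an error.
  !!(status.to_s =~ ERROR_STATUS)
end

error_status?(404) # => true
error_status?(503) # => true
error_status?(200) # => false
error_status?(nil) # => false
```

The double negation converts the match position (or `nil`) into a strict boolean, which is what a predicate method is expected to return.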
followed_redirects?()

True if we followed a redirect to get the current contents

# File lib/grell/page.rb, line 56
def followed_redirects?
  current_url != @url
end
navigate()
path()

Extracts the path (e.g. /actions/test_action) from the URL

# File lib/grell/page.rb, line 66
def path
  URI.parse(@url).path
rescue URI::InvalidURIError # Invalid URLs will be added and caught when we try to navigate to them
  @url
end
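
The fallback behaviour of `path` can be seen with Ruby's standard `uri` library alone. This is a sketch under assumptions: `extract_path` is a hypothetical standalone function, and the example URLs are made up for illustration.

```ruby
require 'uri'

# Hypothetical standalone version of the path extraction (not part of Grell).
def extract_path(url)
  URI.parse(url).path
rescue URI::InvalidURIError
  # Same fallback as Page#path: hand back the raw string unchanged.
  url
end

extract_path('http://www.example.com/actions/test_action?id=1')
# => "/actions/test_action"

# A space is not allowed in a URI, so parsing fails and the input is returned.
extract_path('http://www.example.com/bad path')
# => "http://www.example.com/bad path"
```

Returning the raw string instead of raising keeps invalid URLs in the crawl queue, so the failure surfaces later, at navigation time.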
retries()

Number of times we have retried the current page

# File lib/grell/page.rb, line 46
def retries
  [@times_visited - 1, 0].max
end
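
The clamp in `retries` means that neither an unvisited page nor a page visited exactly once counts as retried. A minimal sketch, where `retries_for` is a hypothetical helper applying the same formula to a plain integer:

```ruby
# Hypothetical helper (not part of Grell) applying the same formula as
# Page#retries to a visit count passed in directly.
def retries_for(times_visited)
  # The first visit is not a retry; max keeps the result from going negative.
  [times_visited - 1, 0].max
end

[0, 1, 2, 3].map { |n| retries_for(n) } # => [0, 0, 1, 2]
```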
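unavailable_page(status, exception)

Marks the page as unavailable: logs a warning, stores an ErroredPage built from the given status and exception as the result, and records the current time as the timestamp
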
# File lib/grell/page.rb, line 72
def unavailable_page(status, exception)
  Grell.logger.warn "The page with the URL #{@url} was not available. Exception #{exception}"
  @result_page = ErroredPage.new(status, exception)
  @timestamp = Time.now
end