class Grell::Page
This class contains the logic for working with each page we crawl. It is also the interface we use to access the information of each page. This information comes from the private result classes below.
Constants
- WAIT_INTERVAL
- WAIT_TIME
Attributes
id[R]
parent_id[R]
rawpage[R]
timestamp[R]
url[R]
Public Class Methods
new( url, id, parent_id)
# File lib/grell/page.rb, line 18
def initialize( url, id, parent_id)
  @rawpage = RawPage.new
  @url = url
  @id = id
  @parent_id = parent_id
  @timestamp = nil
  @times_visited = 0
  @result_page = UnvisitedPage.new
end
Public Instance Methods
current_url()
The current URL. This may differ from the URL we asked for if a redirect was followed.
# File lib/grell/page.rb, line 51
def current_url
  @rawpage.current_url
end
error?()
True if the page responded with an error
# File lib/grell/page.rb, line 61
def error?
  !!(status.to_s =~ /[4|5]\d\d/)
end
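The following standalone sketch shows which values the pattern above accepts. The helper names error_status? and strict_error_status? are hypothetical and not part of Grell. The first applies the same expression to a value passed in; the second is only an illustration of a tighter check, not something Grell does. Note that the pattern is unanchored, and that inside a character class the | is a literal character rather than an alternation, so any string that merely contains a 4xx- or 5xx-looking run also matches.

```ruby
# Hypothetical helper mirroring the expression used in Page#error?.
def error_status?(status)
  !!(status.to_s =~ /[4|5]\d\d/)
end

# Hypothetical stricter variant (an assumption, not Grell's behaviour):
# exactly three digits, the first being 4 or 5.
def strict_error_status?(status)
  status.to_s.match?(/\A[45]\d\d\z/)
end

puts error_status?(404)          # true
puts error_status?(503)          # true
puts error_status?(200)          # false
puts error_status?(nil)          # false, since nil.to_s is ""
puts error_status?(1404)         # true, because "1404" contains "404"
puts strict_error_status?(1404)  # false
```

In practice HTTP status codes are three digits, so the two helpers should agree on real responses; the difference only shows up on unusual inputs.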
followed_redirects?()
True if we followed a redirect to get the current contents
# File lib/grell/page.rb, line 56
def followed_redirects?
  current_url != @url
end
path()
Extracts the path (e.g. /actions/test_action) from the URL
# File lib/grell/page.rb, line 66
def path
  URI.parse(@url).path
rescue URI::InvalidURIError # Invalid URLs will be added and caught when we try to navigate to them
  @url
end
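A minimal sketch of the same fallback behaviour outside the class may help. The helper name path_for and the example URLs are hypothetical; only URI.parse and URI::InvalidURIError come from Ruby's standard library. For a URL that cannot be parsed, the original string is returned unchanged instead of raising.

```ruby
require 'uri'

# Hypothetical helper mirroring Page#path for a URL passed as an argument.
def path_for(url)
  URI.parse(url).path
rescue URI::InvalidURIError
  # Same fallback as Page#path: hand back the raw string.
  url
end

puts path_for('http://example.com/actions/test_action?x=1')  # /actions/test_action
puts path_for('http://example.com').inspect                  # "" (no path component)
puts path_for('http://exa mple.com/bad')                     # unparseable, returned as-is
```

Callers should therefore not assume the return value always starts with a slash: it can be an empty string or, for invalid input, the whole URL.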
retries()
Number of times we have retried the current page
# File lib/grell/page.rb, line 46
def retries
  [@times_visited - 1, 0].max
end
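The arithmetic can be checked with a small standalone sketch. The helper name retries_for is hypothetical; it takes the visit count as an argument instead of reading @times_visited. The first visit does not count as a retry, and taking the maximum with 0 keeps a page that has not been visited yet (the constructor sets the counter to 0) from reporting -1.

```ruby
# Hypothetical helper mirroring the expression in Page#retries.
def retries_for(times_visited)
  [times_visited - 1, 0].max
end

(0..3).each do |visits|
  puts "#{visits} visit(s) -> #{retries_for(visits)} retries"
end
# 0 visit(s) -> 0 retries
# 1 visit(s) -> 0 retries
# 2 visit(s) -> 1 retries
# 3 visit(s) -> 2 retries
```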