class Grell::PageCollection
Keeps a record of every page the crawler has found. When a new URL is found, it is added to this collection, which ensures that it is unique. The new page starts out as one of the discovered pages; once the crawler navigates to it, it becomes one of the visited pages.
Attributes
collection[R]
Public Class Methods
new(add_match_block)
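The initializer takes a callable object (such as a Proc) containing the logic that decides whether a new URL is already present in the collection or should be added to it. It is called with a page already in the collection and the candidate page, and should return true when the two match. If nil is passed, default_add_match is used instead.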
# File lib/grell/page_collection.rb, line 11
def initialize(add_match_block)
  @collection = []
  @add_match_block = add_match_block || default_add_match
end
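As an illustration of what such a matcher might look like, here is a minimal sketch of a proc that treats two URLs as the same page when their host and path agree, ignoring the query string. `SketchPage` and `SAME_HOST_AND_PATH` are hypothetical names introduced for this example; the sketch assumes only that pages respond to `url`, as the default matcher does.

```ruby
require 'uri'

# Hypothetical stand-in for Grell::Page; the matcher only reads #url.
SketchPage = Struct.new(:url)

# Hypothetical matcher: two pages are the same when host and path agree,
# regardless of the query string.
SAME_HOST_AND_PATH = proc do |collection_page, page|
  existing  = URI.parse(collection_page.url)
  candidate = URI.parse(page.url)
  existing.host == candidate.host && existing.path == candidate.path
end

stored   = SketchPage.new('http://example.com/items?page=1')
incoming = SketchPage.new('http://example.com/items?page=2')
other    = SketchPage.new('http://example.com/about')

puts SAME_HOST_AND_PATH.call(stored, incoming) # true: only the query differs
puts SAME_HOST_AND_PATH.call(stored, other)    # false: the path differs
```

A proc like this would be handed to the collection as `Grell::PageCollection.new(SAME_HOST_AND_PATH)`.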
Public Instance Methods
create_page(url, parent_id)
# File lib/grell/page_collection.rb, line 16
def create_page(url, parent_id)
  page_id = next_id
  page = Page.new(url, page_id, parent_id)
  add(page)
  page
end
discovered_pages()
# File lib/grell/page_collection.rb, line 27
def discovered_pages
  @collection - visited_pages
end
next_page()
# File lib/grell/page_collection.rb, line 31
def next_page
  discovered_pages.sort_by { |page| page.parent_id }.first
end
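The selection rule above picks, among the pages not yet visited, the one whose parent has the lowest id. The following sketch applies the same rule to a plain array so that it runs without the gem. `QueuePage` and `next_page_from` are hypothetical names, and using nil as the parent of the start page is an assumption made for this example.

```ruby
# Hypothetical stand-in for Grell::Page with only the fields this sketch reads.
QueuePage = Struct.new(:url, :id, :parent_id, :visited) do
  def visited?
    visited
  end
end

# Same selection rule as next_page, written against a plain array.
def next_page_from(collection)
  visited = collection.select(&:visited?)
  discovered = collection - visited
  discovered.sort_by { |page| page.parent_id }.first
end

pages = [
  QueuePage.new('http://example.com/', 0, nil, true),      # start page, visited (nil parent is assumed)
  QueuePage.new('http://example.com/a', 1, 0, true),       # found on page 0, visited
  QueuePage.new('http://example.com/b', 2, 0, false),      # found on page 0, pending
  QueuePage.new('http://example.com/a/deep', 3, 1, false)  # found on page 1, pending
]

puts next_page_from(pages).url # http://example.com/b
```

Because pages found on earlier parents are served first, the crawl order resembles a breadth-first traversal. When every page has been visited, the rule returns nil.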
visited_pages()
# File lib/grell/page_collection.rb, line 23
def visited_pages
  @collection.select { |page| page.visited? }
end
Private Instance Methods
add(page)
# File lib/grell/page_collection.rb, line 41
def add(page)
  # Although finding unique pages based on URL will add pages with different query parameters,
  # in some cases we do link to different pages depending on the query parameters like when using proxies
  new_url = @collection.none? do |collection_page|
    @add_match_block.call(collection_page, page)
  end

  if new_url
    @collection.push page
  end
end
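The add step can be tried in isolation with a simplified re-creation. In this sketch, `AddPage`, `add_unless_matched`, and `CASE_INSENSITIVE` are hypothetical names. Unlike the original method, the helper returns a boolean so the outcome is easy to inspect. Ids are taken from the current collection size, as next_id does.

```ruby
# Hypothetical stand-in for Grell::Page.
AddPage = Struct.new(:url, :id)

# Simplified re-creation of the add step: push only when no stored page matches.
# Returns true when the page was added (the original method does not).
def add_unless_matched(collection, page, match)
  is_new = collection.none? { |stored| match.call(stored, page) }
  collection.push(page) if is_new
  is_new
end

# Same comparison the default matcher performs.
CASE_INSENSITIVE = proc { |stored, page| stored.url.downcase == page.url.downcase }

collection = []
['http://example.com/', 'http://EXAMPLE.com/', 'http://example.com/?lang=en'].each do |url|
  # The id comes from the current size, so a rejected page does not consume one.
  add_unless_matched(collection, AddPage.new(url, collection.size), CASE_INSENSITIVE)
end

collection.each { |page| puts "#{page.id} #{page.url}" }
# 0 http://example.com/
# 1 http://example.com/?lang=en
```

The second URL is rejected as a duplicate, so the third page receives id 1 and the ids stay contiguous.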
default_add_match()
If add_match_block is not provided, this proc is used to decide whether a new page should be added to the collection. It treats two pages as the same when their URLs are equal ignoring case.
# File lib/grell/page_collection.rb, line 55
def default_add_match
  Proc.new do |collection_page, page|
    collection_page.url.downcase == page.url.downcase
  end
end
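To make the consequences of this comparison concrete, the sketch below copies the proc and runs it against a few URL variants. `DefaultMatchPage` and `DEFAULT_MATCH` are hypothetical names used only so that the example runs without the gem.

```ruby
# Hypothetical stand-in for Grell::Page; the proc only reads #url.
DefaultMatchPage = Struct.new(:url)

# Same comparison as default_add_match, copied here so the sketch is self-contained.
DEFAULT_MATCH = proc do |collection_page, page|
  collection_page.url.downcase == page.url.downcase
end

base = DefaultMatchPage.new('http://example.com/Docs')

{
  'http://EXAMPLE.com/docs'        => 'case only',
  'http://example.com/docs?page=2' => 'query string',
  'http://example.com/docs/'       => 'trailing slash'
}.each do |url, difference|
  matched = DEFAULT_MATCH.call(base, DefaultMatchPage.new(url))
  puts "#{difference}: #{matched ? 'same page' : 'new page'}"
end
# case only: same page
# query string: new page
# trailing slash: new page
```

Only differences in letter case are ignored. Any other difference, including a query string or a trailing slash, produces a separate entry, so a crawler that needs looser matching would pass its own proc to the initializer.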
next_id()
# File lib/grell/page_collection.rb, line 37
def next_id
  @collection.size
end