class Grell::PageCollection

Keeps a record of all the pages crawled. When a new URL is found, it is added to this collection, which ensures that it is unique. The new page starts out among the discovered pages; once it has been navigated to, it becomes one of the visited pages.

Attributes

collection[R]

Public Class Methods

new(add_match_block)

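The initializer receives a callable (for example a Proc) containing the logic that determines whether a new URL should be added to the collection or is already present. It is called with an existing page and the candidate page and should return true when the two match. If nil is passed, default_add_match is used instead.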

# File lib/grell/page_collection.rb, line 11
def initialize(add_match_block)
  @collection = []
  @add_match_block = add_match_block || default_add_match
end
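
As an illustration of what such a callable might look like, here is a minimal sketch of a matcher that treats URLs as equal when they differ only in query string or fragment. The names MatchPage and ignore_query_match are hypothetical and not part of Grell; MatchPage only stands in for a page object that responds to url.

```ruby
require 'uri'

# Hypothetical stand-in for a page object; only #url is needed here.
MatchPage = Struct.new(:url)

# Hypothetical matcher: two pages match when their URLs are equal
# after dropping the query string and fragment, ignoring case.
ignore_query_match = proc do |collection_page, page|
  normalize = lambda do |url|
    uri = URI.parse(url)
    uri.query = nil
    uri.fragment = nil
    uri.to_s.downcase
  end
  normalize.call(collection_page.url) == normalize.call(page.url)
end

a = MatchPage.new('http://example.com/list?page=1')
b = MatchPage.new('http://example.com/list?page=2')
c = MatchPage.new('http://example.com/other')

same = ignore_query_match.call(a, b)      # pages differ only in the query string
different = ignore_query_match.call(a, c) # pages differ in the path
```

Given the initializer signature shown above, a proc like this would be passed as Grell::PageCollection.new(ignore_query_match).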

Public Instance Methods

create_page(url, parent_id)
# File lib/grell/page_collection.rb, line 16
def create_page(url, parent_id)
  page_id = next_id
  page = Page.new(url, page_id, parent_id)
  add(page)
  page
end
discovered_pages()
# File lib/grell/page_collection.rb, line 27
def discovered_pages
  @collection - visited_pages
end
next_page()
# File lib/grell/page_collection.rb, line 31
def next_page
  discovered_pages.sort_by{|page| page.parent_id}.first
end
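
The selection logic of visited_pages, discovered_pages and next_page can be sketched on a plain array. QueuePage is a hypothetical stand-in for the page class, and the parent id of -1 for the start page is an assumption made only for this example.

```ruby
# Hypothetical stand-in for a page with just the fields used here.
QueuePage = Struct.new(:url, :id, :parent_id, :visited) do
  def visited?
    visited
  end
end

collection = [
  QueuePage.new('http://example.com/',    0, -1, true),  # start page; -1 is an assumed parent id
  QueuePage.new('http://example.com/a',   1, 0,  false),
  QueuePage.new('http://example.com/a/1', 2, 1,  false),
  QueuePage.new('http://example.com/b',   3, 0,  false)
]

visited    = collection.select(&:visited?)          # same filter as visited_pages
discovered = collection - visited                   # same difference as discovered_pages
next_up    = discovered.sort_by(&:parent_id).first  # same ordering as next_page
```

Pages found on earlier-created parents are picked first, which gives a roughly breadth-first order. Ruby does not guarantee that sort_by is stable, so when several discovered pages share the lowest parent_id (here /a and /b), which of them is returned is unspecified.
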
visited_pages()
# File lib/grell/page_collection.rb, line 23
def visited_pages
  @collection.select {|page| page.visited?}
end

Private Instance Methods

add(page)
# File lib/grell/page_collection.rb, line 41
def add(page)
  # Matching on the full URL means that URLs differing only in their query parameters are added as separate pages.
  # This is intentional: in some cases the query parameters do select different pages, for example when using proxies.
  new_url = @collection.none? do |collection_page|
    @add_match_block.call(collection_page, page)
  end

  if new_url
    @collection.push page
  end
end
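
Taken together, create_page, add and next_id imply that a rejected duplicate is still returned to the caller but is not stored, and that its id is reused by the next page that is accepted. The following is a simplified sketch of that interaction. SketchPage and SketchCollection are hypothetical stand-ins, not the Grell classes, and the matching rule is hard-coded to the default case-insensitive comparison.

```ruby
# Hypothetical, simplified stand-ins; not the Grell classes themselves.
SketchPage = Struct.new(:url, :id, :parent_id)

class SketchCollection
  attr_reader :collection

  def initialize
    @collection = []
  end

  # Mirrors create_page: the id is the current collection size, the page
  # is stored only if no existing page matches, and it is returned either way.
  def create_page(url, parent_id)
    page = SketchPage.new(url, @collection.size, parent_id)
    already_known = @collection.any? { |known| known.url.downcase == page.url.downcase }
    @collection.push(page) unless already_known
    page
  end
end

pages  = SketchCollection.new
first  = pages.create_page('http://example.com/', -1)  # stored with id 0
dup    = pages.create_page('http://EXAMPLE.com/', 0)   # matches first, so it is not stored
second = pages.create_page('http://example.com/a', 0)  # stored, reuses the id given to dup
```

A caller that needs to know whether a URL was new therefore has to check the collection itself, since the return value of create_page does not distinguish the two cases.
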
default_add_match()

If add_match_block is not provided, URL matching falls back to this proc, which treats two pages as the same when their URLs are equal ignoring case.

# File lib/grell/page_collection.rb, line 55
def default_add_match
  Proc.new do |collection_page, page|
    collection_page.url.downcase == page.url.downcase
  end
end
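
To make the default comparison concrete, the sketch below applies the same case-insensitive check to a few URL pairs. UrlOnly is a hypothetical stand-in for a page object that responds to url.

```ruby
# Hypothetical stand-in for a page; only #url is needed here.
UrlOnly = Struct.new(:url)

# Same comparison as the default proc: URLs are equal ignoring case.
default_match = proc do |collection_page, page|
  collection_page.url.downcase == page.url.downcase
end

case_only = default_match.call(
  UrlOnly.new('http://Example.com/About'),
  UrlOnly.new('http://example.com/about')
)

query_differs = default_match.call(
  UrlOnly.new('http://example.com/list?page=1'),
  UrlOnly.new('http://example.com/list?page=2')
)

slash_differs = default_match.call(
  UrlOnly.new('http://example.com/about'),
  UrlOnly.new('http://example.com/about/')
)
```

Only the first pair counts as a match. Because the comparison is on the whole string, URLs that differ in query parameters or in a trailing slash are treated as distinct pages.
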
next_id()
# File lib/grell/page_collection.rb, line 37
def next_id
  @collection.size
end