class Grell::CrawlerManager

Manages the state of the crawling process. It is not concerned with individual pages, only with logging, restarting, and quitting the crawler correctly.

Constants

KILL_TIMEOUT
PAGES_TO_RESTART

Public Class Methods

cleanup_all_processes()
# File lib/grell/crawler_manager.rb, line 41
def self.cleanup_all_processes
  PhantomJSManager.new.cleanup_all_processes
end

new(logger: nil, on_periodic_restart: {}, driver: nil)

logger: logger to use for Grell's messages; defaults to a Logger writing to STDOUT.
on_periodic_restart: if set, the driver restarts every :each visits (default 100) and then executes the :do block.
driver: the Capybara driver to use; if omitted, one is set up with CapybaraDriver.new.setup_capybara.

# File lib/grell/crawler_manager.rb, line 8
def initialize(logger: nil, on_periodic_restart: {}, driver: nil)
  Grell.logger = logger ? logger : Logger.new(STDOUT)
  @periodic_restart_block = on_periodic_restart[:do]
  @periodic_restart_period = on_periodic_restart[:each] || PAGES_TO_RESTART
  @driver = driver || CapybaraDriver.new.setup_capybara
  if @periodic_restart_period <= 0
    Grell.logger.warn "GRELL. Restart option misconfigured with a negative period. Ignoring option."
  end
end
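
The shape of the on_periodic_restart hash can be illustrated without loading Grell. The sketch below is a stand-alone approximation, not part of the library: read_restart_options and DEFAULT_PERIOD are hypothetical names that mirror how the initializer reads :do and :each, and the value 100 is taken from the parameter description above rather than from the constant itself.

```ruby
# Hypothetical stand-in for PAGES_TO_RESTART; 100 is the default quoted in the docs.
DEFAULT_PERIOD = 100

# Hypothetical helper mirroring how the initializer reads the option hash.
# Returns the block, the effective period, and whether restarts would ever fire.
def read_restart_options(on_periodic_restart = {})
  block  = on_periodic_restart[:do]
  period = on_periodic_restart[:each] || DEFAULT_PERIOD
  warn "Restart period must be positive; option ignored." if period <= 0
  { block: block, period: period, active: !block.nil? && period > 0 }
end

# No option given: the default period applies, but without a block nothing fires.
p read_restart_options[:active]

# A block plus an explicit period of 50 visits.
p read_restart_options({ do: -> { :restarted }, each: 50 })[:period]
```

Under these assumptions, passing only :each without :do leaves periodic restarts inactive, because check_periodic_restart returns early when no block was stored.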

Public Instance Methods

check_periodic_restart(collection)

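PhantomJS seems to consume more and more memory as it crawls. A periodic restart restarts the driver each time the number of visited pages is a multiple of the configured period, and then calls the configured block. Nothing happens unless a :do block was given and the period is positive.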

# File lib/grell/crawler_manager.rb, line 33
def check_periodic_restart(collection)
  return unless @periodic_restart_block
  return unless @periodic_restart_period > 0
  return unless (collection.visited_pages.size % @periodic_restart_period).zero?
  restart
  @periodic_restart_block.call
end
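
The restart cadence produced by the three guard clauses can be sketched with stand-in objects. Everything below is hypothetical scaffolding (FakeCollection, SketchManager, and restart_points are not part of Grell); only the guard logic is copied from the method shown above, and the driver restart is replaced by a counter.

```ruby
# Stand-in for the page collection; only #visited_pages is needed here.
FakeCollection = Struct.new(:visited_pages)

# Hypothetical minimal manager reproducing the guard clauses of
# check_periodic_restart. It counts restarts instead of touching a driver.
class SketchManager
  attr_reader :restarts

  def initialize(period:, &block)
    @period = period
    @block = block
    @restarts = 0
  end

  def check_periodic_restart(collection)
    return unless @block
    return unless @period > 0
    return unless (collection.visited_pages.size % @period).zero?
    @restarts += 1 # the real method calls #restart on the driver here
    @block.call
  end
end

# Returns the visited-page counts at which the restart block fired.
def restart_points(period:, pages:)
  fired = []
  collection = FakeCollection.new([])
  manager = SketchManager.new(period: period) { fired << collection.visited_pages.size }
  pages.times do |i|
    collection.visited_pages << "page-#{i}"
    manager.check_periodic_restart(collection)
  end
  fired
end

p restart_points(period: 3, pages: 10) # fires at 3, 6 and 9 visited pages
```

Because 0 % n is zero, the modulo test also passes for an empty collection; whether the method is ever called before the first page is visited depends on the caller, which is not shown in this section.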

quit()

Quits the Poltergeist driver.

# File lib/grell/crawler_manager.rb, line 26
def quit
  Grell.logger.info "GRELL. Driver quitting"
  @driver.quit
end

restart()

Restarts the PhantomJS process without modifying the state of visited and discovered pages.

# File lib/grell/crawler_manager.rb, line 19
def restart
  Grell.logger.info "GRELL. Driver restarting"
  @driver.restart
  Grell.logger.info "GRELL. Driver restarted"
end