class Grell::CrawlerManager
Manages the state of the crawling process. It is not concerned with individual pages, only with logging, restarting, and quitting the crawler correctly.
Constants
- KILL_TIMEOUT
- PAGES_TO_RESTART
Public Class Methods
cleanup_all_processes()
# File lib/grell/crawler_manager.rb, line 41
def self.cleanup_all_processes
  PhantomJSManager.new.cleanup_all_processes
end
new(logger: nil, on_periodic_restart: {}, driver: nil)
logger: logger to use for Grell's messages; defaults to a Logger writing to STDOUT.
on_periodic_restart: if set, the driver restarts every :each visits (100 by default) and executes the :do block.
driver: the Capybara driver to use; when nil, a default driver is set up with CapybaraDriver.new.setup_capybara.
# File lib/grell/crawler_manager.rb, line 8
def initialize(logger: nil, on_periodic_restart: {}, driver: nil)
  Grell.logger = logger ? logger : Logger.new(STDOUT)
  @periodic_restart_block = on_periodic_restart[:do]
  @periodic_restart_period = on_periodic_restart[:each] || PAGES_TO_RESTART
  @driver = driver || CapybaraDriver.new.setup_capybara
  if @periodic_restart_period <= 0
    Grell.logger.warn "GRELL. Restart option misconfigured with a negative period. Ignoring option."
  end
end
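As a rough illustration of how the on_periodic_restart hash is read, the sketch below mirrors the constructor's option handling in a standalone method. The helper parse_periodic_restart and the local PAGES_TO_RESTART constant are stand-ins introduced here, not part of Grell; the value 100 is taken from the parameter description above.

```ruby
# Stand-in for Grell::CrawlerManager::PAGES_TO_RESTART (assumed to be 100,
# as stated in the parameter description).
PAGES_TO_RESTART = 100

# Hypothetical helper, not part of Grell: mirrors how the constructor reads
# the :do and :each keys of the on_periodic_restart hash.
def parse_periodic_restart(on_periodic_restart = {})
  block  = on_periodic_restart[:do]
  period = on_periodic_restart[:each] || PAGES_TO_RESTART
  # The constructor only logs a warning here; the option is later ignored
  # because check_periodic_restart requires a positive period.
  warning = period <= 0 ? "Restart option misconfigured. Ignoring option." : nil
  { block: block, period: period, warning: warning }
end

explicit = parse_periodic_restart({ do: -> { :restarted }, each: 50 })
default  = parse_periodic_restart({ do: -> { :restarted } })
invalid  = parse_periodic_restart({ do: -> { :restarted }, each: 0 })
```

Here explicit uses the given period, default falls back to PAGES_TO_RESTART, and invalid carries a warning because its period is not positive.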
Public Instance Methods
check_periodic_restart(collection)
PhantomJS seems to consume more and more memory as it crawls. This method restarts the driver each time the number of visited pages is a multiple of the configured period, then calls the :do block. It does nothing if no block was configured or if the period is not positive.
# File lib/grell/crawler_manager.rb, line 33
def check_periodic_restart(collection)
  return unless @periodic_restart_block
  return unless @periodic_restart_period > 0
  return unless (collection.visited_pages.size % @periodic_restart_period).zero?
  restart
  @periodic_restart_block.call
end
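The guard clauses in check_periodic_restart can be exercised in isolation. The following is a minimal sketch under stated assumptions: FakeCollection and RestartChecker are hypothetical stand-ins rather than Grell classes, and a counter takes the place of the real driver restart.

```ruby
# Hypothetical stand-in for the page collection; only visited_pages is needed.
FakeCollection = Struct.new(:visited_pages)

# Hypothetical stand-in for the manager, reproducing only the restart check.
class RestartChecker
  attr_reader :restarts

  def initialize(period:, &block)
    @period = period
    @block = block
    @restarts = 0
  end

  # Same guard order as check_periodic_restart.
  def check(collection)
    return unless @block
    return unless @period > 0
    return unless (collection.visited_pages.size % @period).zero?
    @restarts += 1 # stands in for restarting the driver
    @block.call
  end
end

calls = []
checker = RestartChecker.new(period: 3) { calls << :restarted }
collection = FakeCollection.new([])

# Visit seven pages, checking after each one.
7.times do |i|
  collection.visited_pages << "page#{i}"
  checker.check(collection)
end

no_block = RestartChecker.new(period: 3)
no_block_result = no_block.check(collection)
```

With a period of 3 and seven visited pages, the check fires at three and six pages. Because the test is a plain modulo, an empty collection (size 0) would also pass it in this sketch.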
quit()
Quits the Poltergeist driver.
# File lib/grell/crawler_manager.rb, line 26
def quit
  Grell.logger.info "GRELL. Driver quitting"
  @driver.quit
end
restart()
Restarts the PhantomJS process without modifying the state of visited and discovered pages.
# File lib/grell/crawler_manager.rb, line 19
def restart
  Grell.logger.info "GRELL. Driver restarting"
  @driver.restart
  Grell.logger.info "GRELL. Driver restarted"
end