class Wayfarer::Job
A {Job} is a class that has a {Routing::Router} with many {Routing::Rule}s, which are matched against a URI. Rules map URIs onto job instance methods. Under the hood, jobs are instantiated within separate threads by a {Processor}, and every instance gets its own thread. If a URI is matched, its {Page} is retrieved and made available to instance methods via {#page}.
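The rule-to-method mapping and the thread-per-instance model described above can be illustrated with a small stand-alone sketch. `TinyRouter`, `TinyJob`, and the host-only matching are hypothetical simplifications, not Wayfarer's actual {Routing::Router} or {Processor}; the sketch only shows the idea of "first matching rule names an instance method, and each URI gets a fresh instance in its own thread".

```ruby
require "uri"

# Hypothetical, simplified stand-in for a router: each rule pairs a
# predicate on the URI with the name of an instance method to call.
class TinyRouter
  Rule = Struct.new(:predicate, :action)

  def initialize
    @rules = []
  end

  # Register a rule: URIs whose host equals `host` are routed to `action`.
  def host(host, to:)
    @rules << Rule.new(->(uri) { uri.host == host }, to)
  end

  # Return the action of the first matching rule, or nil if none match.
  def route(uri)
    rule = @rules.find { |r| r.predicate.call(uri) }
    rule&.action
  end
end

class TinyJob
  ROUTER = TinyRouter.new.tap { |r| r.host("example.com", to: :index) }

  def index(uri)
    "indexed #{uri}"
  end

  # One fresh instance per URI, each running in its own thread.
  # Thread#value joins the thread and returns the block's result.
  def self.run(uris)
    threads = uris.map do |u|
      Thread.new do
        uri = URI(u)
        action = ROUTER.route(uri)
        action ? new.public_send(action, uri) : nil
      end
    end
    threads.map(&:value)
  end
end

TinyJob.run(["https://example.com/a", "https://other.org/"])
# => ["indexed https://example.com/a", nil]
```

URIs that match no rule simply produce `nil` here; how the real processor treats unmatched URIs is not covered by this sketch.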
Jobs implement ActiveJob's Job API and are therefore compatible with a wide range of job queues. To run a job immediately, call ::perform_now. To enqueue a job, call ::perform_later.
@see github.com/rails/rails/tree/master/activejob rails/activejob @see edgeguides.rubyonrails.org/active_job_basics.html ActiveJob Basics
Attributes
@!attribute [w] config
@!attribute [w] router
@!attribute [rw] adapter
@!attribute [rw] page
@!attribute [rw] params
@!attribute [r] staged_uris
@return [Array<String>, Array<URI>] URIs to stage for the next cycle. @see stage
Public Class Methods
A configuration based on a copy of the global {Wayfarer.config}. If a block is given, the configuration is yielded to it. @yield [Configuration] @return [Configuration]
    # File lib/wayfarer/job.rb, line 83
    def config
      @config ||= Wayfarer.config.clone
      yield(@config) if block_given?
      @config
    end
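The clone-then-yield pattern of `config` can be sketched without Wayfarer. `GlobalConfig`, `GLOBAL`, `BaseJob`, and `BrowserJob` are hypothetical names; `GLOBAL` stands in for {Wayfarer.config}. The point is that each class memoizes its own copy, so a per-job tweak never leaks into the global configuration.

```ruby
# Hypothetical global configuration, standing in for Wayfarer.config.
GlobalConfig = Struct.new(:http_adapter, :threads)
GLOBAL = GlobalConfig.new(:net_http, 4)

class BaseJob
  # Each class lazily clones the global config, then lets a block mutate
  # its own copy. The global object is never modified.
  def self.config
    @config ||= GLOBAL.clone
    yield(@config) if block_given?
    @config
  end
end

class BrowserJob < BaseJob
  # Only BrowserJob's copy is changed.
  config { |c| c.http_adapter = :selenium }
end

BrowserJob.config.http_adapter # => :selenium
BaseJob.config.http_adapter    # => :net_http
```

Two consequences of this design are worth noting, assuming the real method behaves like the sketch: `clone` is shallow, so nested mutable values would still be shared with the global object, and because `@config` is a per-class instance variable, a subclass starts from a fresh copy of the global configuration rather than from its parent's customized one.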
    # File lib/wayfarer/job.rb, line 119
    def initialize(*argv)
      @halts = false
      @staged_uris = []
      super(*argv)
    end
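Returns a copy of this job class with its router, locals, and configuration duplicated. Each local is replaced by a thread-safe counterpart and exposed through both an instance method and a class method of the same name.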
    # File lib/wayfarer/job.rb, line 60
    def prepare
      duplicate = dup
      duplicate.router = router.dup
      duplicate.locals = locals.deep_dup
      duplicate.config = config.dup
      # Replace each local with a thread-safe counterpart.
      duplicate.locals.each do |(key, val)|
        duplicate.locals[key] = Locals.thread_safe_counterpart(val)
      end
      # Expose every local as both an instance reader and a class reader.
      duplicate.locals.each do |(key, _)|
        duplicate.send(:define_method, key) do
          duplicate.locals[key]
        end
        duplicate.send(:define_singleton_method, key) do
          duplicate.locals[key]
        end
      end
      duplicate
    end
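A minimal sketch of the same "duplicate the class, deep-copy its locals, generate readers" idea. `CounterJob` and its `visited` local are hypothetical. A `Marshal` round-trip is swapped in for ActiveSupport's `deep_dup` so the sketch needs only the standard library, and the `Locals.thread_safe_counterpart` step is deliberately omitted.

```ruby
class CounterJob
  class << self
    attr_accessor :locals
  end
  self.locals = { visited: [] }

  # Simplified sketch of a prepare-style copy: duplicate the class,
  # deep-copy its locals, and define readers for each local on the copy.
  def self.prepare
    duplicate = dup # Class#dup keeps class methods and class-level ivars
    # Marshal round-trip as a stdlib stand-in for ActiveSupport's deep_dup.
    duplicate.locals = Marshal.load(Marshal.dump(locals))
    duplicate.locals.each_key do |key|
      duplicate.send(:define_method, key) { duplicate.locals[key] }
      duplicate.define_singleton_method(key) { duplicate.locals[key] }
    end
    duplicate
  end
end

copy = CounterJob.prepare
copy.visited << "https://example.com"
copy.new.visited  # instance reader sees the same array as the class reader
CounterJob.locals # the original class still holds an empty :visited array
```

Because the readers close over `duplicate`, every instance of the prepared copy shares that copy's locals, while the original class stays untouched.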
The job's router, created on first access. If a block is passed in, it is evaluated within the {Router}'s instance. @return [Routing::Router]
    # File lib/wayfarer/job.rb, line 92
    def router(&proc)
      @router ||= Routing::Router.new
      @router.instance_eval(&proc) if block_given?
      @router
    end
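The `instance_eval` block lets routing rules be declared as a receiver-less DSL. The sketch below mirrors that shape with a hypothetical `MiniRouter` whose only rule type is a path prefix; Wayfarer's real rule methods are not reproduced here.

```ruby
require "uri"

# Hypothetical mini router whose DSL methods are called inside a block
# evaluated with instance_eval, mirroring how Job.router accepts a block.
class MiniRouter
  attr_reader :rules

  def initialize
    @rules = []
  end

  # DSL method: route URIs whose path starts with `prefix` to `action`.
  def path(prefix, to:)
    @rules << [prefix, to]
  end

  # Return the action of the first rule whose prefix matches, or nil.
  def match(uri)
    _, action = @rules.find { |prefix, _| uri.path.start_with?(prefix) }
    action
  end
end

class DocsJob
  def self.router(&block)
    @router ||= MiniRouter.new
    # Inside the block, `self` is the router, so `path` needs no receiver.
    @router.instance_eval(&block) if block_given?
    @router
  end

  router do
    path "/docs", to: :docs
    path "/", to: :index
  end
end

DocsJob.router.match(URI("https://example.com/docs/intro")) # => :docs
```

The trade-off of `instance_eval` is that `self` is rebound: inside the block, the job class's own methods are not callable without an explicit receiver, although local variables from the surrounding scope remain visible through the closure.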
Public Instance Methods
Whether this job will stop processing.
    # File lib/wayfarer/job.rb, line 126
    def halts?
      @halts
    end
Performs this job. @note ActiveJob API @override
    # File lib/wayfarer/job.rb, line 133
    def perform(*uris)
      Crawl.new(self.class, *uris).execute
    end
Protected Instance Methods
Sets a halting flag that signals the processor to stop its work.
    # File lib/wayfarer/job.rb, line 142
    def halt
      @halts = true
    end
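The `halt`/`halts?` pair is a cooperative flag: an action only sets it, and whoever drives the jobs decides when to look at it. In this sketch, `HaltingJob` and `run_until_halt` are hypothetical, and the loop runs sequentially for clarity, whereas the real {Processor} works with threads.

```ruby
# Simplified sketch of a cooperative halt flag: an action sets the flag,
# and a hypothetical processor loop checks it after each action returns.
class HaltingJob
  def initialize
    @halts = false
  end

  def halts?
    @halts
  end

  # Example action: request a halt when reaching a ".../stop" URI.
  def visit(uri)
    halt if uri.end_with?("/stop")
  end

  protected

  def halt
    @halts = true
  end
end

# Hypothetical processor: stops dispatching once any job has halted.
def run_until_halt(uris)
  processed = []
  uris.each do |uri|
    job = HaltingJob.new
    job.visit(uri)
    processed << uri
    break if job.halts?
  end
  processed
end

run_until_halt(["https://a.test/1", "https://a.test/stop", "https://a.test/2"])
# => ["https://a.test/1", "https://a.test/stop"]
```

Note that the URI which triggered the halt is still fully processed; the flag only prevents further work from being scheduled.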
The {Page} representing the URI currently processed by an action. When using the Selenium adapter, a new {Page} is built on every call so that {Page#body} reflects the browser's current DOM. Otherwise, subsequent DOM updates (e.g. those made by JavaScript) would be invisible. @return [Page]
    # File lib/wayfarer/job.rb, line 178
    def page
      return @page unless self.class.config.http_adapter == :selenium
      # Rebuild the page so its body reflects the browser's current DOM.
      Page.new(
        uri: @page.uri,
        status_code: @page.status_code,
        body: driver.page_source,
        headers: @page.headers
      )
    end
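The refresh behaviour can be exercised without a browser. In this sketch, `Page` is a hypothetical `Struct` with the same keyword fields, `FakeDriver` stands in for a Selenium driver (only `page_source` is modelled), and the adapter is passed in directly instead of being read from the class configuration.

```ruby
# Hypothetical Page value object and driver stub; only the refresh logic
# mirrors the #page method above.
Page = Struct.new(:uri, :status_code, :body, :headers, keyword_init: true)
FakeDriver = Struct.new(:page_source)

class BrowserPageJob
  def initialize(page, driver, adapter)
    @page, @driver, @adapter = page, driver, adapter
  end

  # For :selenium, rebuild the page with the driver's current DOM;
  # for any other adapter, return the page that was fetched originally.
  def page
    return @page unless @adapter == :selenium

    Page.new(
      uri: @page.uri,
      status_code: @page.status_code,
      body: @driver.page_source,
      headers: @page.headers
    )
  end
end

driver = FakeDriver.new("<p>before</p>")
fetched = Page.new(uri: "https://example.com", status_code: 200,
                   body: "<p>before</p>", headers: {})
job = BrowserPageJob.new(fetched, driver, :selenium)
driver.page_source = "<p>after</p>" # simulate a JavaScript DOM update
job.page.body # => "<p>after</p>"
```

Since each call returns a new object, a caller that kept a reference to an earlier {Page} still sees the old body; calling #page again is what picks up DOM changes.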
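Adds URIs to process in the next cycle. If a relative path is given, it is appended to the path of the current {#page}'s URI to build an absolute URI. URIs whose type is not recognized by the HTTP adapter are silently discarded. @param [String, URI, Array<String>, Array<URI>] uris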
    # File lib/wayfarer/job.rb, line 150
    def stage(*uris)
      expanded = uris.flatten.map do |u|
        if (uri = URI(u)).absolute?
          uri
        else
          # URI#join would drop part of page.uri.path (its last segment,
          # or all of it for a leading "/"), so append the path instead.
          current = page.uri.dup
          current.path = File.join(page.uri.path, uri.path)
          current
        end
      end
      # This method has become the gatekeeper for invalid URIs that would
      # otherwise raise exceptions further down the line.
      supported = expanded.select do |uri|
        HTTPAdapters::NetHTTPAdapter::RECOGNIZED_URI_TYPES.any? do |type|
          uri.is_a?(type)
        end
      end
      @staged_uris.push(*supported)
    end
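The expansion and filtering steps of `stage` can be tried in isolation. `expand_uris` is a hypothetical helper: `current_uri` stands in for `page.uri`, and the `allowed:` default of `URI::HTTP`/`URI::HTTPS` is an assumption standing in for `HTTPAdapters::NetHTTPAdapter::RECOGNIZED_URI_TYPES`, whose actual contents are not shown here.

```ruby
require "uri"

# Standalone sketch of the expansion and filtering performed by #stage.
# `current_uri` stands in for page.uri; `allowed` is an assumed stand-in
# for HTTPAdapters::NetHTTPAdapter::RECOGNIZED_URI_TYPES.
def expand_uris(current_uri, *uris, allowed: [URI::HTTP, URI::HTTPS])
  expanded = uris.flatten.map do |u|
    uri = URI(u)
    next uri if uri.absolute?

    # Append the relative path to the current path instead of using
    # URI#join, which would replace the last path segment.
    current = current_uri.dup
    current.path = File.join(current_uri.path, uri.path)
    current
  end
  # Keep only URI classes the (assumed) adapter can handle.
  expanded.select { |uri| allowed.any? { |type| uri.is_a?(type) } }
end

base = URI("https://example.com/docs")
expand_uris(base, "intro", ["https://other.org/x", "mailto:me@example.com"]).map(&:to_s)
# => ["https://example.com/docs/intro", "https://other.org/x"]
```

As in the method above, only the path of a relative URI is carried over: a query string on the relative URI is dropped, and the current page's query string, if any, is kept.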