class EagleClaw::Scraper

Attributes

properties[RW]
data[RW]

A `Hash` which holds data collected during a run.

@see initialize
@see reset

problems[RW]

An `Array` which collects problems encountered during a run.

Public Class Methods

after(context, meth = nil, &block)

Define a post-processor to run in a certain context.

@param [Symbol] context either `:each` or `:all`.
@param [optional, Symbol] meth name of method to call.
@return [nil]

@overload after(:each, :method_name)

Run the given method after each component of the run.

@overload after(:all, :method_name)

Run the given method after the run itself.

@overload after(:each, &block)

Run the given block (using `instance_eval`) after each component of
the run.

@overload after(:all, &block)

Run the given block (using `instance_eval`) after the entire run.

@see before

# File lib/eagleclaw.rb, line 65
def after(context, meth = nil, &block)
  register([:after, context], meth, &block)
end
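
The implementations of `register` and `run_callbacks` are not shown on this page. As a sketch only, assuming callbacks are kept in a per-class hash keyed by arrays such as `[:after, :all]`, that a method name is dispatched with `send`, and that a block is evaluated with `instance_eval` (as the overloads above describe), the `:method_name` and block forms of `after` could be modelled like this. `MiniScraper`, `MyScraper` and `close_session` are hypothetical names, not part of EagleClaw; the same mechanism would apply to `before`.

```ruby
# Hypothetical stand-in for EagleClaw::Scraper's callback machinery.
# The real register/run_callbacks may differ; this only models the
# documented behaviour of `after`.
class MiniScraper
  class << self
    # Callbacks keyed by e.g. [:after, :all] => [callback, ...]
    def callbacks
      @callbacks ||= Hash.new { |hash, key| hash[key] = [] }
    end

    # Store either the method name or the block, in registration order.
    def register(key, meth = nil, &block)
      raise ArgumentError, "pass a method name or a block" unless meth || block
      callbacks[key] << (meth || block)
      nil
    end

    def after(context, meth = nil, &block)
      register([:after, context], meth, &block)
    end

    def run_callbacks(key, instance)
      callbacks[key].each do |callback|
        if callback.is_a?(Symbol)
          instance.send(callback)            # after(:all, :method_name)
        else
          instance.instance_eval(&callback)  # after(:all) { ... }
        end
      end
    end
  end

  attr_reader :log

  def initialize
    @log = []
  end

  # Hypothetical method used with the :method_name form.
  def close_session
    log << :method_callback
  end
end

class MyScraper < MiniScraper
  after(:all, :close_session)
  after(:all) { log << :block_callback }  # self is the scraper instance here
end

scraper = MyScraper.new
MyScraper.run_callbacks([:after, :all], scraper)
p scraper.log  # => [:method_callback, :block_callback]
```

Because the block is run with `instance_eval`, a bare call such as `log` inside it resolves against the scraper instance, not against the class body where the block was written.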

before(context, meth = nil, &block)

Define a pre-processor to run in a certain context.

@param [Symbol] context either `:each` or `:all`.
@param [optional, Symbol] meth name of method to call.
@return [nil]

@overload before(:each, :method_name)

Run the given method before each component of the run.

@overload before(:all, :method_name)

Run the given method before the run itself.

@overload before(:each, &block)

Run the given block (using `instance_eval`) before each component of
the run.

@overload before(:all, &block)

Run the given block (using `instance_eval`) before the run itself.

@example Fetch a page before the run

before(:all) do
  agent.get("http://google.com/")
end

@example Reset the page before each component of the run

before(:each) do
  agent.get("http://google.com/")
end

# File lib/eagleclaw.rb, line 42
def before(context, meth = nil, &block)
  register([:before, context], meth, &block)
end

new()

Create a new {Scraper} instance.

By default, just sets {#data @data} and {#problems @problems} to empty `Array`s.

# File lib/eagleclaw.rb, line 94
def initialize
  @data = []
  @problems = []
end
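
prop(prop_name, meth = nil, &block)

Define a property, run once per {#run} between the `before(:each)` and `after(:each)` callbacks.

The name is converted to a `Symbol` and appended to the class's property list, and `meth` (or the block) is registered as that property's callback.

@param [#to_sym] prop_name name of the property.
@param [optional, Symbol] meth name of method to call.
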
# File lib/eagleclaw.rb, line 69
def prop(prop_name, meth = nil, &block)
  (@properties ||= []) << prop_name.to_sym
  register([:property, prop_name.to_sym], meth, &block)
end
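
`prop` does two things: it appends the symbolized property name to the class-level `@properties` list, which fixes the order in which {#run} visits properties, and it registers the property body under the key `[:property, name]`. The sketch below isolates the first half. `PropList` and `FlightScraper` are hypothetical stand-ins, the `properties` reader is assumed to be a plain singleton `attr_reader`, and `register` is stubbed out because its real implementation is not shown here.

```ruby
# Hypothetical stand-in showing how prop builds the ordered property list.
class PropList
  class << self
    attr_reader :properties

    def prop(prop_name, meth = nil, &block)
      # Same bookkeeping as the prop source above: names become symbols,
      # kept in definition order.
      (@properties ||= []) << prop_name.to_sym
      register([:property, prop_name.to_sym], meth, &block)
    end

    # Stub: the real register is not shown in this documentation.
    def register(_key, _meth = nil, &_block)
      nil
    end
  end
end

class FlightScraper < PropList
  prop "price"          # a String name is converted to :price
  prop(:departure) { }  # block form
end

p FlightScraper.properties  # => [:price, :departure]
```

In this sketch `@properties` is an instance variable of the class object that called `prop`, so a further subclass of `FlightScraper` starts with `properties` returning `nil` rather than inheriting the list. Whether the real class behaves the same depends on how its `properties` reader is defined.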

Public Instance Methods

reset()

Reset this scraper instance’s state.

The default version of this method just clears {#data @data} and {#problems @problems}.

@return [nil]
@abstract Subclass and extend to reset the scraper state.

# File lib/eagleclaw.rb, line 108
def reset
  data.clear
  problems.clear
  nil
end
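
The `@abstract` note suggests that subclasses extend `reset` rather than replace it. A minimal sketch, assuming a subclass keeps extra per-run state of its own (the `page_cache` hash is hypothetical): call `super` first so `data` and `problems` are cleared, then clear the subclass's state. `BaseScraper` mirrors the `initialize` and `reset` shown on this page so the block runs without EagleClaw installed; its trailing `nil` follows the documented `@return [nil]`.

```ruby
# Hypothetical base class mirroring the documented defaults of
# EagleClaw::Scraper#initialize and #reset.
class BaseScraper
  attr_accessor :data, :problems

  def initialize
    @data = []
    @problems = []
  end

  def reset
    data.clear
    problems.clear
    nil
  end
end

# Hypothetical subclass that keeps extra per-run state.
class CachingScraper < BaseScraper
  attr_reader :page_cache

  def initialize
    super
    @page_cache = {}
  end

  # Extend, rather than replace, the default reset.
  def reset
    super
    page_cache.clear
    nil
  end
end

scraper = CachingScraper.new
scraper.data << "row"
scraper.problems << "timeout"
scraper.page_cache["/"] = "<html></html>"
scraper.reset
p [scraper.data, scraper.problems, scraper.page_cache]  # => [[], [], {}]
```

`Array#clear` and `Hash#clear` empty the existing objects in place, so any code still holding a reference to `data` sees the reset; reassigning `@data = []` instead would leave such references pointing at the old contents.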

run()

Run the scraper.

Operating procedure:

  1. Run {Scraper.before before(:all)} blocks.

  2. For each property (defined with {Scraper.prop prop(:prop_name)}):

    1. Run {Scraper.before before(:each)} blocks.

    2. Run the property itself.

    3. Run {Scraper.after after(:each)} blocks.

  3. Run {Scraper.after after(:all)} blocks.

  4. Return {#data data}.

@see reset
@see data

# File lib/eagleclaw.rb, line 128
def run
  self.class.run_callbacks([:before, :all], self)
  # properties may be nil if prop has never been called on this class.
  (self.class.properties || []).each do |property|
    self.class.run_callbacks([:before, :each], self)
    self.class.run_callbacks([:property, property], self)
    self.class.run_callbacks([:after, :each], self)
  end
  self.class.run_callbacks([:after, :all], self)
  data
end
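
The operating procedure described for `run` can be checked with a self-contained model. This sketch copies the loop from `run` into a hypothetical `TraceScraper` whose callbacks are plain blocks, registered through a made-up `on` helper (not part of EagleClaw) and evaluated with `instance_eval`. Each callback appends a label to `data`, so the return value records the order in which the steps fire for two hypothetical properties. It also adds a `|| []` guard (an assumption) so a class with no properties returns an empty result instead of calling `each` on `nil`.

```ruby
# Hypothetical model of the documented run order. Callbacks are stored as
# blocks keyed like [:before, :each]; real EagleClaw storage may differ.
class TraceScraper
  class << self
    attr_accessor :properties

    def callbacks
      @callbacks ||= Hash.new { |hash, key| hash[key] = [] }
    end

    # Hypothetical registration helper standing in for before/after/prop.
    def on(key, &block)
      callbacks[key] << block
    end

    def run_callbacks(key, instance)
      callbacks[key].each { |callback| instance.instance_eval(&callback) }
    end
  end

  attr_reader :data

  def initialize
    @data = []
  end

  # Same loop as run above, with a guard for classes without properties.
  def run
    self.class.run_callbacks([:before, :all], self)
    (self.class.properties || []).each do |property|
      self.class.run_callbacks([:before, :each], self)
      self.class.run_callbacks([:property, property], self)
      self.class.run_callbacks([:after, :each], self)
    end
    self.class.run_callbacks([:after, :all], self)
    data
  end
end

class TwoProps < TraceScraper
  self.properties = [:price, :seats]
  on([:before, :all])     { data << "before(:all)" }
  on([:before, :each])    { data << "before(:each)" }
  on([:property, :price]) { data << "price" }
  on([:property, :seats]) { data << "seats" }
  on([:after, :each])     { data << "after(:each)" }
  on([:after, :all])      { data << "after(:all)" }
end

p TwoProps.new.run
# => ["before(:all)", "before(:each)", "price", "after(:each)",
#     "before(:each)", "seats", "after(:each)", "after(:all)"]
```

The `:all` callbacks fire once per run, while the `:each` callbacks fire once per property, which is why `before(:each)` is the natural place to reset shared state such as the current page between properties.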