module DataCollector::Core

Public Instance Methods

config()
# File lib/data_collector/core.rb, line 108
def config
  @config ||= ConfigFile
end
filter(data, filter_path)

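Filters data either by a list of keys (Array) or by a JSONPath expression (String).
JSONPath evaluator: jsonpath.com/
JSONPath explanation: goessner.net/articles/JsonPath/index.html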

# File lib/data_collector/core.rb, line 90
def filter(data, filter_path)
  filtered = []
  if filter_path.is_a?(Array) && data.is_a?(Array)
    filtered = data.map {|m| m.select {|k, v| filter_path.include?(k.to_sym)}}
  elsif filter_path.is_a?(String)
    filtered = JsonPath.on(data, filter_path)
  end

  filtered = [filtered] unless filtered.is_a?(Array)
  filtered = filtered.first if filtered.length == 1 && filtered.first.is_a?(Array)

  filtered
rescue StandardError => e
  @logger ||= Logger.new(STDOUT)
  @logger.error("#{filter_path} failed: #{e.message}")
  []
end
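
The Array branch of filter and the normalization that follows it can be tried out in isolation. The sketch below copies those lines into a stand-alone method; filter_keys is a hypothetical name used only for illustration and is not part of DataCollector. The String branch, which delegates to JsonPath.on, is left out so that the example needs no gems.

```ruby
# Stand-alone copy of the Array branch of `filter` plus its normalization step.
# `filter_keys` is a hypothetical helper name, not part of DataCollector.
def filter_keys(data, keys)
  # Keep only the requested keys in every row; string keys are compared as symbols.
  filtered = data.map { |row| row.select { |k, _v| keys.include?(k.to_sym) } }

  # Same normalization as in `filter`: always return an Array,
  # and unwrap a single nested Array.
  filtered = [filtered] unless filtered.is_a?(Array)
  filtered = filtered.first if filtered.length == 1 && filtered.first.is_a?(Array)
  filtered
end

rows = [
  { name: 'John', last_name: 'Doe', age: 42 },
  { 'name' => 'Jane', 'last_name' => 'Roe', 'age' => 37 }
]

result = filter_keys(rows, [:name, :last_name])
puts result.inspect
```

Keys are matched after to_sym, so rows with String keys and rows with Symbol keys are both filtered, and each row keeps its original key type.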
input()

Read input from a URI. Examples:

input.from_uri("www.libis.be")
input.from_uri("file://hello.txt")
# File lib/data_collector/core.rb, line 16
def input
  @input ||= DataCollector::Input.new
end
log(message)
# File lib/data_collector/core.rb, line 112
def log(message)
  @logger ||= Logger.new(STDOUT)
  @logger.info(message)
end
output()

Output is an object in which you can store data that needs to be written to an output stream:

output[:name] = 'John'
output[:last_name] = 'Doe'

To write the output to a file or a string, use an ERB file as a template. Example: test.erb

<names>
  <combined><%= data[:name] %> <%= data[:last_name] %></combined>
  <%= print data, :name, :first_name %>
  <%= print data, :last_name %>
</names>

will produce

<names>
  <combined>John Doe</combined>
  <first_name>John</first_name>
  <last_name>Doe</last_name>
</names>

Into a variable:
result = output.to_s("test.erb")

Into a file stored in the records dir:
output.to_file("test.erb")

Into a tar file stored in data:
output.to_file("test.erb", "my_data.tar.gz")

Into a temp directory:
output.to_tmp_file("test.erb", "directory")

# File lib/data_collector/core.rb, line 48
def output
  @output ||= Output.new
end
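
To see how a template such as test.erb reads values from a data Hash, the sketch below renders the combined line with Ruby's standard ERB class only. It is a minimal illustration, not the way Output renders templates internally; the print helper used in test.erb belongs to the library and is not reproduced here.

```ruby
require 'erb'

# Inline copy of the <combined> part of test.erb; the library's `print`
# helper is intentionally left out because it is not part of stdlib ERB.
template = <<~TEMPLATE
  <names>
    <combined><%= data[:name] %> <%= data[:last_name] %></combined>
  </names>
TEMPLATE

# Stand-in for the values stored in the output object.
data = { name: 'John', last_name: 'Doe' }

# result_with_hash exposes each hash key as a local variable in the template.
xml = ERB.new(template).result_with_hash(data: data)
puts xml
```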
rules()

You can apply rules to input.

A rule set is a Hash. Each key is the name of the field to map to, and its value is a Hash with a
JSONPath filter and, optionally, options that apply a convert method to the filtered results.

Available convert methods are time, map, each, call and suffix:
 - time: Parses a given time/date string into a Time object
 - map: applies a mapping to a filter
 - suffix: adds a suffix to a result
 - call: executes a lambda on the filter
 - each: runs a lambda on each row of a filter

example:
my_rules = {
  'identifier' => {"filter" => '$..id'},
  'language' => {'filter' => '$..lang',
                 'options' => {'convert' => 'map',
                               'map' => {'nl' => 'dut', 'fr' => 'fre', 'de' => 'ger', 'en' => 'eng'}
                              }
                },
  'subject' => {'filter' => '$..keywords',
                'options' => {'convert' => 'each',
                             'lambda' => lambda {|d| d.split(',')}
                            }
               },
  'creationdate' => {'filter' => '$..published_date', 'options' => {'convert' => 'time'}}
}
rules.run(my_rules, input, output)
# File lib/data_collector/core.rb, line 79
def rules
  @rules ||= Rules.new
end
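
The effect of the time, map and each conversions described for rules can be approximated in plain Ruby. The sketch below is an assumption about their behavior based only on the descriptions above; apply_convert is a hypothetical helper, not the library's implementation, and falling back to the original value for unmapped entries is likewise an assumption.

```ruby
require 'time'

# Hypothetical stand-in for the documented conversions; not DataCollector code.
def apply_convert(values, options)
  case options['convert']
  when 'time' then values.map { |v| Time.parse(v) }               # string -> Time
  when 'map'  then values.map { |v| options['map'].fetch(v, v) }  # lookup, else keep value
  when 'each' then values.map { |v| options['lambda'].call(v) }   # lambda per row
  else values
  end
end

languages = apply_convert(%w[nl en],
                          { 'convert' => 'map', 'map' => { 'nl' => 'dut', 'en' => 'eng' } })
subjects  = apply_convert(['art,history'],
                          { 'convert' => 'each', 'lambda' => ->(d) { d.split(',') } })
dates     = apply_convert(['2021-03-04'], { 'convert' => 'time' })

puts languages.inspect
puts subjects.inspect
puts dates.first.year
```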
rules_ng()

New rules runner

# File lib/data_collector/core.rb, line 84
def rules_ng
  @rules_ng ||= RulesNg.new
end