class Krikri::SearchIndex

Base class for search indexes, extended by the QA and Production index classes.

@todo rewrite to use generalized `EntityConsumer` interface & avoid `#update_from_activity`, which is tightly bound to `Activity` rather than `Enumerator<#entities>`.

Public Class Methods

new(opts)
# File lib/krikri/search_index.rb, line 14
def initialize(opts)
  @bulk_update_size = opts.delete(:bulk_update_size) { 1000 }
end

Public Instance Methods

add(_)

Add a single JSON document to the search index. Implemented in a child class.

@param _ [Hash] Hash that can be serialized to JSON with to_json

# File lib/krikri/search_index.rb, line 23
def add(_)
  fail NotImplementedError
end
bulk_add(_)

Add a number of JSON documents to the search index at once. Implemented in a child class.

@param _ [Array] Hashes that can be serialized to JSON with to_json

# File lib/krikri/search_index.rb, line 32
def bulk_add(_)
  fail NotImplementedError
end
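
As an illustration of "implemented in a child class", here is a minimal sketch of a subclass that supplies `add` and `bulk_add`. `SketchSearchIndex` is a stand-in that copies only the methods shown above so the example runs without the krikri gem, and `InMemoryIndex` is a hypothetical child class, not one of the real QA or Production indexes.

```ruby
# Stand-in for Krikri::SearchIndex (assumption: only the methods shown on
# this page are reproduced, so the sketch runs without the krikri gem).
class SketchSearchIndex
  def initialize(opts)
    @bulk_update_size = opts.delete(:bulk_update_size) { 1000 }
  end

  def add(_)
    fail NotImplementedError
  end

  def bulk_add(_)
    fail NotImplementedError
  end
end

# Hypothetical child class: keeps documents in an array instead of sending
# them to a real search engine.
class InMemoryIndex < SketchSearchIndex
  attr_reader :docs

  def initialize(opts = {})
    super
    @docs = []
  end

  # Add a single JSON-serializable hash.
  def add(doc)
    @docs << doc
  end

  # Add an array of JSON-serializable hashes at once.
  def bulk_add(docs)
    @docs.concat(docs)
  end
end

index = InMemoryIndex.new(bulk_update_size: 2)
index.add('id' => 'a')
index.bulk_add([{ 'id' => 'b' }, { 'id' => 'c' }])
puts index.docs.length # prints 3
```

Calling `add` or `bulk_add` on the stand-in base class directly raises `NotImplementedError`, matching the behavior documented above.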
update_from_activity(activity)

Shim that determines, for a particular type of index, which strategy to use: adding documents one at a time or adding them in bulk. The default is the incremental strategy. Intended to be overridden as necessary.

@see add
@see bulk_add

# File lib/krikri/search_index.rb, line 43
def update_from_activity(activity)
  incremental_update_from_activity(activity)
end
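
A child class that prefers bulk loading might override the shim along the following lines. This is only a sketch: `ShimSketchIndex` is a simplified stand-in for the base class (it skips the JSON conversion and error handling), and `BulkPreferringIndex` and `SketchActivity` are hypothetical names introduced for the example.

```ruby
# Simplified stand-in for Krikri::SearchIndex (assumption: JSON conversion
# and error handling are left out to keep the sketch short).
class ShimSketchIndex
  def initialize(opts)
    @bulk_update_size = opts.delete(:bulk_update_size) { 1000 }
  end

  # Default strategy: one document at a time.
  def update_from_activity(activity)
    incremental_update_from_activity(activity)
  end

  protected

  def incremental_update_from_activity(activity)
    activity.entities.each { |doc| add(doc) }
  end

  def bulk_update_from_activity(activity)
    activity.entities.each_slice(@bulk_update_size) { |batch| bulk_add(batch) }
  end
end

# Hypothetical index that records which strategy was used.
class BulkPreferringIndex < ShimSketchIndex
  attr_reader :calls

  def initialize(opts = {})
    super
    @calls = []
  end

  def add(_doc)
    @calls << [:add, 1]
  end

  def bulk_add(batch)
    @calls << [:bulk_add, batch.size]
  end

  # Override the shim to choose the bulk strategy.
  def update_from_activity(activity)
    bulk_update_from_activity(activity)
  end
end

# Hypothetical activity double: anything that responds to #entities.
SketchActivity = Struct.new(:entities)

index = BulkPreferringIndex.new(bulk_update_size: 2)
index.update_from_activity(SketchActivity.new([{ 'id' => 1 }, { 'id' => 2 }, { 'id' => 3 }]))
p index.calls # prints [[:bulk_add, 2], [:bulk_add, 1]]
```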

Protected Instance Methods

bulk_update_batches(aggregations)

Enumerate arrays of JSON-serializable hashes, one array per batch to be loaded into the search index. Each batch holds at most `@bulk_update_size` items; the last batch may be smaller.

@param aggregations [Enumerator]
@return [Enumerator] Each array of JSON-serializable hashes

# File lib/krikri/search_index.rb, line 70
def bulk_update_batches(aggregations)
  en = Enumerator.new do |e|
    i = 1
    batch = []
    aggregations.each do |agg|
      batch << agg
      if i % @bulk_update_size == 0
        e.yield batch
        batch = []
      end
      i += 1
    end
    e.yield batch if batch.count > 0  # last one
  end
  en.lazy
end
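
To see the batching behavior in isolation, the sketch below copies the same idea into a standalone method. `batches_of` is a hypothetical name, and the batch size is passed as an argument rather than read from `@bulk_update_size`.

```ruby
# Standalone variant of the batching logic above (assumption: batch size is
# an argument here; the real method reads @bulk_update_size).
def batches_of(items, size)
  Enumerator.new do |e|
    batch = []
    items.each do |item|
      batch << item
      if batch.size == size
        e.yield batch
        batch = []
      end
    end
    e.yield batch unless batch.empty? # final, possibly short, batch
  end.lazy
end

sizes = batches_of(1..2500, 1000).map(&:size).to_a
p sizes # prints [1000, 1000, 500]

# The result is lazy: asking for one batch pulls only that many source items.
pulled = 0
source = (1..Float::INFINITY).lazy.map { |i| pulled += 1; i }
first_batch = batches_of(source, 2).first(1)
p first_batch # prints [[1, 2]]
p pulled      # prints 2
```

For a finite, eager collection the grouping matches what `Enumerable#each_slice` produces; the hand-rolled enumerator mainly matters when the source is itself lazy.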
bulk_update_from_activity(activity)

Given an activity, use the bulk-update method to load its revised entities into the search index.

Any errors on bulk adds are caught and logged, and the batch is skipped.

@param activity [Krikri::Activity]

# File lib/krikri/search_index.rb, line 56
def bulk_update_from_activity(activity)
  all_aggs = entities_as_json_hashes(activity)
  agg_batches = bulk_update_batches(all_aggs)
  agg_batches.each do |batch|
    index_with_error_handling(activity) { bulk_add(batch) }
  end
end
entities_as_json_hashes(activity)

Given an activity, enumerate over revised entities, represented as hashes that can be serialized to JSON.

@param activity [Krikri::Activity]
@return [Enumerator]

# File lib/krikri/search_index.rb, line 107
def entities_as_json_hashes(activity)
  activity.entities.lazy.map do |agg|
    hash_for_index_schema(agg)
  end
end
hash_for_index_schema(aggregation)

Return a hash built from the given aggregation, in a format suitable for the search index.

The default behavior is to turn out the MAPv4 JSON-LD straight from the aggregation.

This can be overridden to convert the output to MAPv3 JSON-LD or another schema.

@param aggregation [DPLA::MAP::Aggregation] The aggregation
@return [Hash] Hash that can respond to to_json for serialization

# File lib/krikri/search_index.rb, line 124
def hash_for_index_schema(aggregation)
  aggregation.to_jsonld['@graph'][0]
end
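
The following sketch shows the default behavior next to one possible override. `SketchAggregation` is a hypothetical double that only responds to `to_jsonld`, the field names in the sample data are invented, and `slim_hash_for_index_schema` is an illustrative helper rather than part of Krikri.

```ruby
require 'json'

# Hypothetical double for DPLA::MAP::Aggregation; only #to_jsonld is assumed.
class SketchAggregation
  def initialize(jsonld)
    @jsonld = jsonld
  end

  def to_jsonld
    @jsonld
  end
end

# Default behavior as documented above: the first node of the JSON-LD graph.
def hash_for_index_schema(aggregation)
  aggregation.to_jsonld['@graph'][0]
end

# Hypothetical override: keep only the keys a given index schema expects.
def slim_hash_for_index_schema(aggregation, keys)
  hash_for_index_schema(aggregation).select { |key, _| keys.include?(key) }
end

# Invented sample data for illustration only.
agg = SketchAggregation.new(
  {
    '@context' => 'http://example.org/context',
    '@graph' => [
      { '@id' => 'http://example.org/agg/1', 'title' => 'A map', 'internal' => true }
    ]
  }
)

full = hash_for_index_schema(agg)
slim = slim_hash_for_index_schema(agg, ['@id', 'title'])
puts slim.to_json # prints {"@id":"http://example.org/agg/1","title":"A map"}
```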
incremental_update_from_activity(activity)

Given an activity, load its revised entities into the search index one at a time.

Any errors on individual record adds are caught and logged, and the record is skipped.

@param activity [Krikri::Activity]

# File lib/krikri/search_index.rb, line 95
def incremental_update_from_activity(activity)
  entities_as_json_hashes(activity).each do |h|
    index_with_error_handling(activity) { add(h) }
  end
end

Private Instance Methods

index_with_error_handling(activity) { || ... }

Runs a block, catching any errors and logging them with the given activity id.

# File lib/krikri/search_index.rb, line 133
def index_with_error_handling(activity, &block)
  begin
    yield if block_given?
  rescue => e
    Krikri::Logger
      .log(:error, "indexer error for Activity #{activity}:\n#{e.message}")
  end
end
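
The effect of this helper on a run of records can be sketched as follows. `SketchLogger` is a hypothetical stand-in for `Krikri::Logger` that collects messages in memory, and the activity is represented by a plain string.

```ruby
# Hypothetical stand-in for Krikri::Logger that collects messages in memory.
module SketchLogger
  MESSAGES = []

  def self.log(level, message)
    MESSAGES << [level, message]
  end
end

# Same shape as the private helper above, logging to SketchLogger instead.
def index_with_error_handling(activity)
  yield if block_given?
rescue => e
  SketchLogger.log(:error, "indexer error for Activity #{activity}:\n#{e.message}")
end

indexed = []
%w[a bad c].each do |doc|
  index_with_error_handling('activity-1') do
    raise ArgumentError, "cannot index #{doc}" if doc == 'bad'
    indexed << doc
  end
end

p indexed                       # prints ["a", "c"]
p SketchLogger::MESSAGES.length # prints 1
```

One consequence worth noting: a bare `rescue` catches only `StandardError` and its subclasses. `NotImplementedError` descends from `ScriptError`, so the error raised by an unimplemented `add` or `bulk_add` is not logged and skipped; it propagates to the caller.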