class Krikri::SearchIndex
Search index base class that gets extended by QA and Production index classes
@todo rewrite to use generalized `EntityConsumer` interface & avoid
`#update_from_activity`, which is tightly bound to `Activity` rather than `Enumerator<#entities>`.
Public Class Methods
# File lib/krikri/search_index.rb, line 14
def initialize(opts)
  @bulk_update_size = opts.delete(:bulk_update_size) { 1000 }
end
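A minimal usage sketch, assuming a concrete subclass whose initializer forwards its options to `super` (the name `MySearchIndex` is hypothetical; see the sketch under `bulk_add` below):

# Batch size defaults to 1000 when the option is absent.
index = MySearchIndex.new(bulk_update_size: 500)

Note that `opts.delete` removes only the `:bulk_update_size` key, leaving any remaining options in `opts` for the subclass to consume.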
Public Instance Methods
Add a single JSON document to the search index. Implemented in a child class.
@param _ [Hash] Hash that can be serialized to JSON with to_json
# File lib/krikri/search_index.rb, line 23
def add(_)
  fail NotImplementedError
end
Add a number of JSON documents to the search index at once. Implemented in a child class.
@param _ [Array] Hashes that can be serialized to JSON with to_json
# File lib/krikri/search_index.rb, line 32
def bulk_add(_)
  fail NotImplementedError
end
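A sketch of a child class satisfying both hooks, assuming a generic HTTP client injected at construction; `MySearchIndex`, the `:client` option, and the `/docs` endpoints are all illustrative assumptions, not Krikri's actual QA or Production implementations:

require 'json'

class MySearchIndex < Krikri::SearchIndex
  def initialize(opts = {})
    @client = opts.delete(:client)  # hypothetical injected HTTP client
    super(opts)
  end

  # Index one JSON-serializable hash.
  def add(doc)
    @client.post('/docs', doc.to_json)
  end

  # Index many hashes in a single request.
  def bulk_add(docs)
    @client.post('/docs/_bulk', docs.map(&:to_json).join("\n"))
  end
end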
Shim that determines, for a particular type of index, which update strategy to use: adding a single document at a time, or adding them in bulk. Intended to be overridden as necessary.
# File lib/krikri/search_index.rb, line 43
def update_from_activity(activity)
  incremental_update_from_activity(activity)
end
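A subclass that prefers the batched path can repoint the shim at `bulk_update_from_activity` (defined below); continuing the hypothetical `MySearchIndex` sketch:

class MyBulkIndex < MySearchIndex
  # Load revised entities in batches rather than one at a time.
  def update_from_activity(activity)
    bulk_update_from_activity(activity)
  end
end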
Protected Instance Methods
Enumerate arrays of JSON-serializable hashes, one array per batch to be loaded into the search index.
@param aggregations [Enumerator]
@return [Enumerator] Each batch, as an array of JSON-serializable hashes
# File lib/krikri/search_index.rb, line 70
def bulk_update_batches(aggregations)
  en = Enumerator.new do |e|
    i = 1
    batch = []
    aggregations.each do |agg|
      batch << agg
      if i % @bulk_update_size == 0
        e.yield batch
        batch = []
      end
      i += 1
    end
    e.yield batch if batch.count > 0 # last one
  end
  en.lazy
end
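The returned enumerator is lazy, so each batch is assembled only when the consumer asks for it. To make the batching concrete, an illustrative call through a subclass instance (`send` is used here only to reach the protected method from outside; the numbers are arbitrary):

index = MySearchIndex.new(bulk_update_size: 3)
batches = index.send(:bulk_update_batches, (1..8).each)
batches.map(&:size).to_a  #=> [3, 3, 2]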
Given an activity, use the bulk-update method to load its revised entities into the search index.
Any errors on bulk adds are caught and logged, and the batch is skipped.
@param activity [Krikri::Activity]
# File lib/krikri/search_index.rb, line 56
def bulk_update_from_activity(activity)
  all_aggs = entities_as_json_hashes(activity)
  agg_batches = bulk_update_batches(all_aggs)
  agg_batches.each do |batch|
    index_with_error_handling(activity) { bulk_add(batch) }
  end
end
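End to end, a bulk update reads: revised entities, to JSON-serializable hashes, to lazy batches, to one `bulk_add` per batch, with a failed batch logged and skipped rather than aborting the run. A sketch of the whole flow, assuming an activity record and the hypothetical classes above:

activity = Krikri::Activity.find(42)  # id 42 is illustrative
index = MyBulkIndex.new(bulk_update_size: 500,
                        client: http_client)  # client is an assumption
index.update_from_activity(activity)  # dispatches to the bulk path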
Given an activity, enumerate over revised entities, represented as hashes that can be serialized to JSON.
@param activity [Krikri::Activity]
@return [Enumerator]
# File lib/krikri/search_index.rb, line 107
def entities_as_json_hashes(activity)
  activity.entities.lazy.map do |agg|
    hash_for_index_schema(agg)
  end
end
Return a hash, serializable to JSON, from the given aggregation, in a format suitable for the search index.
The default behavior is to emit the MAPv4 JSON-LD straight from the aggregation.
This can be overridden to convert records to MAPv3 JSON-LD or another schema as needed.
@param aggregation [DPLA::MAP::Aggregation] The aggregation
@return [Hash] Hash that can be serialized with to_json
# File lib/krikri/search_index.rb, line 124
def hash_for_index_schema(aggregation)
  aggregation.to_jsonld['@graph'][0]
end
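To emit a different schema, override this hook in a subclass; a sketch where `transform_to_v3` is a hypothetical MAPv4-to-MAPv3 converter, not something Krikri provides here:

class MyV3Index < MySearchIndex
  protected

  # Reshape the default MAPv4 graph hash before it is indexed.
  def hash_for_index_schema(aggregation)
    transform_to_v3(super)
  end

  private

  # Hypothetical converter; a real one would remap MAPv4 terms to MAPv3.
  def transform_to_v3(hash)
    hash
  end
end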
Given an activity, load its revised entities into the search index one at a time.
Any errors on individual record adds are caught and logged, and the record is skipped.
@param activity [Krikri::Activity]
# File lib/krikri/search_index.rb, line 95
def incremental_update_from_activity(activity)
  entities_as_json_hashes(activity).each do |h|
    index_with_error_handling(activity) { add(h) }
  end
end
Private Instance Methods
Runs a block, catching any errors and logging them with the given activity id.
# File lib/krikri/search_index.rb, line 133
def index_with_error_handling(activity, &block)
  begin
    yield if block_given?
  rescue => e
    Krikri::Logger
      .log(:error, "indexer error for Activity #{activity}:\n#{e.message}")
  end
end