class Krikri::Harvesters::ApiHarvester
A harvester implementation for REST APIs. The default ApiHarvester
expects Solr-like JSON responses/records.
An internal interface is provided for easier subclassing. A new API harvester may reimplement:
- #get_docs (to retrieve record docs from a response) - #get_count (to determine total record count from a response) - #get_identifier (to retrieve an indentifier from a record document) - #get_content (to retrieve a content string from a record document) - #next_options` (to generate the parameters for the next request)
If the content type of the records is other than JSON, you will also want to override `#content_type`.
Attributes
Public Class Methods
@return [Hash] A hash documenting the allowable options to pass to
initializers.
@see Krikri::Harvester::expected_opts
# File lib/krikri/harvesters/api_harvester.rb, line 34 def self.expected_opts { key: :api, opts: { params: { type: :string, required: false } } } end
@param opts [Hash] options for the harvester @see .expected_opts
Krikri::Harvester::new
# File lib/krikri/harvesters/api_harvester.rb, line 24 def initialize(opts = {}) super @opts = opts.fetch(:api, {}) end
Public Instance Methods
@return [String] the content type for the records generated by this
harvester
# File lib/krikri/harvesters/api_harvester.rb, line 76 def content_type 'application/json' end
# File lib/krikri/harvesters/api_harvester.rb, line 45 def count get_count(request(opts)) end
@param identifier [#to_s] the identifier of the record to get @return [#to_s] the record
# File lib/krikri/harvesters/api_harvester.rb, line 68 def get_record(identifier) response = request(:params => { :q => "id:#{identifier.to_s}" }) build_record(get_docs(response).first) end
Gets a single record with the given identifier from the API
@return [Enumerator::Lazy] an enumerator over the ids for the records
targeted by this harvester.
# File lib/krikri/harvesters/api_harvester.rb, line 61 def record_ids enumerate_records.lazy.map { |r| get_identifier(r) } end
@return [Enumerator::Lazy] an enumerator of the records targeted by this
harvester.
# File lib/krikri/harvesters/api_harvester.rb, line 52 def records enumerate_records.lazy.map { |rec| build_record(rec) } end
Private Instance Methods
Builds an instance of `@record_class` with the given doc's JSON as content.
@param doc [#to_json] the content to serialize as JSON in `#content` @return [#to_s] an instance of @record_class with a minted id and
content the given content
# File lib/krikri/harvesters/api_harvester.rb, line 158 def build_record(doc) @record_class.build(mint_id(get_identifier(doc)), get_content(doc), content_type) end
@return [Enumerator] an enumerator over the records
# File lib/krikri/harvesters/api_harvester.rb, line 136 def enumerate_records Enumerator.new do |yielder| request_opts = opts.deep_dup loop do break if request_opts.nil? docs = get_docs(request(request_opts.dup)) break if docs.empty? docs.each { |r| yielder << r } request_opts = next_options(request_opts, docs.count) end end end
@param doc [#to_s] a raw record document
@return [String] the record content
# File lib/krikri/harvesters/api_harvester.rb, line 110 def get_content(doc) doc.to_json end
@param response [#to_s] a response from the REST API
@return [Integer] a count of the total records found by the request
# File lib/krikri/harvesters/api_harvester.rb, line 94 def get_count(response) response['response']['numFound'] end
@param response [#to_s] a response from the REST API
@return [Array] an array of record documents from the response
# File lib/krikri/harvesters/api_harvester.rb, line 102 def get_docs(response) response['response']['docs'] end
@param doc [#to_s] a raw record document with an identifier
@return [String] the provider's identifier for the document
# File lib/krikri/harvesters/api_harvester.rb, line 86 def get_identifier(doc) doc['record_id'] end
Given a current set of options and a number of records from the last request, generate the options for the next request.
@param opts [Hash] an options hash from the previous request @param record_count [#to_i]
@return [Hash] the next request's options hash
# File lib/krikri/harvesters/api_harvester.rb, line 128 def next_options(opts, record_count) old_start = opts['params'].fetch('start', 0) opts['params']['start'] = old_start.to_i + record_count opts end
Send a request via `RestClient`, and parse the result as JSON
# File lib/krikri/harvesters/api_harvester.rb, line 116 def request(request_opts) JSON.parse(RestClient.get(uri, request_opts)) end