Elasticsearch::Persistence
Persistence layer for Ruby domain objects in Elasticsearch, using the Repository and ActiveRecord patterns.
The library is compatible with Ruby 1.9.3 (or higher) and Elasticsearch 1.0 (or higher).
Installation
Install the package from Rubygems:
```bash
gem install elasticsearch-persistence
```
To use an unreleased version, either add it to your Gemfile for Bundler:
```ruby
gem 'elasticsearch-persistence', git: 'https://github.com/elasticsearch/elasticsearch-rails.git'
```
or install it from a source code checkout:
```bash
git clone https://github.com/elasticsearch/elasticsearch-rails.git
cd elasticsearch-rails/elasticsearch-persistence
bundle install
rake install
```
Usage
The Repository Pattern
The Elasticsearch::Persistence::Repository module provides an implementation of the repository pattern and allows you to save, delete, find and search objects stored in Elasticsearch, as well as configure mappings and settings for the index.
Let’s start with a simple plain old Ruby object (PORO):
```ruby
class Note
  attr_reader :attributes

  def initialize(attributes={})
    @attributes = attributes
  end

  def to_hash
    @attributes
  end
end
```
Let’s create a default, “dumb” repository, as a first step:
```ruby
require 'elasticsearch/persistence'
repository = Elasticsearch::Persistence::Repository.new
```
We can save a Note instance into the repository…
```ruby
note = Note.new id: 1, text: 'Test'

repository.save(note)
# PUT http://localhost:9200/repository/note/1 [status:201, request:0.210s, query:n/a]
# > {"id":1,"text":"Test"}
# < {"_index":"repository","_type":"note","_id":"1","_version":1,"created":true}
```
…find it…
```ruby
n = repository.find(1)
# GET http://localhost:9200/repository/_all/1 [status:200, request:0.003s, query:n/a]
# < {"_index":"repository","_type":"note","_id":"1","_version":2,"found":true, "_source" : {"id":1,"text":"Test"}}
# => #<Note:0x007fcbfc0c4980 @attributes={"id"=>1, "text"=>"Test"}>
```
…search for it…
```ruby
repository.search(query: { match: { text: 'test' } }).first
# GET http://localhost:9200/repository/_search [status:200, request:0.005s, query:0.002s]
# > {"query":{"match":{"text":"test"}}}
# < {"took":2, ... "hits":{"total":1, ... "hits":[{ ... "_source" : {"id":1,"text":"Test"}}]}}
# => #<Note:0x007fcbfc1c7b70 @attributes={"id"=>1, "text"=>"Test"}>
```
…or delete it:
```ruby
repository.delete(note)
# DELETE http://localhost:9200/repository/note/1 [status:200, request:0.014s, query:n/a]
# < {"found":true,"_index":"repository","_type":"note","_id":"1","_version":3}
# => {"found"=>true, "_index"=>"repository", "_type"=>"note", "_id"=>"1", "_version"=>2}
```
The repository module provides a number of features and facilities to configure and customize the behaviour:
- Configuring the Elasticsearch client being used
- Setting the index name, document type, and object class for deserialization
- Composing mappings and settings for the index
- Creating, deleting or refreshing the index
- Finding or searching for documents
- Providing access both to domain objects and hits for search results
- Providing access to the Elasticsearch response for search results (aggregations, total, …)
- Defining the methods for serialization and deserialization
You can use the default repository class, or include the module in your own. Let’s review it in detail.
The Default Class
For simple cases, you can use the default, bundled repository class, and configure/customize it:
```ruby
repository = Elasticsearch::Persistence::Repository.new do
  # Configure the Elasticsearch client
  client Elasticsearch::Client.new url: ENV['ELASTICSEARCH_URL'], log: true

  # Set a custom index name
  index :my_notes

  # Set a custom document type
  type  :my_note

  # Specify the class to initialize when deserializing documents
  klass Note

  # Configure the settings and mappings for the Elasticsearch index
  settings number_of_shards: 1 do
    mapping do
      indexes :text, analyzer: 'snowball'
    end
  end

  # Customize the serialization logic
  def serialize(document)
    super.merge(my_special_key: 'my_special_stuff')
  end

  # Customize the de-serialization logic
  def deserialize(document)
    puts "# ***** CUSTOM DESERIALIZE LOGIC KICKING IN... *****"
    super
  end
end
```
The repository will now use the custom Elasticsearch client, the custom index and type names, and the custom serialization and deserialization logic.
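The block passed to `Repository.new` above uses bare calls such as `index :my_notes` and plain `def` statements. One common Ruby technique that makes this work is `instance_eval`, sketched below with a hypothetical `ConfiguredRepository` class. This is an assumption about the general mechanism, not a description of how elasticsearch-persistence implements it.

```ruby
# Hypothetical ConfiguredRepository: shows how a block passed to `new` can be
# evaluated against the new instance, so that bare calls such as
# `index :my_notes` and `def serialize` inside the block configure that object.
class ConfiguredRepository
  def initialize(&block)
    @index = 'repository'
    @type  = nil
    # instance_eval runs the block with self set to this instance
    instance_eval(&block) if block
  end

  # Combined getter/setter: `index` reads the value, `index :name` sets it
  def index(name = nil)
    @index = name.to_s if name
    @index
  end

  def type(name = nil)
    @type = name.to_s if name
    @type
  end

  # Default serialization: the document's Hash representation
  def serialize(document)
    document.to_hash
  end
end

repository = ConfiguredRepository.new do
  index :my_notes
  type  :my_note

  # `def` inside instance_eval defines a singleton method on the instance,
  # so `super` reaches ConfiguredRepository#serialize
  def serialize(document)
    super.merge(my_special_key: 'my_special_stuff')
  end
end

puts repository.index                          # prints my_notes
puts repository.serialize(id: 1).keys.inspect  # prints [:id, :my_special_key]
```

Because the overrides become singleton methods, they affect only this one repository instance. This matches the README's use of `super` inside the block to extend the default behaviour.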
We can create the index with the desired settings and mappings:
```ruby
repository.create_index! force: true
# PUT http://localhost:9200/my_notes
# > {"settings":{"number_of_shards":1},"mappings":{ ... {"text":{"analyzer":"snowball","type":"string"}}}}}
```
Save the document with extra properties added by the serialize method:
```ruby
repository.save(note)
# PUT http://localhost:9200/my_notes/my_note/1
# > {"id":1,"text":"Test","my_special_key":"my_special_stuff"}
# => {"_index"=>"my_notes", "_type"=>"my_note", "_id"=>"1", "_version"=>4, ... }
```
And deserialize it:
```ruby
repository.find(1)
# ***** CUSTOM DESERIALIZE LOGIC KICKING IN... *****
# => #<Note:0x007f9bd782b7a0 @attributes={... "my_special_key"=>"my_special_stuff"}>
```
A Custom Class
In most cases, though, you’ll want to use a custom class for the repository, so let’s do that:
```ruby
require 'base64'

class NoteRepository
  include Elasticsearch::Persistence::Repository

  def initialize(options={})
    index  options[:index] || 'notes'
    client Elasticsearch::Client.new url: options[:url], log: options[:log]
  end

  klass Note

  settings number_of_shards: 1 do
    mapping do
      indexes :text,  analyzer: 'snowball'
      # Do not index images
      indexes :image, index: 'no'
    end
  end

  # Base64 encode the "image" field in the document
  #
  def serialize(document)
    hash = document.to_hash.clone
    hash['image'] = Base64.encode64(hash['image']) if hash['image']
    hash.to_hash
  end

  # Base64 decode the "image" field in the document
  #
  def deserialize(document)
    hash = document['_source']
    hash['image'] = Base64.decode64(hash['image']) if hash['image']
    klass.new hash
  end
end
```
Include the Elasticsearch::Persistence::Repository module to add the repository methods into the class.
You can customize the repository in the familiar way, by calling the DSL-like methods.
You can implement a custom initializer for your repository, add complex logic in its class and instance methods – in general, have all the freedom of a standard Ruby class.
```ruby
repository = NoteRepository.new url: 'http://localhost:9200', log: true

# Configure the repository instance
repository.index = 'notes_development'
repository.client.transport.logger.formatter = proc { |s, d, p, m| "\e[2m# #{m}\n\e[0m" }

repository.create_index! force: true

note = Note.new 'id' => 1, 'text' => 'Document with image', 'image' => '... BINARY DATA ...'

repository.save(note)
# PUT http://localhost:9200/notes_development/note/1
# > {"id":1,"text":"Document with image","image":"Li4uIEJJTkFSWSBEQVRBIC4uLg==\n"}

puts repository.find(1).attributes['image']
# GET http://localhost:9200/notes_development/note/1
# < {... "_source" : { ... "image":"Li4uIEJJTkFSWSBEQVRBIC4uLg==\n"}}
# => ... BINARY DATA ...
```
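The serialize/deserialize pair in NoteRepository must be exact inverses. The standalone sketch below checks that property for the image field without Elasticsearch. `serialize_image` and `deserialize_image` are hypothetical helpers. `Array#pack('m')` is what `Base64.encode64` uses internally, so the encoded string matches the one in the log above. Unlike the README's `deserialize`, the sketch copies the hash rather than mutating the stored `_source`.

```ruby
# Standalone sketch of the image round trip done by NoteRepository#serialize
# and #deserialize above, without Elasticsearch. Array#pack('m') yields the same
# encoding as Base64.encode64 (including the trailing newline), and
# String#unpack('m') reverses it.
def serialize_image(document)
  hash = document.dup
  hash['image'] = [hash['image']].pack('m') if hash['image']
  hash
end

def deserialize_image(source)
  hash = source.dup   # copy, so the stored _source is left untouched
  hash['image'] = hash['image'].unpack('m').first if hash['image']
  hash
end

original = { 'id' => 1, 'text' => 'Document with image', 'image' => '... BINARY DATA ...' }
stored   = serialize_image(original)

puts stored['image']                     # prints Li4uIEJJTkFSWSBEQVRBIC4uLg==
puts deserialize_image(stored)['image']  # prints ... BINARY DATA ...
```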
Methods Provided by the Repository
Client
The repository uses the standard Elasticsearch client, which is accessible with the client getter and setter methods:
```ruby
require 'logger'

repository.client = Elasticsearch::Client.new url: 'http://search.server.org'
repository.client.transport.logger = Logger.new(STDERR)
```
Naming
The index method specifies the Elasticsearch index to use for storage, lookup and search (when not set, the value is inferred from the repository class name):
```ruby
repository.index = 'notes_development'
```
The type method specifies the Elasticsearch document type to use for storage, lookup and search (when not set, the value is inferred from the document class name, or _all is used):
```ruby
repository.type = 'my_note'
```
The klass method specifies the Ruby class name to use when initializing objects from documents retrieved from the repository (when not set, the value is inferred from the document _type as fetched from Elasticsearch):
```ruby
repository.klass = MyNote
```
Index Configuration
The settings and mappings methods, provided by the elasticsearch-model gem, allow you to configure the index properties:
```ruby
repository.settings number_of_shards: 1
repository.settings.to_hash
# => {:number_of_shards=>1}

repository.mappings { indexes :title, analyzer: 'snowball' }
repository.mappings.to_hash
# => { :note => {:properties=> ... }}
```
The convenience methods create_index!, delete_index! and refresh_index! allow you to manage the index lifecycle.
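The `mappings.to_hash` output above is built by collecting `indexes` calls from a block. The sketch below is a rough, hypothetical illustration of that idea, not the elasticsearch-model code. `MappingBuilder` is an invented name. Defaulting each field to `type: 'string'` is an assumption based on the `create_index!` log earlier in this README.

```ruby
# Hypothetical, simplified mapping builder: collects `indexes` calls made
# inside a block into the nested Hash shape shown by `mappings.to_hash`.
# The 'string' default type mirrors the create_index! log; the real
# elasticsearch-model implementation supports many more options.
class MappingBuilder
  def initialize(type, &block)
    @type       = type.to_sym
    @properties = {}
    instance_eval(&block) if block
  end

  # Record one field; explicit options override the default type
  def indexes(name, options = {})
    @properties[name.to_sym] = { type: 'string' }.merge(options)
  end

  def to_hash
    { @type => { properties: @properties } }
  end
end

mapping = MappingBuilder.new(:note) { indexes :title, analyzer: 'snowball' }
p mapping.to_hash
```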
Serialization
The serialize and deserialize methods allow you to customize the serialization of the document when passing it to the storage, and the initialization procedure when loading it from the storage:
```ruby
class NoteRepository
  def serialize(document)
    # Upcase the title without mutating the original document
    Hash[document.to_hash.map { |k,v| [k, k == :title ? v.upcase : v] }]
  end

  def deserialize(document)
    MyNote.new ActiveSupport::HashWithIndifferentAccess.new(document['_source']).deep_symbolize_keys
  end
end
```
Storage
The save method allows you to store a domain object in the repository:
```ruby
note = Note.new id: 1, title: 'Quick Brown Fox'

repository.save(note)
# => {"_index"=>"notes_development", "_type"=>"my_note", "_id"=>"1", "_version"=>1, "created"=>true}
```
The update method allows you to perform a partial update of a document in the repository. Use either a partial document:
```ruby
repository.update id: 1, title: 'UPDATED', tags: []
# => {"_index"=>"notes_development", "_type"=>"note", "_id"=>"1", "_version"=>2}
```
Or a script (optionally with parameters):
```ruby
repository.update 1, script: 'if (!ctx._source.tags.contains(t)) { ctx._source.tags += t }', params: { t: 'foo' }
# => {"_index"=>"notes_development", "_type"=>"note", "_id"=>"1", "_version"=>3}
```
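To show what the two update styles do to the stored `_source`, here is a plain-Ruby illustration of their effect. It is not the Elasticsearch implementation. `merge_partial` and `add_tag` are hypothetical helpers. The first follows Elasticsearch's documented partial-document behaviour: objects are merged recursively, while scalar values and arrays are replaced. The second is a Ruby equivalent of the script above.

```ruby
# Effect of a partial-document update: nested objects are merged
# recursively, while scalars and arrays from the partial document win.
def merge_partial(source, partial)
  source.merge(partial) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      merge_partial(old_value, new_value)
    else
      new_value
    end
  end
end

# Ruby equivalent of the script:
#   if (!ctx._source.tags.contains(t)) { ctx._source.tags += t }
def add_tag(source, t)
  tags = source['tags'] || []
  tags.include?(t) ? source : source.merge('tags' => tags + [t])
end

source = { 'id' => 1, 'title' => 'Quick Brown Fox', 'tags' => ['animals'] }
source = merge_partial(source, 'title' => 'UPDATED', 'tags' => [])
source = add_tag(source, 'foo')
source = add_tag(source, 'foo')  # second call leaves the tags unchanged
p source
```

The script version is idempotent, so repeating the same update does not duplicate the tag. A partial document, by contrast, simply overwrites the `tags` array.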
The delete method allows you to remove objects from the repository (pass either the object itself or its ID):
```ruby
repository.delete(note)
repository.delete(1)
```
Finding
The find method allows you to find one or many documents in the storage and returns them as deserialized Ruby objects:
```ruby
repository.save Note.new(id: 2, title: 'Fast White Dog')

note = repository.find(1)
# => <MyNote ... QUICK BROWN FOX>

notes = repository.find(1, 2)
# => [<MyNote... QUICK BROWN FOX>, <MyNote ... FAST WHITE DOG>]
```
When the document with a specific ID isn’t found, nil is returned instead of the deserialized object:
```ruby
notes = repository.find(1, 3, 2)
# => [<MyNote ...>, nil, <MyNote ...>]
```
Handle the missing objects in the application code, or call compact on the result.
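Because the result keeps a nil in the position of each missing ID, you can either drop the gaps or report which IDs were not found. The sketch below uses a hypothetical in-memory `find_many` as a stand-in for `repository.find(1, 3, 2)` and shows both approaches.

```ruby
# Hypothetical stand-in for repository.find(1, 3, 2): look up several ids in an
# in-memory Hash and keep a nil placeholder for each id that is missing, so the
# result lines up position by position with the requested ids.
STORE = { 1 => 'QUICK BROWN FOX', 2 => 'FAST WHITE DOG' }

def find_many(*ids)
  ids.map { |id| STORE[id] }
end

ids   = [1, 3, 2]
notes = find_many(*ids)

# Option 1: drop the missing documents
found = notes.compact

# Option 2: pair ids with results to report which ones were not found
missing = ids.zip(notes).select { |_id, note| note.nil? }.map(&:first)

puts found.inspect
puts "missing ids: #{missing.inspect}"  # prints missing ids: [3]
```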
Search
The search method retrieves objects from the repository by a query string or a definition in the Elasticsearch DSL:
```ruby
repository.search('fox or dog').to_a
# GET http://localhost:9200/notes_development/my_note/_search?q=fox
# => [<MyNote ... FOX ...>, <MyNote ... DOG ...>]

repository.search(query: { match: { title: 'fox dog' } }).to_a
# GET http://localhost:9200/notes_development/my_note/_search
# > {"query":{"match":{"title":"fox dog"}}}
# => [<MyNote ... FOX ...>, <MyNote ... DOG ...>]
```
The returned object is an instance of the Elasticsearch::Persistence::Repository::Response::Results class, which provides access to the results, the full returned response and hits.
```ruby
results = repository.search(query: { match: { title: 'fox dog' } })

# Iterate over the objects
#
results.each do |note|
  puts "* #{note.attributes[:title]}"
end
# * QUICK BROWN FOX
# * FAST WHITE DOG

# Iterate over the objects and hits
#
results.each_with_hit do |note, hit|
  puts "* #{note.attributes[:title]}, score: #{hit._score}"
end
# * QUICK BROWN FOX, score: 0.29930896
# * FAST WHITE DOG, score: 0.29930896

# Get total results
#
results.total
# => 2

# Access the raw response as a Hashie::Mash instance
#
results.response._shards.failed
# => 0
```
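The sketch below shows the general shape of such a results wrapper: it keeps the raw response, deserializes each hit as you iterate, and can pair every object with its hit. `SimpleResults` and the sample response hash are hypothetical. The real `Response::Results` class wraps the response in a `Hashie::Mash` and has a richer interface.

```ruby
# Minimal sketch of a search-results wrapper: keeps the raw response Hash,
# deserializes each hit on iteration, and pairs objects with their hits.
# SimpleResults and the sample response are hypothetical.
class SimpleResults
  include Enumerable

  attr_reader :response

  def initialize(response, &deserializer)
    @response     = response
    @deserializer = deserializer
  end

  def hits
    response['hits']['hits']
  end

  def total
    response['hits']['total']
  end

  # Yield each deserialized object
  def each(&block)
    return enum_for(:each) unless block
    hits.each { |hit| block.call(@deserializer.call(hit)) }
  end

  # Yield each deserialized object together with its raw hit
  def each_with_hit
    return enum_for(:each_with_hit) unless block_given?
    hits.each { |hit| yield @deserializer.call(hit), hit }
  end
end

response = {
  'hits' => {
    'total' => 2,
    'hits'  => [
      { '_id' => '1', '_score' => 0.3, '_source' => { 'title' => 'QUICK BROWN FOX' } },
      { '_id' => '2', '_score' => 0.3, '_source' => { 'title' => 'FAST WHITE DOG' } }
    ]
  }
}

results = SimpleResults.new(response) { |hit| hit['_source']['title'] }
results.each_with_hit { |title, hit| puts "* #{title}, score: #{hit['_score']}" }
```

Including `Enumerable` and defining `each` is what gives the wrapper `to_a`, `first`, `map` and friends for free.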
Example Application
An example Sinatra application, available in examples/notes/application.rb, demonstrates a rich set of features:
- How to create and configure a custom repository class
- How to work with a plain Ruby class as the domain object
- How to integrate the repository with a Sinatra application
- How to write complex search definitions, including pagination, highlighting and aggregations
- How to use search results in the application view
The ActiveRecord Pattern
The Elasticsearch::Persistence::Model module provides an implementation of the active record pattern, with a familiar interface for using Elasticsearch as a persistence layer in Ruby on Rails applications.
All the methods are documented with comprehensive examples in the source code, also available online at rubydoc.info/gems/elasticsearch-persistence/Elasticsearch/Persistence/Model.
Installation/Usage
To use the library in a Rails application, add it to your Gemfile with a require statement:
```ruby
gem "elasticsearch-persistence", require: 'elasticsearch/persistence/model'
```
To use the library without Bundler, install it, and require the file:
```bash
gem install elasticsearch-persistence
```
```ruby
# In your code
require 'elasticsearch/persistence/model'
```
Model Definition
The integration is implemented by including the module in a Ruby class. Model attribute definitions are implemented with the Virtus Rubygem, and naming, validation and related features with the ActiveModel Rubygem.
```ruby
class Article
  include Elasticsearch::Persistence::Model

  # Define a plain `title` attribute
  #
  attribute :title,  String

  # Define an `author` attribute, with multiple analyzers for this field
  #
  attribute :author, String, mapping: { fields: {
                               author: { type: 'string' },
                               raw:    { type: 'string', analyzer: 'keyword' }
                             } }

  # Define a `views` attribute, with default value
  #
  attribute :views,  Integer, default: 0, mapping: { type: 'integer' }

  # Validate the presence of the `title` attribute
  #
  validates :title, presence: true

  # Execute code after saving the model.
  #
  after_save { puts "Successfully saved: #{self}" }
end
```
Attribute validations work as in any other ActiveModel-compatible implementation:
```ruby
article = Article.new
# => #<Article { ... }>

article.valid?
# => false

article.errors.to_a
# => ["Title can't be blank"]
```
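The `attribute` declarations above come from Virtus. To show roughly what such a macro does (a typed reader/writer pair plus defaults applied at initialization), here is a hypothetical, much-reduced `SimpleAttributes` module in plain Ruby. It is only an illustration of the idea and does not reproduce Virtus' API or behaviour.

```ruby
# Hypothetical SimpleAttributes module: a tiny, Virtus-inspired `attribute`
# macro with type coercion and default values. Virtus itself supports far
# more types and options.
module SimpleAttributes
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def attribute_definitions
      @attribute_definitions ||= {}
    end

    # Record the definition and generate a reader and a coercing writer
    def attribute(name, type, options = {})
      attribute_definitions[name] = { type: type, default: options[:default] }
      define_method(name) { @attributes[name] }
      define_method("#{name}=") { |value| @attributes[name] = coerce(value, type) }
    end
  end

  # Apply defaults for attributes that were not passed in
  def initialize(attributes = {})
    @attributes = {}
    self.class.attribute_definitions.each do |name, definition|
      value = attributes.fetch(name, definition[:default])
      @attributes[name] = coerce(value, definition[:type])
    end
  end

  private

  # Convert raw input (e.g. form strings) to the declared type
  def coerce(value, type)
    if value.nil?          then nil
    elsif type == Integer  then Integer(value)
    elsif type == String   then value.to_s
    else value
    end
  end
end

class Article
  include SimpleAttributes

  attribute :title, String
  attribute :views, Integer, default: 0
end

article = Article.new(title: 'Test')
article.views = '5'      # coerced from String to Integer
puts article.views + 1   # prints 6
```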
Persistence
We can create a new article in the database…
```ruby
Article.create id: 1, title: 'Test', author: 'John'
# PUT http://localhost:9200/articles/article/1 [status:201, request:0.015s, query:n/a]
```
… and find it:
```ruby
article = Article.find(1)
# => #<Article { ... }>

article._index
# => "articles"

article.id
# => "1"

article.title
# => "Test"
```
To update the model, either update the attribute and save the model:
```ruby
article.title = 'Updated'

article.save
# => {"_index"=>"articles", "_type"=>"article", "_id"=>"1", "_version"=>2, "created"=>false}
```
… or use the update_attributes method:
```ruby
article.update_attributes title: 'Test', author: 'Mary'
# => {"_index"=>"articles", "_type"=>"article", "_id"=>"1", "_version"=>3}
```
The implementation supports the familiar interface for updating model timestamps:
```ruby
article.touch
# => { ... "_version"=>4}
```
… and numeric attributes:
```ruby
article.views
# => 0

article.increment :views
article.views
# => 1
```
Any callbacks defined in the model will be triggered during the persistence operations:
```ruby
article.save
# Successfully saved: #<Article {...}>
```
The model also supports the familiar find_in_batches and find_each methods to efficiently retrieve big collections of model instances, using Elasticsearch’s Scan API:
```ruby
Article.find_each(_source_include: 'title') { |a| puts "===> #{a.title.upcase}" }
# GET http://localhost:9200/articles/article/_search?scroll=5m&search_type=scan&size=20
# GET http://localhost:9200/_search/scroll?scroll=5m&scroll_id=c2Nhb...
# ===> TEST
# GET http://localhost:9200/_search/scroll?scroll=5m&scroll_id=c2Nhb...
# => "c2Nhb..."
```
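The relationship between the two methods can be sketched in plain Ruby. A batch method keeps requesting pages until one comes back empty, and `find_each` flattens those batches into single records. In the sketch, `fetch_batch` is a hypothetical stand-in for the scroll requests in the log above, using offsets rather than scroll IDs. The method names mirror the model's methods, but the code is an illustration, not the library's implementation.

```ruby
# `fetch_batch` is a hypothetical stand-in for the scroll requests: given an
# offset and a size, it returns the next batch; an empty batch means the end.
def find_in_batches(fetch_batch, batch_size = 20)
  return enum_for(:find_in_batches, fetch_batch, batch_size) unless block_given?

  offset = 0
  loop do
    batch = fetch_batch.call(offset, batch_size)
    break if batch.empty?
    yield batch
    offset += batch.size
  end
end

# Flatten the batches into a stream of single records
def find_each(fetch_batch, batch_size = 20, &block)
  return enum_for(:find_each, fetch_batch, batch_size) unless block

  find_in_batches(fetch_batch, batch_size) { |batch| batch.each(&block) }
end

titles = (1..5).map { |i| "article #{i}" }
fetch  = ->(offset, size) { titles[offset, size] || [] }

find_each(fetch, 2) { |title| puts "===> #{title.upcase}" }
```

Only one batch is held in memory at a time, which is why this style suits large collections.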
Search
The model class provides a search method to retrieve model instances with a regular search definition, including highlighting, aggregations, etc.:
```ruby
results = Article.search query: { match: { title: 'test' } },
                         aggregations: { authors: { terms: { field: 'author.raw' } } },
                         highlight: { fields: { title: {} } }

puts results.first.title
# Test

puts results.first.hit.highlight['title']
# <em>Test</em>

results.response.aggregations.authors.buckets.each { |b| puts "#{b['key']} : #{b['doc_count']}" }
# John : 1
```
Accessing the Repository Gateway
The integration with Elasticsearch is implemented by embedding the repository object in the model. You can access it through the gateway method:
```ruby
Article.gateway.client.info
# GET http://localhost:9200/ [status:200, request:0.011s, query:n/a]
# => {"status"=>200, "name"=>"Lightspeed", ...}
```
Rails Compatibility
The model instances are fully compatible with Rails’ conventions and helpers:
```ruby
url_for article
# => "http://localhost:3000/articles/1"

div_for article
# => '<div class="article" id="article_1"></div>'
```
… as well as form values for dates and times:
```ruby
article = Article.new "title" => "Date", "published(1i)"=>"2014", "published(2i)"=>"1", "published(3i)"=>"1"

article.published.iso8601
# => "2014-01-01"
```
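Rails date selects submit one parameter per component, named `published(1i)` (year), `published(2i)` (month) and `published(3i)` (day). The sketch below shows how such multiparameter values can be combined into a `Date`. `combine_date_params` is a hypothetical helper and does not reflect the library's or Rails' actual code path.

```ruby
require 'date'

# Hypothetical helper: combine "name(1i)", "name(2i)", "name(3i)" form values
# into a Date; returns nil when any component is missing or blank.
def combine_date_params(params, name)
  parts = (1..3).map { |i| params["#{name}(#{i}i)"] }
  return nil if parts.any? { |part| part.nil? || part.empty? }

  Date.new(*parts.map { |part| Integer(part, 10) })
end

params = { 'title' => 'Date', 'published(1i)' => '2014', 'published(2i)' => '1', 'published(3i)' => '1' }

puts combine_date_params(params, 'published').iso8601  # prints 2014-01-01
```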
The library provides a Rails ORM generator to facilitate building the application scaffolding:
```bash
rails generate scaffold Person name:String email:String birthday:Date --orm=elasticsearch
```
Example application
A fully working Ruby on Rails application can be generated with the following command:
```bash
rails new music --force --skip --skip-bundle --skip-active-record --template https://raw.githubusercontent.com/elasticsearch/elasticsearch-rails/master/elasticsearch-persistence/examples/music/template.rb
```
The application demonstrates:
- How to set up model attributes with custom mappings
- How to define model relationships with Elasticsearch’s parent/child
- How to configure models to use a common index, and create the index with proper mappings
- How to use Elasticsearch’s completion suggester to drive auto-complete functionality
- How to use Elasticsearch-persisted models in Rails’ views and forms
- How to write controller tests
The source files for the application are available in the examples/music folder.
License
This software is licensed under the Apache 2 license, quoted below.
Copyright (c) 2014 Elasticsearch <http://www.elasticsearch.org>
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.