module TfIdfSimilarity

A document-term matrix using the BM25 function.

@see lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/search/similarities/BM25Similarity.html
@see en.wikipedia.org/wiki/Okapi_BM25
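A minimal sketch of the BM25 weight of one query term in one document, for readers who want to see the formula behind the description above. It assumes Lucene's default parameters (`k1 = 1.2`, `b = 0.75`) and Lucene's always-positive idf variant. The method name `bm25_term_score` and its keyword arguments are hypothetical and are not this gem's API.

```ruby
# Hypothetical helper (not the gem's API): BM25 weight of a single term in a
# single document, using Lucene's default k1 and b.
def bm25_term_score(freq:, doc_length:, avg_doc_length:, doc_freq:, num_docs:, k1: 1.2, b: 0.75)
  # Lucene-style idf: ln(1 + (N - n + 0.5) / (n + 0.5)), never negative.
  idf = Math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
  # Length normalization: documents longer than average damp the term frequency.
  norm = k1 * (1 - b + b * doc_length.to_f / avg_doc_length)
  # Saturating tf component: grows with freq but levels off near k1 + 1.
  idf * (freq * (k1 + 1)) / (freq + norm)
end

score = bm25_term_score(freq: 2, doc_length: 10, avg_doc_length: 10, doc_freq: 1, num_docs: 3)
```

A document's score for a multi-term query is then the sum of these per-term weights.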

A document.

@see nlp.stanford.edu/IR-book/html/htmledition/document-and-query-weighting-schemes-1.html
@see www.cs.odu.edu/~jbollen/IR04/readings/article1-29-03.pdf
@see www.sandia.gov/~tgkolda/pubs/bibtgkfiles/ornl-tm-13756.pdf
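The weighting schemes referenced above start from per-document term counts. The following is a minimal sketch of what a document might hold for that purpose, assuming a naive lowercase-and-split tokenization. The class `SketchDocument` and its methods are hypothetical and do not describe this gem's `Document` interface.

```ruby
# Hypothetical document: stores raw term counts and the total token count,
# which is what tf, idf, and length normalization are computed from.
class SketchDocument
  attr_reader :id, :term_counts, :size

  def initialize(text, id: nil)
    @id = id || text.object_id
    # Naive tokenization for illustration: lowercase runs of letters/digits.
    tokens = text.downcase.scan(/[[:alnum:]]+/)
    @term_counts = Hash.new(0)
    tokens.each { |t| @term_counts[t] += 1 }
    @size = tokens.size
  end

  # Raw count of a term; 0 for terms the document does not contain.
  def term_count(term)
    @term_counts[term]
  end

  def terms
    @term_counts.keys
  end
end

doc = SketchDocument.new("The cat sat on the mat", id: 1)
doc.term_count("the") # => 2
```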

A simple document-term matrix.
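A minimal sketch of a simple document-term matrix built from already-tokenized documents. It assumes rows are terms (sorted alphabetically) and columns are documents, with raw counts in each cell; the helper name `document_term_matrix` is hypothetical, and plain nested arrays stand in for whatever matrix library the gem actually uses.

```ruby
# Hypothetical helper: returns [terms, matrix] where matrix[i][j] is the
# number of times terms[i] occurs in document j.
def document_term_matrix(token_lists)
  terms = token_lists.flatten.uniq.sort
  # Map each term to its row index for O(1) lookups.
  index = terms.each_with_index.to_h
  matrix = Array.new(terms.size) { Array.new(token_lists.size, 0) }
  token_lists.each_with_index do |tokens, col|
    tokens.each { |t| matrix[index[t]][col] += 1 }
  end
  [terms, matrix]
end

terms, matrix = document_term_matrix([%w[a b a], %w[b c]])
# terms  => ["a", "b", "c"]
# matrix => [[2, 0], [1, 1], [0, 1]]
```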

A document-term matrix using the tf*idf function.

@see lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
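A minimal sketch of tf*idf weighting followed by cosine similarity, assuming the tf and idf definitions of Lucene's default similarity (`tf = sqrt(freq)`, `idf = 1 + ln(N / (df + 1))`). It deliberately omits Lucene's boost and normalization factors, so it illustrates the weighting idea rather than reproducing Lucene's full score. The helpers `tfidf_vectors` and `cosine` are hypothetical.

```ruby
# Hypothetical helper: one {term => weight} hash per tokenized document.
def tfidf_vectors(token_lists)
  n = token_lists.size
  # Document frequency: number of documents containing each term.
  df = Hash.new(0)
  token_lists.each { |tokens| tokens.uniq.each { |t| df[t] += 1 } }
  token_lists.map do |tokens|
    tokens.tally.to_h do |term, freq|
      idf = 1 + Math.log(n.to_f / (df[term] + 1))
      [term, Math.sqrt(freq) * idf]
    end
  end
end

# Cosine similarity of two sparse weight vectors stored as hashes.
def cosine(a, b)
  dot = a.sum { |term, w| w * b.fetch(term, 0.0) }
  norm = ->(v) { Math.sqrt(v.values.sum { |w| w * w }) }
  denom = norm.(a) * norm.(b)
  denom.zero? ? 0.0 : dot / denom
end

v = tfidf_vectors([%w[cat sat mat], %w[cat sat hat], %w[dog ran]])
cosine(v[0], v[1]) # shares "cat" and "sat", so strictly between 0 and 1
```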

A token.

@note More filters from Solr could be added, and tokens could be stemmed with Porter's Snowball stemmer.

@see github.com/aurelian/ruby-stemmer
@see wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StopFilterFactory
@see wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
@see wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
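The Solr filters listed above are applied as a chain, each one transforming or dropping tokens. The following is a minimal sketch of such a chain in the spirit of Solr's lowercase and stop filters, assuming a tiny illustrative stop list rather than Solr's default one. The helper names are hypothetical, and Snowball stemming is left out because it would require the ruby-stemmer gem.

```ruby
# Hypothetical filter chain; the stop list is a small illustrative sample.
STOP_WORDS = %w[a an and the of to in on].freeze

def lowercase_filter(tokens)
  tokens.map(&:downcase)
end

def stop_filter(tokens, stop_words = STOP_WORDS)
  tokens.reject { |t| stop_words.include?(t) }
end

# Keep only tokens containing at least one letter or digit.
def valid_token?(token)
  token.match?(/[[:alnum:]]/)
end

def filter_tokens(tokens)
  stop_filter(lowercase_filter(tokens)).select { |t| valid_token?(t) }
end

filter_tokens(%w[The Cat sat on THE mat --]) # => ["cat", "sat", "mat"]
```

Lowercasing runs before the stop filter so that "The" and "THE" both match the lowercase stop list.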

A tokenizer that uses UnicodeUtils to tokenize text.

@see github.com/lang/unicode_utils
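A minimal sketch of a Unicode-aware tokenizer that runs without extra gems. It swaps UnicodeUtils' word segmentation for a Ruby regex, so it only approximates Unicode word boundaries: a word is assumed to be a run of letters, combining marks, or digits, optionally joined by an apostrophe. The constant `WORD` and the method `tokenize` are hypothetical.

```ruby
# Approximate word pattern (not full Unicode word segmentation):
# letters, combining marks, and digits, with internal apostrophes allowed.
WORD = /[\p{L}\p{M}\p{N}]+(?:['’][\p{L}\p{M}\p{N}]+)*/

def tokenize(text)
  text.scan(WORD)
end

tokenize("Don't panic: naïve café, 42 times!")
# => ["Don't", "panic", "naïve", "café", "42", "times"]
```

Matching on Unicode properties rather than `\w` keeps accented and non-Latin words intact, which is the reason a Unicode-aware tokenizer is used in the first place.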

Constants

VERSION