module TfIdfSimilarity
A document-term matrix using the BM25 function.
@see lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/search/similarities/BM25Similarity.html
@see en.wikipedia.org/wiki/Okapi_BM25
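A minimal sketch of a BM25 term weight. It assumes the formulas documented for Lucene's BM25Similarity, with that class's default parameters k1 = 1.2 and b = 0.75. The module BM25Sketch and its method names are hypothetical and are not this library's API.

```ruby
# Hypothetical BM25 sketch, assuming Lucene 4's BM25Similarity formulas and defaults.
module BM25Sketch
  K1 = 1.2  # term-frequency saturation (assumed default)
  B  = 0.75 # strength of document-length normalization (assumed default)

  # idf(t) = ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of
  # documents and n the number of documents containing the term.
  def self.idf(num_documents, document_frequency)
    Math.log(1 + (num_documents - document_frequency + 0.5) / (document_frequency + 0.5))
  end

  # Saturating term frequency, scaled by document length relative to the average.
  def self.tf(term_frequency, document_length, average_document_length)
    norm = K1 * (1 - B + B * document_length / average_document_length.to_f)
    term_frequency * (K1 + 1) / (term_frequency + norm)
  end

  # Weight of one term in one document.
  def self.weight(term_frequency, document_length, average_document_length,
                  num_documents, document_frequency)
    tf(term_frequency, document_length, average_document_length) *
      idf(num_documents, document_frequency)
  end
end
```

Setting b to 0 disables length normalization, while a larger k1 makes the term-frequency factor saturate more slowly; the tf factor never exceeds k1 + 1.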
A document.
@see nlp.stanford.edu/IR-book/html/htmledition/document-and-query-weighting-schemes-1.html
@see www.cs.odu.edu/~jbollen/IR04/readings/article1-29-03.pdf
@see www.sandia.gov/~tgkolda/pubs/bibtgkfiles/ornl-tm-13756.pdf
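Documents are typically compared as weighted term vectors. The sketch below computes cosine similarity, one of the normalizations covered by the weighting schemes in the IR-book reference above. It assumes each document has already been reduced to a hash of term => weight. The method name cosine_similarity is hypothetical.

```ruby
# Hypothetical sketch: cosine similarity between two documents given as term => weight hashes.
def cosine_similarity(a, b)
  # Dot product over the terms of a; terms missing from b contribute 0.
  dot = a.sum { |term, weight| weight * b.fetch(term, 0) }
  norm_a = Math.sqrt(a.values.sum { |w| w * w })
  norm_b = Math.sqrt(b.values.sum { |w| w * w })
  # An empty (all-zero) document is treated as dissimilar to everything.
  return 0.0 if norm_a.zero? || norm_b.zero?
  dot / (norm_a * norm_b)
end
```

For example, `cosine_similarity({ "a" => 1 }, { "a" => 2 })` is 1.0, because the two vectors point in the same direction even though their lengths differ.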
A simple document-term matrix.
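A minimal sketch of a plain document-term matrix of raw term counts, assuming each document is already an array of tokens. The layout (one row per term, one column per document) and the method name document_term_matrix are assumptions for illustration, not this library's API.

```ruby
# Hypothetical sketch: build a term-by-document matrix of raw counts.
def document_term_matrix(documents)
  # Count how often each token occurs in each document.
  counts = documents.map do |tokens|
    tokens.each_with_object(Hash.new(0)) { |token, hash| hash[token] += 1 }
  end
  # Collect the vocabulary in a stable (sorted) order.
  terms = counts.flat_map(&:keys).uniq.sort
  # One row per term, one column per document; absent terms count as 0.
  matrix = terms.map { |term| counts.map { |count| count[term] } }
  [terms, matrix]
end
```

With documents `%w[a b a]` and `%w[b c]`, the terms are `["a", "b", "c"]` and the matrix is `[[2, 0], [1, 1], [0, 1]]`.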
A document-term matrix using the tf*idf function.
@see lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
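A minimal sketch of the tf and idf factors used by Lucene 4's default TFIDFSimilarity implementation (DefaultSimilarity): tf(t) = sqrt(freq) and idf(t) = 1 + ln(N / (df + 1)). The sketch omits Lucene's length and query normalization, its coordination factor, and its boosts. TfIdfSketch is a hypothetical name.

```ruby
# Hypothetical tf*idf sketch, assuming Lucene 4 DefaultSimilarity's tf and idf.
module TfIdfSketch
  # Damped term frequency: sqrt(freq).
  def self.tf(term_frequency)
    Math.sqrt(term_frequency)
  end

  # idf(t) = 1 + ln(N / (df + 1)); the +1 in the denominator avoids division by zero.
  def self.idf(num_documents, document_frequency)
    1 + Math.log(num_documents / (document_frequency + 1).to_f)
  end

  # Weight of one term in one document, without any normalization.
  def self.weight(term_frequency, num_documents, document_frequency)
    tf(term_frequency) * idf(num_documents, document_frequency)
  end
end
```

With 9 documents, a term that occurs 4 times in a document and appears in 2 documents gets weight 2 × (1 + ln 3).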
A token.
@note More filters from Solr could be added, and tokens could be stemmed with Porter's Snowball stemmer.
@see github.com/aurelian/ruby-stemmer
@see wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StopFilterFactory
@see wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
@see wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
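To show what an additional Solr-style filter might look like, here is a small token-filter chain in the spirit of Solr's lowercase and stop filters. The stop list, the constant STOP_WORDS, and the method filter_tokens are hypothetical. The sketch does not stem, because that would need the ruby-stemmer gem referenced above.

```ruby
require "set"

# Hypothetical, illustrative stop list (not Solr's default list).
STOP_WORDS = Set["a", "an", "and", "of", "the"].freeze

# Lowercase every token, then drop stop words.
def filter_tokens(tokens, stop_words: STOP_WORDS)
  tokens.map(&:downcase).reject { |token| stop_words.include?(token) }
end
```

Lowercasing runs before the stop-word check, so "The" and "the" are both removed.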
A tokenizer that uses UnicodeUtils to tokenize text.
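The sketch below tokenizes text without the UnicodeUtils dependency. It swaps in a Ruby standard-library regular expression, which is a simpler approximation and not UnicodeUtils' word segmentation. The method name tokenize is hypothetical.

```ruby
# Hypothetical stdlib tokenizer: runs of Unicode word characters (letters, marks,
# digits, connector punctuation), allowing internal apostrophes as in "don't".
def tokenize(text)
  text.scan(/[[:word:]]+(?:['’][[:word:]]+)*/)
end
```

Because `[[:word:]]` is Unicode-aware for UTF-8 strings, accented words such as "wörld" stay whole. Punctuation like commas and exclamation marks is dropped.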
Constants
- VERSION