class TfIdfSimilarity::BM25Model
Public Instance Methods
inverse_document_frequency(term)
Returns the term's inverse document frequency.
@param [String] term a term
@return [Float] the term's inverse document frequency
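```ruby
# File lib/tf-idf-similarity/bm25_model.rb, line 11
def inverse_document_frequency(term)
  # Number of documents that contain the term at least once.
  df = @model.document_count(term)
  # BM25 IDF: positive for rare terms, zero when the term occurs in exactly
  # half of the documents, negative when it occurs in more than half.
  log((documents.size - df + 0.5) / (df + 0.5))
end
```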
Also aliased as: idf
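A minimal standalone sketch of the same IDF formula, assuming a made-up four-document corpus and a hypothetical `bm25_idf` helper (not part of the gem's API). It computes document frequencies by hand and applies `ln((N - df + 0.5) / (df + 0.5))`, where `N` is the number of documents.

```ruby
# Hypothetical helper mirroring the BM25 IDF formula; not the gem's API.
def bm25_idf(document_count, df)
  Math.log((document_count - df + 0.5) / (df + 0.5))
end

# Toy corpus: each document is an array of already-tokenized terms.
corpus = [
  %w[the cat sat],
  %w[the dog ran],
  %w[the cat ran],
  %w[a bird sang]
]

# Document frequency: how many documents contain each term at least once.
df = Hash.new(0)
corpus.each { |doc| doc.uniq.each { |term| df[term] += 1 } }

%w[bird cat the].each do |term|
  puts format("%-4s df=%d idf=%.4f", term, df[term], bm25_idf(corpus.size, df[term]))
end
```

For this corpus, `bird` (1 of 4 documents) gets a positive weight, `cat` (2 of 4) gets exactly 0, and `the` (3 of 4) gets a negative weight of the same magnitude as `bird`'s.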
term_frequency(document, term)
Returns the term's BM25 term-frequency weight in the document: the raw count, saturated and normalized for document length.
@param [Document] document a document
@param [String] term a term
@return [Float] the term's BM25 term-frequency weight in the document
@note Like Lucene, we use a b value of 0.75 and a k1 value of 1.2.
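The constants 2.2, 0.3 and 0.9 in the source come from expanding the standard BM25 term-frequency component with these values, where f is the term's count in the document, |D| is the length of the document being scored and avgdl is the average document length:

```latex
\[
\frac{f\,(k_1 + 1)}{f + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}
= \frac{2.2\,f}{f + 0.3 + 0.9\,\frac{|D|}{\mathrm{avgdl}}}
\qquad \text{with } k_1 = 1.2,\; b = 0.75,
\]
\[
\text{since } k_1 + 1 = 2.2,\quad k_1(1 - b) = 0.3,\quad k_1 b = 0.9.
\]
```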
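```ruby
# File lib/tf-idf-similarity/bm25_model.rb, line 24
def term_frequency(document, term)
  tf = document.term_count(term)
  # BM25 with k1 = 1.2 and b = 0.75:
  #   tf * (k1 + 1) / (tf + k1 * (1 - b + b * |D| / avgdl))
  # where |D| is this document's length and avgdl the average document length.
  (tf * 2.2) / (tf + 0.3 + 0.9 * document.size / @model.average_document_size)
end
```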
Also aliased as: tf
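A minimal sketch of the BM25 term-frequency component with `k1` and `b` exposed as keyword arguments, using hypothetical helpers `bm25_tf` and `bm25_tf_expanded` (not part of the gem's API). Document lengths are assumed to be token counts passed in directly. The second helper hard-codes the expanded constants 2.2, 0.3 and 0.9; with the default `k1` and `b`, both return the same value up to floating-point rounding.

```ruby
# Hypothetical helper: BM25 term-frequency component with Lucene-style defaults.
def bm25_tf(tf, doc_length, avg_doc_length, k1: 1.2, b: 0.75)
  (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_length.to_f / avg_doc_length))
end

# Same formula with k1 = 1.2 and b = 0.75 folded into constants:
#   k1 + 1 = 2.2, k1 * (1 - b) = 0.3, k1 * b = 0.9
def bm25_tf_expanded(tf, doc_length, avg_doc_length)
  (tf * 2.2) / (tf + 0.3 + 0.9 * doc_length.to_f / avg_doc_length)
end

avg = 10.0
[1, 2, 5, 20].each do |tf|
  # Compare an average-length document (10 tokens) with a long one (30 tokens).
  puts format("tf=%-2d  avg-length doc=%.4f  long doc=%.4f",
              tf, bm25_tf(tf, 10, avg), bm25_tf(tf, 30, avg))
end
```

In an average-length document a single occurrence scores 1, and repeated occurrences saturate toward k1 + 1 = 2.2 rather than growing linearly. The same count in a document three times the average length scores lower, which is the length normalization controlled by `b`.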