class TfIdfSimilarity::BM25Model

Public Instance Methods

idf(term)
Alias for: inverse_document_frequency
inverse_document_frequency(term)

Returns the term's inverse document frequency.

@param [String] term a term
@return [Float] the term's inverse document frequency

# File lib/tf-idf-similarity/bm25_model.rb, line 11
def inverse_document_frequency(term)
  df = @model.document_count(term)
  log((documents.size - df + 0.5) / (df + 0.5))
end
Also aliased as: idf
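
As a rough illustration of the formula in the source above, here is a minimal standalone sketch that takes the two counts directly instead of reading them from a model. The method name `bm25_idf` and its parameters are hypothetical and not part of this class.

```ruby
# Standalone sketch of the BM25 inverse document frequency.
# `bm25_idf` and both parameter names are hypothetical, chosen for illustration.
#
# document_count      - total number of documents in the corpus
# documents_with_term - number of documents that contain the term
def bm25_idf(document_count, documents_with_term)
  # The 0.5 terms smooth the ratio and make the arithmetic floating-point.
  Math.log((document_count - documents_with_term + 0.5) / (documents_with_term + 0.5))
end

rare   = bm25_idf(10, 1) # term appears in 1 of 10 documents
common = bm25_idf(10, 9) # term appears in 9 of 10 documents
```

With this formula, a term that occurs in more than half of the documents gets a negative value, so `common` is negative while `rare` is positive.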
term_frequency(document, term)

Returns the term's frequency in the document.

@param [Document] document a document
@param [String] term a term
@return [Float] the term's frequency in the document

@note Like Lucene, we use a b value of 0.75 and a k1 value of 1.2, which gives the constants 2.2 (k1 + 1), 0.3 (k1 * (1 - b)) and 0.9 (k1 * b).

# File lib/tf-idf-similarity/bm25_model.rb, line 24
def term_frequency(document, term)
  tf = document.term_count(term)
  (tf * 2.2) / (tf + 0.3 + 0.9 * documents.size / @model.average_document_size)
end
Also aliased as: tf
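
To show where the constants come from, here is a minimal standalone sketch of the textbook BM25 term-frequency component with `k1` and `b` kept as named parameters. The method name `bm25_tf` and its parameters are hypothetical and not part of this class; the sketch normalizes by the length of the document being scored, as the textbook formula does.

```ruby
# Standalone sketch of the textbook BM25 term-frequency component.
# `bm25_tf` and all parameter names are hypothetical, chosen for illustration.
#
# term_count      - occurrences of the term in the document
# document_length - number of tokens in the document
# average_length  - average number of tokens per document in the corpus
def bm25_tf(term_count, document_length, average_length, k1: 1.2, b: 0.75)
  # b controls how strongly long documents are penalized;
  # k1 controls how quickly repeated occurrences saturate.
  length_norm = 1 - b + b * document_length / average_length
  (term_count * (k1 + 1)) / (term_count + k1 * length_norm)
end

average = bm25_tf(3, 100, 100.0) # document of average length
long    = bm25_tf(3, 400, 100.0) # same count in a document four times longer
```

With the defaults, the result stays below k1 + 1 = 2.2 however often the term occurs, and the same count scores lower in a longer document, so `long` is smaller than `average`.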
tf(document, term)
Alias for: term_frequency