class RetrievalLite::Document

Represents a document by its content (a String) and its term frequencies (a Hash).

Attributes

content[R]

the text of the document

id[R]

the id of the document

term_frequencies[R]

a Hash<String, Integer> mapping each term in the document to the number of times it occurs

Public Class Methods

new(content, opts = {})

Creates a new Retrieval Lite document. Upon initialization, the content is parsed into individual tokens, and its term frequencies are recorded.

@param content [String] the text of the document
@param opts [Hash] optional arguments to the initializer
@option opts [String] :id the id of the document. Defaults to the object_id assigned by Ruby (an Integer)

# File lib/retrieval_lite/document.rb, line 16
def initialize(content, opts = {})
  @content = content
  @id = opts[:id] || object_id
  @term_frequencies = RetrievalLite::Tokenizer.parse_content(content)
end
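
The initializer above can be tried out with a minimal stand-in, sketched below. SketchTokenizer and SketchDocument are hypothetical names that are not part of RetrievalLite, and the tokenizing rule (lowercase, then split on non-word characters) is an assumption made only for illustration; the real RetrievalLite::Tokenizer.parse_content may normalize or filter terms differently.

```ruby
# Hypothetical stand-in for RetrievalLite::Tokenizer (assumption: lowercase
# the text and split on non-word characters; the real tokenizer may differ).
module SketchTokenizer
  def self.parse_content(content)
    content.downcase.scan(/\w+/).each_with_object(Hash.new(0)) do |term, freqs|
      freqs[term] += 1
    end
  end
end

# Hypothetical stand-in that mirrors the initializer shown above.
class SketchDocument
  attr_reader :content, :id, :term_frequencies

  def initialize(content, opts = {})
    @content = content
    @id = opts[:id] || object_id
    @term_frequencies = SketchTokenizer.parse_content(content)
  end
end

doc = SketchDocument.new("The cat saw the other cat", id: "doc-1")
doc.id               # "doc-1"
doc.term_frequencies # {"the"=>2, "cat"=>2, "saw"=>1, "other"=>1}

# Without :id, the id falls back to the object's own object_id.
anonymous = SketchDocument.new("no explicit id")
anonymous.id == anonymous.object_id # true
```

Passing an explicit :id is probably the safer choice when documents need to be matched against external records, since an object_id is only meaningful within a single Ruby process.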

Public Instance Methods

contains?(term)

@param term [String]
@return [Boolean] whether a term appears in the document

# File lib/retrieval_lite/document.rb, line 44
def contains?(term)
  @term_frequencies.has_key?(term)
end

frequency_of(term)

@param term [String]
@return [Integer] the number of times a term appears in the document

# File lib/retrieval_lite/document.rb, line 34
def frequency_of(term)
  # Terms that never occur in the document have a frequency of 0.
  @term_frequencies.fetch(term, 0)
end
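
Both lookups reduce to plain Hash operations, which the following sketch illustrates on a hand-written hash. The hash literal and the two lambdas are illustrative stand-ins, not RetrievalLite code. Because the lookup is an exact key match, a term has to be passed in the same form the tokenizer stored it in; whether the real tokenizer changes case is not stated here, so the lowercase keys are an assumption.

```ruby
# Hand-written frequencies standing in for Document#term_frequencies
# (assumption: terms are stored in lowercase).
freqs = { "cat" => 2, "saw" => 1 }

# Illustrative equivalents of contains? and frequency_of.
contains     = ->(hash, term) { hash.has_key?(term) }
frequency_of = ->(hash, term) { hash.has_key?(term) ? hash[term] : 0 }

contains.call(freqs, "cat")     # true
contains.call(freqs, "Cat")     # false: keys are compared exactly
frequency_of.call(freqs, "cat") # 2
frequency_of.call(freqs, "dog") # 0: absent terms count as zero
```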

term_count()

@return [Integer] the total number of unique terms in the document

# File lib/retrieval_lite/document.rb, line 23
def term_count
  @term_frequencies.size
end

terms()

@return [Array<String>] the unique terms of the document

# File lib/retrieval_lite/document.rb, line 28
def terms
  @term_frequencies.keys
end

total_terms()

@return [Integer] the total number of term occurrences in the document, counting repeated terms each time they appear

# File lib/retrieval_lite/document.rb, line 49
def total_terms
  # Sum the per-term counts so repeated terms are counted each time.
  @term_frequencies.values.inject(0, :+)
end
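
The difference between terms, term_count, and total_terms is easiest to see on a small example. The sketch below works directly on a hand-written frequency hash (illustrative data, not output from RetrievalLite) and uses local variables named after the methods they mimic.

```ruby
# Hand-written frequencies standing in for Document#term_frequencies.
freqs = { "the" => 2, "cat" => 2, "saw" => 1, "other" => 1 }

terms       = freqs.keys                 # ["the", "cat", "saw", "other"]
term_count  = freqs.size                 # 4 unique terms
total_terms = freqs.values.inject(0, :+) # 6 occurrences in total
```

term_count is always less than or equal to total_terms, and the two are equal only when no term repeats.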