class RetrievalLite::Document
Represents a document, storing its content as a string and its term frequencies as a hash.
Attributes
content: the text of the document
id: the id of the document
term_frequencies: a Hash<String, Integer> mapping each term in the document to its frequency
Public Class Methods
Creates a new Retrieval Lite document. Upon initialization, the content is parsed into individual tokens, and its term frequencies are recorded.
@param content [String] the text of the document
@param opts [Hash] optional arguments to the initializer
@option opts [String] :id the id of the document. Defaults to the object_id assigned by Ruby
# File lib/retrieval_lite/document.rb, line 16
def initialize(content, opts = {})
  @content = content
  @id = opts[:id] || object_id
  @term_frequencies = RetrievalLite::Tokenizer.parse_content(content)
end
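The initializer delegates tokenizing to RetrievalLite::Tokenizer.parse_content, whose rules are not shown in this section. As a rough illustration of the kind of Hash<String, Integer> it is expected to return, here is a minimal sketch. The helper name sketch_parse_content and its tokenizing rule (lowercased runs of letters and digits) are assumptions made for this example only; the real tokenizer may handle case, punctuation, or stop words differently.

```ruby
# Hypothetical stand-in for RetrievalLite::Tokenizer.parse_content.
# Assumption: a token is a lowercased run of ASCII letters or digits.
def sketch_parse_content(content)
  content.downcase.scan(/[a-z0-9]+/).each_with_object(Hash.new(0)) do |token, frequencies|
    # Count each occurrence of the token.
    frequencies[token] += 1
  end
end

frequencies = sketch_parse_content("The cat saw the other cat.")
# frequencies is { "the" => 2, "cat" => 2, "saw" => 1, "other" => 1 }
```

A document built from the same sentence would then hold a comparable hash in its term frequencies, assuming the real tokenizer splits the text the same way.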
Public Instance Methods
@param term [String]
@return [Boolean] whether a term appears in the document
# File lib/retrieval_lite/document.rb, line 44
def contains?(term)
  @term_frequencies.has_key?(term)
end
@param term [String]
@return [Integer] the number of times a term appears in the document
# File lib/retrieval_lite/document.rb, line 34
def frequency_of(term)
  if @term_frequencies.has_key?(term)
    return @term_frequencies[term]
  else
    return 0
  end
end
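Both lookups above reduce to plain Hash operations on the term-frequency hash. The sketch below works on a hand-written hash rather than a real document; the helper names sketch_contains and sketch_frequency_of are hypothetical. It also shows that Hash#fetch with a default gives the same result as the if/else in frequency_of, on the assumption that the hash holds only Integer counts.

```ruby
# Hypothetical helper mirroring contains?: true when the term is a key.
def sketch_contains(term_frequencies, term)
  term_frequencies.has_key?(term)
end

# Hypothetical helper mirroring frequency_of: the stored count, or 0
# when the term is absent. Hash#fetch supplies the fallback value.
def sketch_frequency_of(term_frequencies, term)
  term_frequencies.fetch(term, 0)
end

term_frequencies = { "cat" => 2, "saw" => 1 }

sketch_contains(term_frequencies, "cat")      # true
sketch_contains(term_frequencies, "dog")      # false
sketch_frequency_of(term_frequencies, "cat")  # 2
sketch_frequency_of(term_frequencies, "dog")  # 0
```

Note that the lookup is an exact string match, so whether "Cat" and "cat" count as the same term depends on how the tokenizer normalized the content.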
@return [Integer] the total number of unique terms in the document
# File lib/retrieval_lite/document.rb, line 23
def term_count
  @term_frequencies.size
end
@return [Array<String>] the unique terms of the document
# File lib/retrieval_lite/document.rb, line 28
def terms
  @term_frequencies.keys
end
@return [Integer] the total number of terms (not unique) in the document
# File lib/retrieval_lite/document.rb, line 49
def total_terms
  count = 0
  @term_frequencies.each do |key, value|
    count += value
  end
  return count
end
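The difference between term_count (unique terms) and total_terms (all occurrences) is easiest to see on a small example. The sketch below uses a hand-written frequency hash and a hypothetical helper, sketch_counts, instead of a real document. It sums the hash values with Array#sum, which should give the same total as the explicit loop in total_terms.

```ruby
# Hypothetical helper returning [unique term count, total term count]
# for a term-frequency hash shaped like the one a Document holds.
def sketch_counts(term_frequencies)
  unique = term_frequencies.size       # number of distinct keys
  total = term_frequencies.values.sum  # sum of all per-term counts
  [unique, total]
end

# Frequencies for the six-word sentence "the cat saw the other cat".
term_frequencies = { "the" => 2, "cat" => 2, "saw" => 1, "other" => 1 }

unique, total = sketch_counts(term_frequencies)
# unique is 4, total is 6
```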