class NameFinder

Find names from a known list in a text, taking account of names that may overlap. For example, Waterloo and Waterloo East are separate stations; NameFinder, knowing both, will not give a false match for Waterloo in a text that mentions Waterloo East.
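
The overlap behaviour can be illustrated with a minimal stand-in. This sketch is not NameFinder's implementation: it looks up whole-word sequences in a Hash rather than walking a tree, and the names normalize_words and find_names are hypothetical. It only shows the longest-match idea under those assumptions.

```ruby
# Hypothetical stand-in for NameFinder: longest match over whole words.
# Not the library's own code, which builds a tree instead.
def normalize_words(text)
  text.downcase.gsub(/[^a-z]+/, " ").split
end

def find_names(names, text)
  # Index each known name by its normalized word sequence.
  known = names.to_h { |name| [normalize_words(name), name] }
  longest = known.keys.map(&:length).max || 0
  words = normalize_words(text)
  found = []
  i = 0
  while i < words.length
    # Try the longest candidate first so "Waterloo East" beats "Waterloo".
    len = [longest, words.length - i].min
    len -= 1 while len > 0 && !known.key?(words[i, len])
    if len > 0
      found << known[words[i, len]]
      i += len # resume after the matched name
    else
      i += 1   # no name starts here; move to the next word
    end
  end
  found
end

p find_names(["Waterloo", "Waterloo East"], "Change at Waterloo East for Waterloo.")
```

The first mention is reported as Waterloo East only; the plain Waterloo match comes from the second, separate mention.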

Constants

VERSION

Attributes

root[R]

Public Class Methods

new(tree={})

Initialize a new NameFinder. tree, if supplied, should be the data generated by the export method.

# File lib/name_finder.rb, line 16
def initialize(tree={})
  @tree = tree
  @root = NodeProxy.new(tree, delimiter)
end

Public Instance Methods

add(term)

Add a term to NameFinder’s dictionary.

# File lib/name_finder.rb, line 25
def add(term)
  root.add Buffer.new(normalize(term) + delimiter), term
end

export()

Export the tree of the current dictionary for later re-importing.

# File lib/name_finder.rb, line 50
def export
  @tree
end

find_all_in(haystack)

Find all names from the dictionary in haystack.

# File lib/name_finder.rb, line 40
def find_all_in(haystack)
  Set.new.tap { |all|
    find(haystack) do |found|
      all << found
    end
  }.to_a
end
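
Because the matches are collected in a Set before being converted to an Array, a name mentioned several times is returned once, in order of first appearance. A minimal sketch of that pattern, using a hypothetical helper unique_in_order and a hard-coded list in place of real matches:

```ruby
require "set"

# Hypothetical helper mirroring the Set.new.tap { ... }.to_a pattern.
def unique_in_order(matches)
  Set.new.tap { |all|
    matches.each { |found| all << found } # duplicates are ignored by Set
  }.to_a
end

p unique_in_order(["Waterloo East", "Waterloo", "Waterloo East"])
```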

find_in(haystack)

Find the first name from the dictionary in haystack, or nil if there is none.

# File lib/name_finder.rb, line 31
def find_in(haystack)
  find(haystack) do |found|
    return found
  end
  nil
end
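
The early exit relies on a Ruby rule: return inside a block returns from the enclosing method, so scanning stops at the first match. A small sketch, with hypothetical methods each_match and first_match standing in for find and find_in:

```ruby
# Hypothetical generator standing in for the private find method.
def each_match(matches)
  matches.each { |found| yield found }
end

# return inside the block exits first_match itself, so later
# matches are never visited; nil is reached only with no matches.
def first_match(matches)
  each_match(matches) do |found|
    return found
  end
  nil
end

p first_match(["Waterloo East", "Waterloo"])
p first_match([])
```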

Private Instance Methods

delimiter()

Return the word delimiter, a single space.

# File lib/name_finder.rb, line 72
def delimiter
  " "
end
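
find(haystack) { |found| ... }

Scan haystack from left to right, yielding each dictionary name found. After a match, scanning resumes immediately after the matched text; otherwise it advances past the next delimiter.
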
# File lib/name_finder.rb, line 55
def find(haystack)
  remaining = Buffer.new(normalize(haystack) + delimiter)
  while !remaining.at_end?
    found = root.find(remaining)
    if found
      yield found
      remaining = remaining.advance_by(found.length)
    else
      remaining = remaining.advance_past(delimiter)
    end
  end
end
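
Both add and find append the delimiter to the normalized text. One plausible reading is that this makes a term match only at a word boundary. The sketch below shows that effect with plain strings; starts_with_term? is a hypothetical helper, not part of the library, and it ignores the tree entirely.

```ruby
DELIMITER = " "

# Hypothetical check: does the normalized text start with the whole term?
# Appending the delimiter to both sides rules out partial-word matches.
def starts_with_term?(text, term)
  (text + DELIMITER).start_with?(term + DELIMITER)
end

p starts_with_term?("waterloo east", "waterloo")
p starts_with_term?("waterlooville", "waterloo")
```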

normalize(term)

Lowercase term and replace each run of characters other than a-z with the delimiter.

# File lib/name_finder.rb, line 68
def normalize(term)
  term.downcase.gsub(/[^a-z]+/, delimiter)
end
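
The effect of normalization on a few inputs is sketched below. The method is copied from the source above with the delimiter inlined as a literal space so that it runs on its own; the sample strings are illustrative.

```ruby
# Copy of the private normalize method, with the delimiter inlined.
def normalize(term)
  term.downcase.gsub(/[^a-z]+/, " ")
end

p normalize("Waterloo East") # => "waterloo east"
p normalize("St. Pancras")   # => "st pancras"
p normalize("King's Cross")  # => "king s cross"
p normalize("Terminal 5")    # => "terminal "
```

Digits and punctuation are treated as separators, as is any letter outside a-z, so names that differ only in those characters normalize to the same text.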