class MaybeYouMeant::Levenshtein

Public Class Methods

distance(s, t, max = nil) click to toggle source

Computes the Levenshtein distance between two strings.

This is currently completely unoptimized, both in terms of space and time. Two planned optimizations are to only use two rows instead of a matrix, and as ultimately we only care about a threshold distance we only need to calculate a diagonal instead of the full distance. This could be further optimized to be evaluated lazily.

If max is provided the return value is the Levenshtein distance if it is less than max, otherwise it is any value greater than or equal to max.

# File lib/maybeyoumeant/levenshtein.rb, line 14
def self.distance(s, t, max = nil)
  m = s.length
  n = t.length

  # If the string lengths differ by more than max return immediately.
  return max if max && (m - n).abs >= max

  cur = Array.new(n + 1, 0)
  nxt = Array.new(n + 1, 0)

  0.upto n do |j|
    cur[j] = j
  end

  1.upto m do |i|
    nxt[0] = i
    1.upto n do |j|
      # This is not unicode safe.
      if s[i - 1] == t[j - 1]
        nxt[j] = cur[j - 1]
      else
        nxt[j] = [
          cur[j] + 1,      #deletion
          nxt[j - 1] + 1,  #insertion
          cur[j - 1] + 1,  #substitution
        ].min
      end
    end
    tmp = cur
    cur = nxt
    nxt = tmp
  end
  
  return cur[n]
end