Class LevenshteinAutomata

java.lang.Object
org.apache.lucene.util.automaton.LevenshteinAutomata

public class LevenshteinAutomata extends Object
Class to construct DFAs that match a word within some edit distance.

Implements the algorithm described in: Schulz and Mihov: Fast String Correction with Levenshtein Automata

  • Field Details

    • MAXIMUM_SUPPORTED_DISTANCE

      public static final int MAXIMUM_SUPPORTED_DISTANCE
      Maximum edit distance this class can generate an automaton for.
      See Also:
    • word

      final int[] word
    • alphabet

      final int[] alphabet
    • alphaMax

      final int alphaMax
    • rangeLower

      final int[] rangeLower
    • rangeUpper

      final int[] rangeUpper
    • numRanges

      int numRanges
    • descriptions

  • Constructor Details

    • LevenshteinAutomata

      public LevenshteinAutomata(String input, boolean withTranspositions)
      Create a new LevenshteinAutomata for some input String. Optionally count transpositions as a primitive edit.
    • LevenshteinAutomata

      public LevenshteinAutomata(int[] word, int alphaMax, boolean withTranspositions)
      Expert: specify a custom maximum possible symbol (alphaMax); default is Character.MAX_CODE_POINT.
  • Method Details

    • codePoints

      private static int[] codePoints(String input)
    • toAutomaton

      public Automaton toAutomaton(int n)
      Compute a DFA that accepts all strings within an edit distance of n.

      All automata have the following properties:

      • They are deterministic (DFA).
      • There are no transitions to dead states.
      • They are not minimal (some transitions could be combined).
    • toAutomaton

      public Automaton toAutomaton(int n, String prefix)
      Compute a DFA that accepts all strings within an edit distance of n, matching the specified exact prefix.

      All automata have the following properties:

      • They are deterministic (DFA).
      • There are no transitions to dead states.
      • They are not minimal (some transitions could be combined).
    • getVector

      int getVector(int x, int pos, int end)
      Get the characteristic vector X(x, V) where V is substring(pos, end)