Class Hunspell

java.lang.Object
org.apache.lucene.analysis.hunspell.Hunspell

public class Hunspell extends Object
A spell checker based on Hunspell dictionaries. This class can be used in place of native Hunspell for many languages for spell-checking and suggesting purposes. Note that not all languages and features are supported yet. For example, the following are not supported:
  • Hungarian (as it doesn't rely only on dictionaries, but has some logic directly in the source code)
  • Languages with Unicode characters outside of the Basic Multilingual Plane
  • The PHONE affix file option for suggestions

The objects of this class are thread-safe.
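Because instances are thread-safe, a single checker can be shared by several worker threads instead of creating one per thread. The following is a minimal sketch of that usage pattern using only the JDK. The `Predicate<String>` parameter is a hypothetical stand-in for a method reference such as `hunspell::spell`, and the class and method names are illustrative assumptions, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Illustrative only: shows one shared, thread-safe checker used from a thread pool.
class SharedCheckerSketch {

  /** Checks every word on a pool of worker threads that all share the same checker. */
  static List<Boolean> checkAll(Predicate<String> checker, List<String> words, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Boolean>> futures = new ArrayList<>();
      for (String word : words) {
        Callable<Boolean> task = () -> checker.test(word); // same checker instance in every task
        futures.add(pool.submit(task));
      }
      List<Boolean> results = new ArrayList<>();
      for (Future<Boolean> future : futures) {
        results.add(future.get()); // results are collected in input order
      }
      return results;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException("spell-checking task failed", e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    // Stand-in for hunspell::spell: a fixed set of "correct" words (an assumption for the demo).
    Predicate<String> checker = Set.of("hello", "world")::contains;
    System.out.println(checkAll(checker, List.of("hello", "wrold", "world"), 2));
  }
}
```

With the real class, the stand-in predicate would presumably be replaced by `hunspell::spell` on one shared `Hunspell` instance.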

  • Constructor Details

    • Hunspell

      public Hunspell(Dictionary dictionary)
    • Hunspell

      public Hunspell(Dictionary dictionary, TimeoutPolicy policy, Runnable checkCanceled)
      Parameters:
      policy - a strategy determining what to do when API calls take too much time
      checkCanceled - an object that is called periodically, allowing the caller to interrupt spell-checking or suggestion generation by throwing an exception
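A `checkCanceled` callback of the kind described above can be as simple as a `Runnable` that throws once a flag has been set. The sketch below uses only the JDK. `cancelWhen` and `runSteps` are hypothetical helpers, and `runSteps` merely stands in for a long computation that polls the callback; it is not how Lucene invokes it internally.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative only: a cancellation callback and a stand-in computation that polls it.
class CheckCanceledSketch {

  /** Returns a Runnable that throws CancellationException once the flag is set. */
  static Runnable cancelWhen(AtomicBoolean canceled) {
    return () -> {
      if (canceled.get()) {
        throw new CancellationException("spell-checking canceled");
      }
    };
  }

  /** Hypothetical long computation: polls the callback before each step, returns steps done. */
  static int runSteps(Runnable checkCanceled, int steps) {
    int done = 0;
    for (int i = 0; i < steps; i++) {
      checkCanceled.run(); // propagates the exception if cancellation was requested
      done++;
    }
    return done;
  }

  public static void main(String[] args) {
    AtomicBoolean canceled = new AtomicBoolean(false);
    Runnable checkCanceled = cancelWhen(canceled);
    System.out.println("Completed steps: " + runSteps(checkCanceled, 5));

    canceled.set(true); // e.g. the user closed the editor
    try {
      runSteps(checkCanceled, 5);
    } catch (CancellationException e) {
      System.out.println("Interrupted: " + e.getMessage());
    }
  }
}
```

Such a `Runnable` would be passed as the third constructor argument, alongside the dictionary and the timeout policy.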
  • Method Details

    • spell

      public boolean spell(String word)
      Returns:
      whether the given word's spelling is considered correct according to Hunspell rules
    • spellClean

      private boolean spellClean(String word)
    • spellWithTrailingDots

      private boolean spellWithTrailingDots(String word)
    • checkWord

      boolean checkWord(String word)
    • checkSimpleWord

      Boolean checkSimpleWord(char[] wordChars, int length, WordCase originalCase)
    • checkWord

      private boolean checkWord(char[] wordChars, int length, WordCase originalCase)
    • checkCompounds

      private boolean checkCompounds(char[] wordChars, int length, WordCase originalCase)
    • findStem

      private Root<CharsRef> findStem(char[] wordChars, int offset, int length, WordCase originalCase, WordContext context)
    • acceptCase

      private boolean acceptCase(WordCase originalCase, int entryId, CharsRef root)
    • containsSharpS

      private boolean containsSharpS(char[] word, int offset, int length)
    • acceptsStem

      boolean acceptsStem(int formID)
    • checkCompounds

      private boolean checkCompounds(CharsRef word, WordCase originalCase, Hunspell.CompoundPart prev)
    • checkCompoundPatternReplacements

      private boolean checkCompoundPatternReplacements(CharsRef word, int pos, WordCase originalCase, Hunspell.CompoundPart prev)
    • checkCompoundsAfter

      private boolean checkCompoundsAfter(WordCase originalCase, Hunspell.CompoundPart prev)
    • hasForceUCaseProblem

      private boolean hasForceUCaseProblem(Root<?> root, WordCase originalCase, char[] wordChars)
    • getRoots

      public List<String> getRoots(String word)
      Find all roots that could result in the given word after case conversion and adding affixes. This corresponds to the original hunspell -s (stemming) functionality.

      Some affix rules are relaxed in this stemming process: e.g. explicitly forbidden words are still returned. Some of the returned roots may be synthetic and not directly occur in the *.dic file (but differ from some existing entries in case). No roots are returned for compound words.

      The returned roots may be used to retrieve morphological data via Dictionary.lookupEntries(java.lang.String).
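To make the idea of "roots that could result in the given word after case conversion and adding affixes" concrete, here is a deliberately tiny toy model using only the JDK. It is not Hunspell's algorithm: it handles only suffixes, ignores affix flags and conditions, and the class, method, root set and suffix list are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: real Hunspell stemming applies affix rules, flags and conditions.
class ToyRootFinder {

  /** Returns every known root r such that r + suffix equals the word or its lowercase form. */
  static List<String> findRoots(String word, Set<String> roots, List<String> suffixes) {
    List<String> found = new ArrayList<>();
    // Case conversion: try the word as written, then its lowercase form if different.
    String lower = word.toLowerCase(Locale.ROOT);
    List<String> forms = lower.equals(word) ? List.of(word) : List.of(word, lower);
    for (String form : forms) {
      for (String suffix : suffixes) {
        if (form.endsWith(suffix)) {
          String candidate = form.substring(0, form.length() - suffix.length());
          if (roots.contains(candidate) && !found.contains(candidate)) {
            found.add(candidate);
          }
        }
      }
    }
    return found;
  }

  public static void main(String[] args) {
    Set<String> roots = Set.of("walk", "talk");
    List<String> suffixes = List.of("", "s", "ed", "ing");
    System.out.println(findRoots("Walked", roots, suffixes)); // prints [walk]
  }
}
```

With the real API, each string returned by `getRoots` could then be handed to `Dictionary.lookupEntries` to retrieve its morphological data, as noted above.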

    • mayBreakIntoCompounds

      private boolean mayBreakIntoCompounds(char[] chars, int offset, int length, int breakPos)
    • checkCompoundRules

      private boolean checkCompoundRules(char[] wordChars, int offset, int length, List<IntsRef> words)
    • checkLastCompoundPart

      private boolean checkLastCompoundPart(char[] wordChars, int start, int length, List<IntsRef> words)
    • isNumber

      private static boolean isNumber(String s)
    • isDigit

      private static boolean isDigit(char c)
    • tryBreaks

      private boolean tryBreaks(String word)
    • hasTooManyBreakOccurrences

      private boolean hasTooManyBreakOccurrences(String word)
    • canBeBrokenAt

      private boolean canBeBrokenAt(String word, String breakStr, int breakPos)
    • suggest

      public List<String> suggest(String word) throws SuggestionTimeoutException
      Returns:
      suggestions for the given misspelled word
      Throws:
      SuggestionTimeoutException - if the computation takes too long and TimeoutPolicy.THROW_EXCEPTION was specified in the constructor
    • suggest

      public List<String> suggest(String word, long timeLimitMs) throws SuggestionTimeoutException
      Parameters:
      word - the misspelled word to calculate suggestions for
      timeLimitMs - the time limit in milliseconds, after which the effect of the associated TimeoutPolicy (an exception or a partial result) may apply
      Throws:
      SuggestionTimeoutException - if the computation takes too long and TimeoutPolicy.THROW_EXCEPTION was specified in the constructor
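The two possible outcomes of exceeding the time limit (an exception or a partial result) can be illustrated with a small JDK-only sketch. Everything in it is hypothetical: the `Policy` enum, the `TimeLimitExceeded` exception and the `collect` method only mimic the behavior described above and are not Lucene's `TimeoutPolicy`, `SuggestionTimeoutException` or suggestion code. The clock is injected so that the behavior can be checked deterministically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Illustrative only: models "throw" versus "return what we have" when a deadline passes.
class TimeLimitSketch {

  /** Hypothetical policy, loosely modeled on the two effects described in the text. */
  enum Policy { THROW_EXCEPTION, RETURN_PARTIAL_RESULT }

  /** Hypothetical exception type for this sketch. */
  static class TimeLimitExceeded extends RuntimeException {
    TimeLimitExceeded(String message) {
      super(message);
    }
  }

  /** Collects candidates until done or until the deadline passes, then applies the policy. */
  static List<String> collect(
      List<String> candidates, long timeLimitMs, Policy policy, LongSupplier clockMs) {
    long deadline = clockMs.getAsLong() + timeLimitMs;
    List<String> result = new ArrayList<>();
    for (String candidate : candidates) {
      if (clockMs.getAsLong() > deadline) {
        if (policy == Policy.THROW_EXCEPTION) {
          throw new TimeLimitExceeded("time limit of " + timeLimitMs + " ms exceeded");
        }
        return result; // partial result: whatever was gathered before the deadline
      }
      result.add(candidate);
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> candidates = List.of("word", "ward", "wood");
    // With the system clock and a generous limit, all candidates are expected to be returned.
    System.out.println(
        collect(candidates, 1000, Policy.RETURN_PARTIAL_RESULT, System::currentTimeMillis));
  }
}
```

Callers that prefer a possibly incomplete suggestion list to a failure would pick the partial-result behavior; callers that need to know the list is complete would pick the exception.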
    • doSuggest

      private void doSuggest(String word, WordCase wordCase, LinkedHashSet<Suggestion> suggestions, Runnable checkCanceled)
    • checkTimeLimit

      private Runnable checkTimeLimit(String word, Set<Suggestion> suggestions, long timeLimitMs)
    • postprocess

      private List<String> postprocess(Collection<Suggestion> suggestions)
    • modifyChunksBetweenDashes

      private List<String> modifyChunksBetweenDashes(String word)