Class Nysiis

java.lang.Object
org.apache.commons.codec.language.Nysiis
All Implemented Interfaces:
Encoder, StringEncoder

public class Nysiis extends Object implements StringEncoder
Encodes a string into a NYSIIS value. NYSIIS is an encoding used to relate similar names, but can also be used as a general-purpose scheme to find words with similar phonemes.

NYSIIS features an accuracy increase of 2.7% over the traditional Soundex algorithm.

Algorithm description:

 1. Transcode first characters of name
   1a. MAC ->   MCC
   1b. KN  ->   NN
   1c. K   ->   C
   1d. PH  ->   FF
   1e. PF  ->   FF
   1f. SCH ->   SSS
 2. Transcode last characters of name
   2a. EE, IE          ->   Y
   2b. DT,RT,RD,NT,ND  ->   D
 3. First character of key = first character of name
 4. Transcode remaining characters by following these rules, incrementing by one character each time
   4a. EV  ->   AF  else A,E,I,O,U -> A
   4b. Q   ->   G
   4c. Z   ->   S
   4d. M   ->   N
   4e. KN  ->   N   else K -> C
   4f. SCH ->   SSS
   4g. PH  ->   FF
   4h. H   ->   If previous or next is non-vowel, previous
   4i. W   ->   If previous is vowel, previous
   4j. Add current to key if current != last key character
 5. If last character is S, remove it
 6. If last characters are AY, replace with Y
 7. If last character is A, remove it
 8. Collapse all strings of repeated characters
 9. Add original first character of name as first character of key
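
Steps 1 and 2 of the description above can be sketched with anchored regular expressions. This is only an illustration, not the class's source: the class name NysiisEdgeRulesSketch, the method transcodeEdges, and the exact regular expressions are assumptions inferred from the rule table.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Illustrative sketch only; names and regexes are assumptions, not the library's code.
final class NysiisEdgeRulesSketch {
    // Step 1: rules anchored at the start of the name.
    private static final Pattern MAC = Pattern.compile("^MAC");
    private static final Pattern KN = Pattern.compile("^KN");
    private static final Pattern K = Pattern.compile("^K");
    private static final Pattern PH_PF = Pattern.compile("^(PH|PF)");
    private static final Pattern SCH = Pattern.compile("^SCH");
    // Step 2: rules anchored at the end of the name.
    private static final Pattern EE_IE = Pattern.compile("(EE|IE)$");
    private static final Pattern DT_ETC = Pattern.compile("(DT|RT|RD|NT|ND)$");

    /** Applies rules 1a-1f and then 2a-2b to an upper-cased copy of the name. */
    static String transcodeEdges(String name) {
        String s = name.toUpperCase(Locale.ROOT);
        s = MAC.matcher(s).replaceFirst("MCC");   // 1a
        s = KN.matcher(s).replaceFirst("NN");     // 1b
        s = K.matcher(s).replaceFirst("C");       // 1c
        s = PH_PF.matcher(s).replaceFirst("FF");  // 1d, 1e
        s = SCH.matcher(s).replaceFirst("SSS");   // 1f
        s = EE_IE.matcher(s).replaceFirst("Y");   // 2a
        s = DT_ETC.matcher(s).replaceFirst("D");  // 2b
        return s;
    }
}
```

Under these assumptions, transcodeEdges("Schmidt") yields "SSSMID": rule 1f rewrites the prefix and rule 2b rewrites the suffix.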
 

This class is immutable and thread-safe.

Since:
1.7
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    private static final char[]
    CHARS_A
    private static final char[]
    CHARS_AF
    private static final char[]
    CHARS_C
    private static final char[]
    CHARS_FF
    private static final char[]
    CHARS_G
    private static final char[]
    CHARS_N
    private static final char[]
    CHARS_NN
    private static final char[]
    CHARS_S
    private static final char[]
    CHARS_SSS
    private static final Pattern
    PAT_DT_ETC
    private static final Pattern
    PAT_EE_IE
    private static final Pattern
    PAT_K
    private static final Pattern
    PAT_KN
    private static final Pattern
    PAT_MAC
    private static final Pattern
    PAT_PH_PF
    private static final Pattern
    PAT_SCH
    private static final char
    SPACE
    private final boolean
    strict
    Indicates the strict mode.
    private static final int
    TRUE_LENGTH
  • Constructor Summary

    Constructors
    Constructor
    Description
    Nysiis()
    Creates an instance of the Nysiis encoder with strict mode (original form), i.e. encoded strings have a maximum length of 6.
    Nysiis(boolean strict)
    Creates an instance of the Nysiis encoder with the specified strict mode: true limits encoded strings to a maximum length of 6; false allows encoded strings of arbitrary length.
  • Method Summary

    Modifier and Type
    Method
    Description
    Object
    encode(Object obj)
    Encodes an Object using the NYSIIS algorithm.
    String
    encode(String str)
    Encodes a String using the NYSIIS algorithm.
    boolean
    isStrict()
    Indicates the strict mode for this Nysiis encoder.
    private static boolean
    isVowel(char c)
    Tests if the given character is a vowel.
    String
    nysiis(String str)
    Retrieves the NYSIIS code for a given String object.
    private static char[]
    transcodeRemaining(char prev, char curr, char next, char aNext)
    Transcodes the remaining parts of the String.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • CHARS_A

      private static final char[] CHARS_A
    • CHARS_AF

      private static final char[] CHARS_AF
    • CHARS_C

      private static final char[] CHARS_C
    • CHARS_FF

      private static final char[] CHARS_FF
    • CHARS_G

      private static final char[] CHARS_G
    • CHARS_N

      private static final char[] CHARS_N
    • CHARS_NN

      private static final char[] CHARS_NN
    • CHARS_S

      private static final char[] CHARS_S
    • CHARS_SSS

      private static final char[] CHARS_SSS
    • PAT_MAC

      private static final Pattern PAT_MAC
    • PAT_KN

      private static final Pattern PAT_KN
    • PAT_K

      private static final Pattern PAT_K
    • PAT_PH_PF

      private static final Pattern PAT_PH_PF
    • PAT_SCH

      private static final Pattern PAT_SCH
    • PAT_EE_IE

      private static final Pattern PAT_EE_IE
    • PAT_DT_ETC

      private static final Pattern PAT_DT_ETC
    • SPACE

      private static final char SPACE
    • TRUE_LENGTH

      private static final int TRUE_LENGTH
    • strict

      private final boolean strict
      Indicates the strict mode.
  • Constructor Details

    • Nysiis

      public Nysiis()
      Creates an instance of the Nysiis encoder with strict mode (original form), i.e. encoded strings have a maximum length of 6.
    • Nysiis

      public Nysiis(boolean strict)
      Creates an instance of the Nysiis encoder with the specified strict mode:
      • true: encoded strings have a maximum length of 6
      • false: encoded strings may have arbitrary length
      Parameters:
      strict - the strict mode
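
The effect of the strict flag can be pictured as a final truncation of the key. The sketch below is an assumption about how such a limit might be applied, not the class's code; NysiisStrictSketch and applyStrict are hypothetical names, and the value 6 for TRUE_LENGTH is inferred from the documented maximum length.

```java
// Illustrative sketch only; names are hypothetical.
final class NysiisStrictSketch {
    // Assumed to equal the documented maximum length of a strict key.
    static final int TRUE_LENGTH = 6;

    /** Truncates the key to TRUE_LENGTH characters in strict mode; otherwise returns it unchanged. */
    static String applyStrict(String key, boolean strict) {
        if (!strict) {
            return key;
        }
        return key.substring(0, Math.min(TRUE_LENGTH, key.length()));
    }
}
```

With this sketch, a key shorter than six characters passes through unchanged in either mode, so strict mode only matters for longer names.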
  • Method Details

    • isVowel

      private static boolean isVowel(char c)
      Tests if the given character is a vowel.
      Parameters:
      c - the character to test
      Returns:
      true if the character is a vowel, false otherwise
    • transcodeRemaining

      private static char[] transcodeRemaining(char prev, char curr, char next, char aNext)
      Transcodes the remaining parts of the String. The method operates on a sliding window, looking at 4 characters at a time: [i-1, i, i+1, i+2].
      Parameters:
      prev - the previous character
      curr - the current character
      next - the next character
      aNext - the after next character
      Returns:
      a transcoded array of characters, starting from the current position
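
A minimal sketch of such a four-character window function follows, written directly from rules 4a to 4i in the algorithm description. It is not the class's source: NysiisWindowSketch and transcode are hypothetical names, the vowel set A, E, I, O, U is taken from rule 4a, and the KN rule emits a single N as the table states, which may differ from what the library actually does.

```java
// Illustrative sketch only; follows the documented rule table, not the library's code.
final class NysiisWindowSketch {
    /** Vowel test assumed from rule 4a; expects upper-case input. */
    static boolean isVowel(char c) {
        return c == 'A' || c == 'E' || c == 'I' || c == 'O' || c == 'U';
    }

    /** Returns the replacement for curr, given its neighbours [prev, curr, next, aNext]. */
    static char[] transcode(char prev, char curr, char next, char aNext) {
        if (curr == 'E' && next == 'V') return new char[] {'A', 'F'};      // 4a: EV -> AF
        if (isVowel(curr)) return new char[] {'A'};                        // 4a: vowel -> A
        if (curr == 'Q') return new char[] {'G'};                          // 4b
        if (curr == 'Z') return new char[] {'S'};                          // 4c
        if (curr == 'M') return new char[] {'N'};                          // 4d
        if (curr == 'K') {                                                 // 4e
            return next == 'N' ? new char[] {'N'} : new char[] {'C'};
        }
        if (curr == 'S' && next == 'C' && aNext == 'H') {
            return new char[] {'S', 'S', 'S'};                             // 4f
        }
        if (curr == 'P' && next == 'H') return new char[] {'F', 'F'};      // 4g
        if (curr == 'H' && (!isVowel(prev) || !isVowel(next))) {
            return new char[] {prev};                                      // 4h
        }
        if (curr == 'W' && isVowel(prev)) return new char[] {prev};        // 4i
        return new char[] {curr};                                          // no rule applies
    }
}
```

The caller would advance one character at a time and append the returned characters to the key, skipping any that repeat the key's last character (rule 4j).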
    • encode

      public Object encode(Object obj) throws EncoderException
      Encodes an Object using the NYSIIS algorithm. This method is provided in order to satisfy the requirements of the Encoder interface, and will throw an EncoderException if the supplied object is not of type String.
      Specified by:
      encode in interface Encoder
      Parameters:
      obj - Object to encode
      Returns:
      An object (or a String) containing the NYSIIS code which corresponds to the given String.
      Throws:
      EncoderException - if the parameter supplied is not of type String
      IllegalArgumentException - if a character is not mapped
    • encode

      public String encode(String str)
      Encodes a String using the NYSIIS algorithm.
      Specified by:
      encode in interface StringEncoder
      Parameters:
      str - A String object to encode
      Returns:
      A Nysiis code corresponding to the String supplied
      Throws:
      IllegalArgumentException - if a character is not mapped
    • isStrict

      public boolean isStrict()
      Indicates the strict mode for this Nysiis encoder.
      Returns:
      true if the encoder is configured for strict mode, false otherwise
    • nysiis

      public String nysiis(String str)
      Retrieves the NYSIIS code for a given String object.
      Parameters:
      str - String to encode using the NYSIIS algorithm
      Returns:
      A NYSIIS code for the String supplied
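
Before a code is returned, steps 5 to 7 of the algorithm description clean up the tail of the key. The sketch below illustrates those three steps in isolation. NysiisTailSketch and trimTail are hypothetical names, and the length guards that keep at least one character in the key are an assumption rather than documented behaviour.

```java
// Illustrative sketch only; not the library's code.
final class NysiisTailSketch {
    /** Applies steps 5, 6 and 7 to an already transcoded, upper-case key. */
    static String trimTail(String key) {
        StringBuilder sb = new StringBuilder(key);

        // Step 5: if the last character is S, remove it.
        int n = sb.length();
        if (n > 1 && sb.charAt(n - 1) == 'S') {
            sb.setLength(n - 1);
        }

        // Step 6: if the last characters are AY, replace them with Y.
        n = sb.length();
        if (n > 2 && sb.charAt(n - 2) == 'A' && sb.charAt(n - 1) == 'Y') {
            sb.deleteCharAt(n - 2);
        }

        // Step 7: if the last character is A, remove it.
        n = sb.length();
        if (n > 1 && sb.charAt(n - 1) == 'A') {
            sb.setLength(n - 1);
        }
        return sb.toString();
    }
}
```

The steps run in order, so a key ending in "AS" loses both characters: the S goes in step 5 and the exposed A goes in step 7.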