Class BinaryDictionary

java.lang.Object
org.apache.lucene.analysis.ja.dict.BinaryDictionary
All Implemented Interfaces:
Dictionary
Direct Known Subclasses:
TokenInfoDictionary, UnknownDictionary

public abstract class BinaryDictionary extends Object implements Dictionary
Base class for a binary-encoded in-memory dictionary.
  • Field Details

    • DICT_FILENAME_SUFFIX

      public static final String DICT_FILENAME_SUFFIX
    • TARGETMAP_FILENAME_SUFFIX

      public static final String TARGETMAP_FILENAME_SUFFIX
    • POSDICT_FILENAME_SUFFIX

      public static final String POSDICT_FILENAME_SUFFIX
    • DICT_HEADER

      public static final String DICT_HEADER
    • TARGETMAP_HEADER

      public static final String TARGETMAP_HEADER
    • POSDICT_HEADER

      public static final String POSDICT_HEADER
    • VERSION

      public static final int VERSION
    • buffer

      private final ByteBuffer buffer
    • targetMapOffsets

      private final int[] targetMapOffsets
    • targetMap

      private final int[] targetMap
    • posDict

      private final String[] posDict
    • inflTypeDict

      private final String[] inflTypeDict
    • inflFormDict

      private final String[] inflFormDict
    • HAS_BASEFORM

      public static final int HAS_BASEFORM
      Flag indicating that the entry stores base-form data; otherwise the word is not inflected and its base form is the same as its surface form.
    • HAS_READING

      public static final int HAS_READING
      Flag indicating that the entry stores reading data; otherwise the reading is the surface form converted to katakana.
    • HAS_PRONUNCIATION

      public static final int HAS_PRONUNCIATION
      Flag indicating that the entry stores pronunciation data; otherwise the pronunciation is the same as the reading.
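The three flags above determine which optional fields an entry carries and how the missing ones are derived. The following is a minimal sketch of that fallback chain, not Lucene's implementation: the flag values 1, 2 and 4 and the Entry record are assumptions made for illustration, and the katakana conversion simply shifts hiragana code points (U+3041 to U+3096) up by 0x60.

```java
// Sketch of the fallback rules described by HAS_BASEFORM, HAS_READING and
// HAS_PRONUNCIATION. The flag values and the Entry record are assumptions
// for illustration; they are not taken from Lucene's source.
final class FlagFallbackSketch {
  static final int HAS_BASEFORM = 1;      // assumed value
  static final int HAS_READING = 2;       // assumed value
  static final int HAS_PRONUNCIATION = 4; // assumed value

  /** Hypothetical decoded entry: flags plus whichever optional fields are stored. */
  record Entry(int flags, String baseForm, String reading, String pronunciation) {}

  /** The base form is stored only for inflected words; otherwise return null. */
  static String baseForm(Entry e) {
    return (e.flags() & HAS_BASEFORM) != 0 ? e.baseForm() : null;
  }

  /** The reading falls back to the surface form converted to katakana. */
  static String reading(Entry e, String surface) {
    return (e.flags() & HAS_READING) != 0 ? e.reading() : toKatakana(surface);
  }

  /** The pronunciation falls back to the reading (which may itself fall back). */
  static String pronunciation(Entry e, String surface) {
    return (e.flags() & HAS_PRONUNCIATION) != 0 ? e.pronunciation() : reading(e, surface);
  }

  /** Shifts hiragana (U+3041..U+3096) into the katakana block; other chars pass through. */
  static String toKatakana(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      sb.append(c >= '\u3041' && c <= '\u3096' ? (char) (c + 0x60) : c);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Entry plain = new Entry(0, null, null, null);
    System.out.println(reading(plain, "すし"));       // スシ
    System.out.println(pronunciation(plain, "すし")); // スシ
    System.out.println(baseForm(plain));              // null
  }
}
```

Because pronunciation delegates to reading, an entry with no flags set still yields a katakana reading and pronunciation, while its base form is null.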
  • Constructor Details

  • Method Details

    • populateTargetMap

      private static void populateTargetMap(DataInput in, int[] targetMap, int[] targetMapOffsets) throws IOException
      Throws:
      IOException
    • populatePosDict

      private static void populatePosDict(DataInput in, int posSize, String[] posDict, String[] inflTypeDict, String[] inflFormDict) throws IOException
      Throws:
      IOException
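populatePosDict fills three parallel arrays (part of speech, inflection type, inflection form) from a stream. A sketch of one plausible encoding, assuming that each row is written as a string triple and that an empty string stands for a missing inflection: java.io's writeUTF/readUTF stand in for Lucene's DataInput string methods, and writePosDict and roundTrip are hypothetical helpers added so the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch only: the stream format is assumed, and java.io's writeUTF/readUTF
// stand in for Lucene's DataOutput/DataInput string methods.
final class PosDictSketch {

  /** Hypothetical writer: a count, then one (pos, inflType, inflForm) triple per row. */
  static void writePosDict(DataOutputStream out, String[][] rows) throws IOException {
    out.writeInt(rows.length);
    for (String[] row : rows) {
      out.writeUTF(row[0]);
      out.writeUTF(row[1] == null ? "" : row[1]); // null inflection encoded as ""
      out.writeUTF(row[2] == null ? "" : row[2]);
    }
  }

  /** Mirrors the populatePosDict parameters: fills three parallel arrays of size posSize. */
  static void populatePosDict(DataInputStream in, int posSize, String[] posDict,
      String[] inflTypeDict, String[] inflFormDict) throws IOException {
    for (int j = 0; j < posSize; j++) {
      posDict[j] = in.readUTF();
      inflTypeDict[j] = in.readUTF();
      inflFormDict[j] = in.readUTF();
      // An empty string means "no inflection"; expose it as null to callers.
      if (inflTypeDict[j].isEmpty()) inflTypeDict[j] = null;
      if (inflFormDict[j].isEmpty()) inflFormDict[j] = null;
    }
  }

  /** Hypothetical helper: writes rows, reads them back, returns {posDict, inflTypeDict, inflFormDict}. */
  static String[][] roundTrip(String[][] rows) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      writePosDict(new DataOutputStream(bytes), rows);
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
      int posSize = in.readInt();
      String[] pos = new String[posSize];
      String[] type = new String[posSize];
      String[] form = new String[posSize];
      populatePosDict(in, posSize, pos, type, form);
      return new String[][] {pos, type, form};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String[][] dicts = roundTrip(new String[][] {
        {"名詞-一般", null, null},
        {"動詞-自立", "一段", "基本形"}});
    System.out.println(dicts[0][1] + " / " + dicts[1][1] + " / " + dicts[2][1]); // 動詞-自立 / 一段 / 基本形
    System.out.println(dicts[1][0]); // null
  }
}
```

Encoding absence as an empty string keeps every row the same shape on disk, which is consistent with getInflectionType and getInflectionForm documenting a null return for uninflected words.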
    • getResource

      @Deprecated(forRemoval=true, since="9.1") public static final InputStream getResource(BinaryDictionary.ResourceScheme scheme, String path) throws IOException
      Deprecated, for removal: This API element is subject to removal in a future version.
      Throws:
      IOException
    • lookupWordIds

      public void lookupWordIds(int sourceId, IntsRef ref)
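The targetMapOffsets/targetMap field pair suggests an offsets-plus-flat-array layout, in which lookupWordIds can point its IntsRef argument at a slice of targetMap instead of copying word IDs. The sketch below assumes that layout; IntSlice is a hypothetical stand-in for IntsRef (which exposes ints, offset and length), and the constructor that builds the two arrays is illustrative only.

```java
// Sketch of an offsets-plus-flat-array mapping from source IDs to word IDs.
// IntSlice is a hypothetical stand-in for Lucene's IntsRef; the layout is assumed.
final class TargetMapSketch {
  static final class IntSlice {
    int[] ints;
    int offset;
    int length;
  }

  private final int[] targetMapOffsets; // size = number of sources + 1
  private final int[] targetMap;        // all word IDs, concatenated by source ID

  TargetMapSketch(int[][] wordIdsBySource) {
    targetMapOffsets = new int[wordIdsBySource.length + 1];
    int total = 0;
    for (int i = 0; i < wordIdsBySource.length; i++) {
      targetMapOffsets[i] = total;
      total += wordIdsBySource[i].length;
    }
    targetMapOffsets[wordIdsBySource.length] = total; // sentinel end offset
    targetMap = new int[total];
    for (int i = 0; i < wordIdsBySource.length; i++) {
      System.arraycopy(wordIdsBySource[i], 0, targetMap, targetMapOffsets[i],
          wordIdsBySource[i].length);
    }
  }

  /** Points ref at the word IDs for sourceId without copying them. */
  void lookupWordIds(int sourceId, IntSlice ref) {
    ref.ints = targetMap;
    ref.offset = targetMapOffsets[sourceId];
    ref.length = targetMapOffsets[sourceId + 1] - ref.offset;
  }

  public static void main(String[] args) {
    TargetMapSketch map = new TargetMapSketch(new int[][] {{10, 42}, {}, {7}});
    IntSlice ref = new IntSlice();
    map.lookupWordIds(0, ref);
    for (int i = ref.offset; i < ref.offset + ref.length; i++) {
      System.out.println(ref.ints[i]); // 10, then 42
    }
  }
}
```

Returning a view rather than a copy keeps lookups allocation-free; the trade-off is that callers share the underlying array and must not modify it.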
    • getLeftId

      public int getLeftId(int wordId)
      Description copied from interface: Dictionary
      Get the left ID of the specified word
      Specified by:
      getLeftId in interface Dictionary
      Returns:
      left ID
    • getRightId

      public int getRightId(int wordId)
      Description copied from interface: Dictionary
      Get the right ID of the specified word
      Specified by:
      getRightId in interface Dictionary
      Returns:
      right ID
    • getWordCost

      public int getWordCost(int wordId)
      Description copied from interface: Dictionary
      Get the word cost of the specified word
      Specified by:
      getWordCost in interface Dictionary
      Returns:
      the word's cost
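getLeftId, getRightId and getWordCost read fixed-width data for a word ID out of the dictionary's byte buffer. The sketch below assumes a hypothetical layout, not necessarily the one Lucene writes: a word ID is the byte offset of its entry in a ByteBuffer, the first 16 bits pack a connection ID above three flag bits, the next 16 bits hold the signed cost, and one connection ID serves as both the left and the right ID.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed-width entry header. A word ID is the byte offset of its
// entry: bytes 0-1 hold (connectionId << 3 | flags), bytes 2-3 the signed cost.
final class EntryHeaderSketch {
  static final int FLAG_BITS = 3; // room for three 1-bit flags

  private final ByteBuffer buffer;

  EntryHeaderSketch(ByteBuffer buffer) {
    this.buffer = buffer;
  }

  /** Appends an entry at buf.position() and returns its word ID (the byte offset). */
  static int writeEntry(ByteBuffer buf, int connectionId, int flags, short cost) {
    int wordId = buf.position();
    buf.putShort((short) (connectionId << FLAG_BITS | flags));
    buf.putShort(cost);
    return wordId;
  }

  int getLeftId(int wordId) {
    // Mask to an unsigned 16-bit value before shifting out the flag bits.
    return (buffer.getShort(wordId) & 0xffff) >>> FLAG_BITS;
  }

  int getRightId(int wordId) {
    return getLeftId(wordId); // assumption of this sketch: one ID for both sides
  }

  int getWordCost(int wordId) {
    return buffer.getShort(wordId + 2); // sign-extended, so negative costs survive
  }

  int getFlags(int wordId) {
    return buffer.getShort(wordId) & ((1 << FLAG_BITS) - 1);
  }

  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocate(64);
    int a = writeEntry(buf, 1285, 0b011, (short) 3000);
    int b = writeEntry(buf, 4, 0, (short) -1200);
    EntryHeaderSketch dict = new EntryHeaderSketch(buf);
    System.out.println(dict.getLeftId(a) + " " + dict.getWordCost(a) + " " + dict.getFlags(a)); // 1285 3000 3
    System.out.println(dict.getRightId(b) + " " + dict.getWordCost(b)); // 4 -1200
  }
}
```

Masking with 0xffff before the unsigned shift keeps connection IDs up to 8191 (13 bits) from being read back as negative values.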
    • getBaseForm

      public String getBaseForm(int wordId, char[] surfaceForm, int off, int len)
      Description copied from interface: Dictionary
      Get the base form of the word
      Specified by:
      getBaseForm in interface Dictionary
      Parameters:
      wordId - word ID of token
      Returns:
      Base form of the word if it is inflected; otherwise null (the base form is then the surface form itself)
    • getReading

      public String getReading(int wordId, char[] surface, int off, int len)
      Description copied from interface: Dictionary
      Get the reading of the token
      Specified by:
      getReading in interface Dictionary
      Parameters:
      wordId - word ID of token
      Returns:
      Reading of the token
    • getPartOfSpeech

      public String getPartOfSpeech(int wordId)
      Description copied from interface: Dictionary
      Get the part of speech of the token
      Specified by:
      getPartOfSpeech in interface Dictionary
      Parameters:
      wordId - word ID of token
      Returns:
      Part of speech of the token
    • getPronunciation

      public String getPronunciation(int wordId, char[] surface, int off, int len)
      Description copied from interface: Dictionary
      Get the pronunciation of the token
      Specified by:
      getPronunciation in interface Dictionary
      Parameters:
      wordId - word ID of token
      Returns:
      Pronunciation of the token
    • getInflectionType

      public String getInflectionType(int wordId)
      Description copied from interface: Dictionary
      Get the inflection type of the token
      Specified by:
      getInflectionType in interface Dictionary
      Parameters:
      wordId - word ID of token
      Returns:
      inflection type, or null
    • getInflectionForm

      public String getInflectionForm(int wordId)
      Description copied from interface: Dictionary
      Get the inflection form of the token
      Specified by:
      getInflectionForm in interface Dictionary
      Parameters:
      wordId - word ID of token
      Returns:
      inflection form, or null
    • baseFormOffset

      private static int baseFormOffset(int wordId)
    • readingOffset

      private int readingOffset(int wordId)
    • pronunciationOffset

      private int pronunciationOffset(int wordId)
    • hasBaseFormData

      private boolean hasBaseFormData(int wordId)
    • hasReadingData

      private boolean hasReadingData(int wordId)
    • hasPronunciationData

      private boolean hasPronunciationData(int wordId)
    • readString

      private String readString(int offset, int length, boolean kana)
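readString takes a kana flag, which suggests that some strings are stored more compactly than plain UTF-16. A sketch of one such scheme, under the assumption that katakana-only strings are written as one byte per character (the offset from U+30A0, the start of the Katakana block) and all other strings as two-byte chars; writeString and isKatakana are hypothetical helpers, and this is not presented as Lucene's on-disk format.

```java
import java.nio.ByteBuffer;

// Sketch of a kana-compacted string encoding: katakana-only strings use one byte
// per char (offset from U+30A0), everything else uses two-byte UTF-16 chars.
// The encoding is an assumption for illustration.
final class KanaStringSketch {
  private static final char KATAKANA_BASE = '\u30A0'; // first code point of the Katakana block

  static boolean isKatakana(String s) {
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < KATAKANA_BASE || c > '\u30FF') return false;
    }
    return true;
  }

  /** Writes s at buf.position(); returns true if the one-byte kana form was used. */
  static boolean writeString(ByteBuffer buf, String s) {
    boolean kana = isKatakana(s);
    for (int i = 0; i < s.length(); i++) {
      if (kana) {
        buf.put((byte) (s.charAt(i) - KATAKANA_BASE)); // 0x00..0x5F fits in a byte
      } else {
        buf.putChar(s.charAt(i));
      }
    }
    return kana;
  }

  /** Decodes length chars starting at byte offset, in whichever form kana selects. */
  static String readString(ByteBuffer buf, int offset, int length, boolean kana) {
    char[] text = new char[length];
    for (int i = 0; i < length; i++) {
      text[i] = kana
          ? (char) (KATAKANA_BASE + (buf.get(offset + i) & 0xff))
          : buf.getChar(offset + (i << 1));
    }
    return new String(text);
  }

  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocate(32);
    boolean kana = writeString(buf, "タベル");
    int next = buf.position(); // 3 bytes used instead of 6
    boolean kana2 = writeString(buf, "食べる");
    System.out.println(readString(buf, 0, 3, kana));     // タベル
    System.out.println(readString(buf, next, 3, kana2)); // 食べる
  }
}
```

Readings and pronunciations are usually katakana, so a scheme like this halves their storage; the cost is that the caller must keep the kana flag and the length alongside the offset.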