All Implemented Interfaces:
Closeable, AutoCloseable

public class ThaiTokenizer extends SegmentingTokenizerBase
Tokenizer that uses BreakIterator to tokenize Thai text.
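
The following is a rough sketch of the underlying idea using only java.text.BreakIterator from the JDK. It walks a Thai word BreakIterator one boundary at a time and keeps only segments that contain a letter or digit. ThaiWordSketch, words and THAI_SAMPLE are hypothetical names, not part of Lucene. Obtaining the Thai iterator through Locale.forLanguageTag("th") is also an assumption of this sketch. This is not ThaiTokenizer's actual implementation, which goes through SegmentingTokenizerBase and Lucene attributes.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ThaiWordSketch {
    // "ภาษาไทย" ("Thai language"), written with Unicode escapes to avoid source-encoding issues
    static final String THAI_SAMPLE = "\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22";

    /** Returns true if the segment contains at least one letter or digit code point. */
    static boolean hasLetterOrDigit(String segment) {
        return segment.codePoints().anyMatch(Character::isLetterOrDigit);
    }

    /** Hypothetical helper: splits text into word segments with the JRE's Thai word BreakIterator. */
    static List<String> words(String text) {
        BreakIterator wordBreaker = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        wordBreaker.setText(text);
        List<String> result = new ArrayList<>();
        int start = wordBreaker.first();
        for (int end = wordBreaker.next(); end != BreakIterator.DONE; start = end, end = wordBreaker.next()) {
            String segment = text.substring(start, end);
            // Drop whitespace and punctuation runs; keep segments with letters or digits.
            if (hasLetterOrDigit(segment)) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The Thai result depends on whether the JRE has a dictionary-based iterator.
        System.out.println(words(THAI_SAMPLE));
        System.out.println(words("hello world"));
    }
}
```

Because Thai segmentation depends on the JRE's break iterator, the tokens printed for the Thai sample can differ between JREs. A dictionary-based iterator would typically split it into two words, while one without dictionary support may return the whole run as a single segment. The English sample splits on whitespace in either case.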

WARNING: this tokenizer may not be supported by all JREs. It is known to work with Sun/Oracle and Harmony JREs. If your application needs to be fully portable, consider using ICUTokenizer instead, which uses an ICU Thai BreakIterator that will always be available.
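
One way to test for the dictionary support this warning refers to is to ask whether offset 4 is a word boundary in the sample "ภาษาไทย". This rests on an assumption: a dictionary-based iterator should split that sample after its first four characters ("ภาษา" + "ไทย"), while one without Thai dictionary support typically does not. The helpers isThaiWordBoundary and dictionaryBreakerAvailable are hypothetical. The public DBBI_AVAILABLE field documented below is the flag ThaiTokenizer itself exposes for this purpose.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class ThaiBreakProbe {
    // "ภาษาไทย" = "ภาษา" (4 chars) + "ไทย" (3 chars), written with Unicode escapes
    static final String THAI_SAMPLE = "\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22";

    /** Hypothetical helper: is offset a boundary according to the JRE's Thai word BreakIterator? */
    static boolean isThaiWordBoundary(String text, int offset) {
        BreakIterator wordBreaker = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        wordBreaker.setText(text);
        return wordBreaker.isBoundary(offset);
    }

    /** Hypothetical portability check: does the iterator split the sample after "ภาษา"? */
    static boolean dictionaryBreakerAvailable() {
        return isThaiWordBoundary(THAI_SAMPLE, 4);
    }

    public static void main(String[] args) {
        if (dictionaryBreakerAvailable()) {
            System.out.println("Dictionary-based Thai BreakIterator available.");
        } else {
            System.err.println("No dictionary-based Thai BreakIterator; consider ICUTokenizer.");
        }
    }
}
```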

  • Field Details

    • DBBI_AVAILABLE

      public static final boolean DBBI_AVAILABLE
      True if the JRE supports a working dictionary-based BreakIterator for Thai. If this is false, this tokenizer will not work at all!
    • proto

      private static final BreakIterator proto
    • sentenceProto

      private static final BreakIterator sentenceProto
      Used for breaking the text into sentences.
    • wordBreaker

      private final BreakIterator wordBreaker
    • wrapper

      private final CharArrayIterator wrapper
    • sentenceStart

      int sentenceStart
    • sentenceEnd

      int sentenceEnd
    • termAtt

      private final CharTermAttribute termAtt
    • offsetAtt

      private final OffsetAttribute offsetAtt
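
The sentenceProto, wordBreaker, sentenceStart and sentenceEnd fields suggest a two-level pass. The text is first split into sentences. The word breaker then runs within each sentence, and word offsets are shifted by the sentence start so they refer to the original text. The code below is a simplified sketch of that idea over a plain String, assuming the JDK's sentence and word BreakIterators for Thai. The Token record and tokenize method are hypothetical. The real class presumably works over a character buffer through its CharArrayIterator wrapper, and it reports results through CharTermAttribute and OffsetAttribute rather than returning a list.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceThenWordSketch {
    /** Hypothetical token: text plus start/end offsets into the original input (end exclusive). */
    record Token(String text, int start, int end) {}

    static List<Token> tokenize(String text) {
        Locale thai = Locale.forLanguageTag("th");
        BreakIterator sentenceBreaker = BreakIterator.getSentenceInstance(thai);
        BreakIterator wordBreaker = BreakIterator.getWordInstance(thai);
        sentenceBreaker.setText(text);
        List<Token> tokens = new ArrayList<>();
        int sentenceStart = sentenceBreaker.first();
        for (int sentenceEnd = sentenceBreaker.next(); sentenceEnd != BreakIterator.DONE;
                sentenceStart = sentenceEnd, sentenceEnd = sentenceBreaker.next()) {
            // Run the word breaker over one sentence at a time.
            String sentence = text.substring(sentenceStart, sentenceEnd);
            wordBreaker.setText(sentence);
            int wordStart = wordBreaker.first();
            for (int wordEnd = wordBreaker.next(); wordEnd != BreakIterator.DONE;
                    wordStart = wordEnd, wordEnd = wordBreaker.next()) {
                String word = sentence.substring(wordStart, wordEnd);
                if (word.codePoints().anyMatch(Character::isLetterOrDigit)) {
                    // Shift sentence-relative offsets back to offsets in the full text.
                    tokens.add(new Token(word, sentenceStart + wordStart, sentenceStart + wordEnd));
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("Hello there. How are you?")) {
            System.out.println(t.text() + " [" + t.start() + "," + t.end() + ")");
        }
    }
}
```

For the English sample, this should print Hello [0,5), there [6,11), How [13,16), are [17,20) and you [21,24). The offsets stay correct wherever the sentence boundary falls, because each word offset is added to its sentence's start.
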
  • Constructor Details

    • ThaiTokenizer

      public ThaiTokenizer()
      Creates a new ThaiTokenizer.
    • ThaiTokenizer

      public ThaiTokenizer(AttributeFactory factory)
      Creates a new ThaiTokenizer that uses the supplied AttributeFactory.
  • Method Details