Package org.apache.lucene.analysis.ko
Class KoreanTokenizerFactory
java.lang.Object
org.apache.lucene.analysis.AbstractAnalysisFactory
org.apache.lucene.analysis.TokenizerFactory
org.apache.lucene.analysis.ko.KoreanTokenizerFactory
All Implemented Interfaces: ResourceLoaderAware
Factory for KoreanTokenizer.
<fieldType name="text_ko" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KoreanTokenizerFactory"
      decompoundMode="discard"
      userDictionary="user.txt"
      userDictionaryEncoding="UTF-8"
      outputUnknownUnigrams="false"
      discardPunctuation="true"
    />
  </analyzer>
</fieldType>
Supports the following attributes:
- userDictionary: User dictionary path.
- userDictionaryEncoding: User dictionary encoding.
- decompoundMode: Decompound mode. One of 'none', 'discard', or 'mixed'. Default is 'discard'. See KoreanTokenizer.DecompoundMode.
- outputUnknownUnigrams: If true, outputs unigrams for unknown words.
- discardPunctuation: If true, punctuation tokens are dropped from the output.
Since: 7.4.0
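The attributes above map onto the Map<String, String> accepted by the KoreanTokenizerFactory(Map<String, String> args) constructor. The following is a minimal sketch of assembling such a map in plain Java. The class KoreanTokenizerArgs and its build method are hypothetical helpers, not part of Lucene, and the check accepts only the three lowercase mode spellings listed above. The call into Lucene itself appears only as a comment, since it needs the Korean analysis module on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene: assembles the attribute map
// described above for KoreanTokenizerFactory(Map<String, String> args).
class KoreanTokenizerArgs {
    // The three decompound modes named in the attribute list.
    static final Set<String> DECOMPOUND_MODES = Set.of("none", "discard", "mixed");

    static Map<String, String> build(String decompoundMode,
                                     String userDictionary,
                                     String userDictionaryEncoding,
                                     boolean outputUnknownUnigrams,
                                     boolean discardPunctuation) {
        if (decompoundMode != null && !DECOMPOUND_MODES.contains(decompoundMode)) {
            throw new IllegalArgumentException("Unknown decompoundMode: " + decompoundMode);
        }
        // A mutable map, on the assumption that the factory removes entries as it parses them.
        Map<String, String> args = new LinkedHashMap<>();
        if (decompoundMode != null) {
            // When omitted, the documented default ('discard') applies.
            args.put("decompoundMode", decompoundMode);
        }
        if (userDictionary != null) {
            args.put("userDictionary", userDictionary);
            if (userDictionaryEncoding != null) {
                args.put("userDictionaryEncoding", userDictionaryEncoding);
            }
        }
        args.put("outputUnknownUnigrams", Boolean.toString(outputUnknownUnigrams));
        args.put("discardPunctuation", Boolean.toString(discardPunctuation));
        return args;
    }

    public static void main(String[] argv) {
        Map<String, String> args = build("mixed", "user.txt", "UTF-8", false, true);
        System.out.println(args);
        // With Lucene on the classpath, the map would then be handed to the factory:
        // TokenizerFactory factory = new KoreanTokenizerFactory(args);
    }
}
```

Validating the mode before the factory sees it is a design choice of this sketch only; it surfaces a typo in decompoundMode at configuration time rather than when the factory is constructed.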
-
Field Summary
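Fields (Modifier and Type, Field, Description):
- private static final String DECOMPOUND_MODE
- private static final String DISCARD_PUNCTUATION
- private final boolean discardPunctuation
- private final KoreanTokenizer.DecompoundMode mode
- static final String NAME: SPI name
- private static final String OUTPUT_UNKNOWN_UNIGRAMS
- private final boolean outputUnknownUnigrams
- private static final String USER_DICT_ENCODING
- private static final String USER_DICT_PATH
- private UserDictionary userDictionary
- private final String userDictionaryEncoding
- private final String userDictionaryPath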
Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
-
Constructor Summary
Constructors (Constructor, Description):
- KoreanTokenizerFactory(): Default ctor for compatibility with SPI
- KoreanTokenizerFactory(Map<String, String> args): Creates a new KoreanTokenizerFactory
-
Method Summary
Methods (Modifier and Type, Method, Description):
- create(AttributeFactory factory): Creates a TokenStream of the specified input using the given AttributeFactory
- void inform(ResourceLoader loader): Initializes this component with the provided ResourceLoader (used for loading classes, files, etc).
Methods inherited from class org.apache.lucene.analysis.TokenizerFactory
availableTokenizers, create, findSPIName, forName, lookupClass, reloadTokenizers
Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
-
Field Details
-
NAME
SPI name
-
USER_DICT_PATH
-
USER_DICT_ENCODING
-
DECOMPOUND_MODE
-
OUTPUT_UNKNOWN_UNIGRAMS
-
DISCARD_PUNCTUATION
-
userDictionaryPath
-
userDictionaryEncoding
-
userDictionary
-
mode
-
outputUnknownUnigrams
private final boolean outputUnknownUnigrams
-
discardPunctuation
private final boolean discardPunctuation
-
-
Constructor Details
-
KoreanTokenizerFactory
public KoreanTokenizerFactory(Map<String, String> args)
Creates a new KoreanTokenizerFactory
-
KoreanTokenizerFactory
public KoreanTokenizerFactory()
Default ctor for compatibility with SPI
-
-
Method Details
-
inform
Description copied from interface: ResourceLoaderAware
Initializes this component with the provided ResourceLoader (used for loading classes, files, etc).
Specified by: inform in interface ResourceLoaderAware
Throws: IOException
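To illustrate why resource loading of this kind is declared to throw IOException, and what the userDictionaryEncoding attribute governs, here is a rough sketch of reading a dictionary stream with a named encoding using only the JDK. It is not the factory's actual code. The class UserDictionaryReaderSketch, the UTF-8 fallback, and the treatment of blank lines and lines starting with '#' as skippable are all assumptions made for this example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Lucene code.
class UserDictionaryReaderSketch {
    static List<String> readEntries(InputStream in, String encoding) throws IOException {
        // Assumed fallback for this sketch when no encoding is configured.
        String name = (encoding == null) ? "UTF-8" : encoding;
        // REPORT makes undecodable bytes raise an IOException subclass
        // instead of being silently replaced.
        CharsetDecoder decoder = Charset.forName(name).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        List<String> entries = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, decoder))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                // Assumed convention for this sketch: skip blanks and '#' comment lines.
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                entries.add(line);
            }
        }
        return entries;
    }

    public static void main(String[] argv) throws IOException {
        // "\uC138\uC885\uC2DC" is an arbitrary three-syllable Korean word used as sample data.
        byte[] data = "# sample entry\n\uC138\uC885\uC2DC\n".getBytes(StandardCharsets.UTF_8);
        List<String> entries = readEntries(new ByteArrayInputStream(data), "UTF-8");
        System.out.println(entries.size() + " entry read");
    }
}
```

A strict decoder is used here so that a dictionary saved in the wrong encoding fails loudly at load time rather than producing corrupted entries.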
-
create
Description copied from class: TokenizerFactory
Creates a TokenStream of the specified input using the given AttributeFactory
Specified by: create in class TokenizerFactory
-