java.lang.Object
org.apache.lucene.analysis.morph.Viterbi<Token,Viterbi.Position>
org.apache.lucene.analysis.ko.Viterbi
Viterbi subclass for Korean morphological analysis.
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.analysis.morph.Viterbi
Viterbi.Position, Viterbi.WrappedPositionArray<U extends Viterbi.Position>
-
Field Summary
Fields (modifier and type, followed by the field name)
private final CharacterDefinition characterDefinition
private final EnumMap<TokenType, Dictionary<? extends KoMorphData>> dictionaryMap
private final boolean discardPunctuation
private GraphvizFormatter<KoMorphData> dotOut
private final KoreanTokenizer.DecompoundMode mode
private final boolean outputUnknownUnigrams
private final UnknownDictionary unkDictionary
Fields inherited from class org.apache.lucene.analysis.morph.Viterbi
buffer, costs, enableSpacePenaltyFactor, end, lastBackTracePos, MAX_UNKNOWN_WORD_LENGTH, outputLongestUserEntryOnly, outputNBest, pending, pos, positions, VERBOSE, wordIdRef
-
Constructor Summary
Constructors
Viterbi(TokenInfoFST fst, FST.BytesReader fstReader, TokenInfoDictionary dictionary, TokenInfoFST userFST, FST.BytesReader userFSTReader, UserDictionary userDictionary, ConnectionCosts costs, UnknownDictionary unkDictionary, CharacterDefinition characterDefinition, boolean discardPunctuation, KoreanTokenizer.DecompoundMode mode, boolean outputUnknownUnigrams)
-
Method Summary
Methods (modifier and type, followed by the method and its description)
protected void backtrace(Viterbi.Position endPosData, int fromIDX)
    Backtrace from the provided position, back to the last time we back-traced, accumulating the resulting tokens to the pending list.
protected int computeSpacePenalty(MorphData morphData, int wordID, int numSpaces)
    Returns the space penalty associated with the provided POS.Tag.
(package private) Dictionary<? extends KoMorphData> getDict
private static boolean isCommonOrInherited
private static boolean isPunctuation(char ch)
private static boolean isPunctuation(char ch, int cid)
private static boolean isSameScript(Character.UnicodeScript scriptOne, Character.UnicodeScript scriptTwo)
    Determine if two scripts are compatible.
protected int processUnknownWord(boolean anyMatches, Viterbi.Position posData)
    Add unknown words to the position graph.
(package private) void setGraphvizFormatter
private boolean shouldFilterToken(Token token)
Methods inherited from class org.apache.lucene.analysis.morph.Viterbi
add, backtraceNBest, computePenalty, fixupPendingList, forward, getPending, getPos, isEnd, isOutputNBest, resetBuffer, resetState, shouldSkipProcessUnknownWord
-
Field Details
-
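dictionaryMap
private final EnumMap<TokenType, Dictionary<? extends KoMorphData>> dictionaryMap
-
unkDictionary
private final UnknownDictionary unkDictionary
-
characterDefinition
private final CharacterDefinition characterDefinition
-
discardPunctuation
private final boolean discardPunctuation
-
mode
private final KoreanTokenizer.DecompoundMode mode
-
outputUnknownUnigrams
private final boolean outputUnknownUnigrams
-
dotOut
private GraphvizFormatter<KoMorphData> dotOut
-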
-
Constructor Details
-
Viterbi
Viterbi(TokenInfoFST fst, FST.BytesReader fstReader, TokenInfoDictionary dictionary, TokenInfoFST userFST, FST.BytesReader userFSTReader, UserDictionary userDictionary, ConnectionCosts costs, UnknownDictionary unkDictionary, CharacterDefinition characterDefinition, boolean discardPunctuation, KoreanTokenizer.DecompoundMode mode, boolean outputUnknownUnigrams)
-
-
Method Details
-
processUnknownWord
Description copied from class: Viterbi
Add unknown words to the position graph.
Specified by: processUnknownWord in class Viterbi<Token,Viterbi.Position>
Returns: word length
Throws: IOException
-
setGraphvizFormatter
-
backtrace
Description copied from class: Viterbi
Backtrace from the provided position, back to the last time we back-traced, accumulating the resulting tokens to the pending list. The pending list is then in reverse order (the last token should be returned first).
Specified by: backtrace in class Viterbi<Token,Viterbi.Position>
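The reversed pending list described for backtrace can be illustrated with a minimal, standalone sketch. It does not use the Lucene classes: `BacktraceSketch`, `Node`, and `nextToken` are hypothetical names introduced only for illustration, and the real method works on `Viterbi.Position` data and `Token` objects rather than plain strings.

```java
import java.util.ArrayList;
import java.util.List;

final class BacktraceSketch {
    /** Hypothetical lattice entry: the token ending at this index and the index where it starts. */
    static final class Node {
        final int backIndex;
        final String surface;

        Node(int backIndex, String surface) {
            this.backIndex = backIndex;
            this.surface = surface;
        }
    }

    /**
     * Walks the back pointers from endIndex down to lastBackTraceIndex.
     * Tokens are appended as they are visited, so the list ends up in reverse text order.
     */
    static List<String> backtrace(Node[] nodes, int endIndex, int lastBackTraceIndex) {
        List<String> pending = new ArrayList<>();
        int index = endIndex;
        while (index > lastBackTraceIndex) {
            Node node = nodes[index];
            pending.add(node.surface);
            index = node.backIndex;
        }
        return pending;
    }

    /** Removes and returns the last list element, which is the earliest token in the text. */
    static String nextToken(List<String> pending) {
        return pending.remove(pending.size() - 1);
    }

    public static void main(String[] args) {
        // Best path over the text "abcde": "ab" covers [0, 2), "cde" covers [2, 5).
        Node[] nodes = new Node[6];
        nodes[2] = new Node(0, "ab");
        nodes[5] = new Node(2, "cde");

        List<String> pending = backtrace(nodes, 5, 0);
        while (!pending.isEmpty()) {
            System.out.println(nextToken(pending));
        }
    }
}
```

Taking tokens from the end of the list restores text order without reversing the list after every backtrace.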
-
computeSpacePenalty
Returns the space penalty associated with the provided POS.Tag.
Overrides: computeSpacePenalty in class Viterbi<Token,Viterbi.Position>
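As a rough illustration of a tag-dependent space penalty, the sketch below charges an extra cost when a morpheme that normally attaches to the preceding word (a particle or an ending) appears directly after whitespace. Everything in it is an assumption made for illustration: the `Tag` enum, the penalty value, and the simplified signature are hypothetical and are not taken from the Lucene implementation, whose method receives a `MorphData`, a word ID, and the number of spaces.

```java
final class SpacePenaltySketch {
    /** Hypothetical, heavily reduced tag set; the real POS.Tag enum is much larger. */
    enum Tag { NOUN, VERB, PARTICLE, ENDING }

    /** Hypothetical cost added to a path; the value is illustrative only. */
    static final int ATTACHED_MORPHEME_PENALTY = 3000;

    /**
     * Returns an extra cost for a token that follows numSpaces whitespace characters.
     * Particles and endings are written without a preceding space in Korean,
     * so a path that places one right after whitespace is made more expensive.
     */
    static int computeSpacePenalty(Tag leftTag, int numSpaces) {
        if (numSpaces == 0) {
            return 0; // no whitespace before the token, nothing to penalize
        }
        switch (leftTag) {
            case PARTICLE:
            case ENDING:
                return ATTACHED_MORPHEME_PENALTY;
            default:
                return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(computeSpacePenalty(Tag.PARTICLE, 1));
        System.out.println(computeSpacePenalty(Tag.NOUN, 1));
        System.out.println(computeSpacePenalty(Tag.PARTICLE, 0));
    }
}
```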
-
getDict
-
shouldFilterToken
private boolean shouldFilterToken(Token token)
-
isPunctuation
private static boolean isPunctuation(char ch)
-
isPunctuation
private static boolean isPunctuation(char ch, int cid)
isCommonOrInherited
-
isSameScript
private static boolean isSameScript(Character.UnicodeScript scriptOne, Character.UnicodeScript scriptTwo)
Determine if two scripts are compatible.
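One plausible reading of "compatible" is sketched below using only the JDK's `Character.UnicodeScript`. It assumes that two scripts are compatible when they are equal or when either one is `COMMON` or `INHERITED`, which are the scripts of digits, most punctuation, and combining marks. That rule is an assumption suggested by the method names on this page, not a copy of the Lucene source, and `ScriptCompatSketch` is a hypothetical class name.

```java
import java.lang.Character.UnicodeScript;

final class ScriptCompatSketch {
    /** COMMON and INHERITED code points are shared by many scripts. */
    static boolean isCommonOrInherited(UnicodeScript script) {
        return script == UnicodeScript.COMMON || script == UnicodeScript.INHERITED;
    }

    /** Assumed rule: equal scripts are compatible, and COMMON or INHERITED is compatible with anything. */
    static boolean isSameScript(UnicodeScript scriptOne, UnicodeScript scriptTwo) {
        return scriptOne == scriptTwo
            || isCommonOrInherited(scriptOne)
            || isCommonOrInherited(scriptTwo);
    }

    public static void main(String[] args) {
        UnicodeScript hangul = UnicodeScript.of('\uD55C'); // a Hangul syllable
        UnicodeScript latin = UnicodeScript.of('a');
        UnicodeScript digit = UnicodeScript.of('1');       // digits belong to COMMON

        System.out.println(isSameScript(hangul, latin)); // false
        System.out.println(isSameScript(hangul, digit)); // true
    }
}
```

Under this rule a run such as Hangul followed by digits is not split on script alone, while a switch from Hangul to Latin is.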
-