unigram_dictionary {NUSS} | R Documentation |
unigram_dictionary
Returns a data.frame containing the dictionary for unigram_sequence_segmentation.
unigram_dictionary(texts, points_filter = 1)
texts |
character vector, the texts used to create the n-gram dictionary. Case-sensitive. |
points_filter |
numeric, the minimum number of points (occurrences) a unigram must have to be included in the dictionary. |
The output is always a data.frame with 4 columns: 1) to_search, 2) to_replace, 3) id, 4) points.
texts <- c("this is science",
           "science is #fascinatingthing",
           "this is a scientific approach",
           "science is everywhere",
           "the beauty of science")
unigram_dictionary(texts)
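
To give a feel for how an occurrence threshold such as points_filter behaves, here is a minimal base-R sketch. It is not the NUSS implementation: the helper count_unigrams, its output columns (unigram, points), and the whitespace tokenisation are all assumptions made for illustration. The real unigram_dictionary may tokenise and normalise text differently.

```r
# Illustrative sketch only, NOT the NUSS implementation.
# count_unigrams() is a hypothetical helper; splitting on whitespace
# is an assumption about how unigrams might be extracted.
count_unigrams <- function(texts, points_filter = 1) {
  # Split every text on runs of whitespace and drop empty tokens
  tokens <- unlist(strsplit(texts, "\\s+"))
  tokens <- tokens[nzchar(tokens)]

  # Count occurrences of each distinct token (case-sensitive)
  counts <- table(tokens)

  # Keep only tokens that occur at least points_filter times
  counts <- counts[counts >= points_filter]

  data.frame(
    unigram = names(counts),
    points = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

texts <- c("this is science",
           "science is #fascinatingthing",
           "this is a scientific approach",
           "science is everywhere",
           "the beauty of science")

# Keep only tokens seen at least twice
count_unigrams(texts, points_filter = 2)
```

In this sketch, raising the threshold from 1 to 2 drops every token that appears only once in the sample texts, which is the kind of pruning the points_filter argument is described as performing.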