ngrams_dictionary {NUSS} | R Documentation |
ngrams_dictionary
Returns a data.frame containing the dictionary for
ngrams_segmentation.
ngrams_dictionary(
  texts,
  clean = TRUE,
  ngram_min = 1,
  ngram_max = 5,
  points_filter = 1
)
texts | character vector of texts used to create the n-grams dictionary. Case-sensitive. |
clean | logical, indicating whether the texts should be cleaned before the n-grams dictionary is created. |
ngram_min | numeric, the minimum number of words in an n-gram included in the dictionary. |
ngram_max | numeric, the maximum number of words in an n-gram included in the dictionary. |
points_filter | numeric, the minimum number of points (occurrences) an n-gram needs to be included in the dictionary. |
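To make the roles of ngram_min, ngram_max and points_filter concrete, the following base R sketch counts word n-grams and drops the rare ones. It is an illustration only: count_ngrams and its arguments n_min, n_max and min_count are hypothetical names, it splits on single spaces, and it does not reproduce the cleaning step or the output format of ngrams_dictionary.

```r
# Hypothetical helper, NOT part of NUSS: counts word n-grams with base R only.
count_ngrams <- function(texts, n_min = 1, n_max = 2, min_count = 1) {
  grams <- character(0)
  # Assumption: words are separated by single spaces.
  for (words in strsplit(texts, " ", fixed = TRUE)) {
    for (n in seq(n_min, n_max)) {
      if (length(words) < n) next
      # Slide a window of n words across the text.
      for (start in seq_len(length(words) - n + 1)) {
        grams <- c(grams, paste(words[start:(start + n - 1)], collapse = " "))
      }
    }
  }
  counts <- table(grams)
  # Keep only n-grams seen at least min_count times.
  counts[counts >= min_count]
}

texts <- c("this is science",
           "science is everywhere",
           "the beauty of science")

# 1- and 2-word n-grams occurring at least twice: "is" (2) and "science" (3).
res <- count_ngrams(texts, n_min = 1, n_max = 2, min_count = 2)
print(res)
```

Raising min_count in this sketch plays the role that the description above assigns to points_filter: n-grams with fewer occurrences are left out.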
The output is always a data.frame with 4 columns: 1) to_search, 2) to_replace, 3) id, 4) points.
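texts <- c("this is science",
           "science is #fascinatingthing",
           "this is a scientific approach",
           "science is everywhere",
           "the beauty of science")

# Default settings: cleaned texts, n-grams of 1 to 5 words
ngrams_dictionary(texts)

# Build the dictionary without cleaning the texts first
ngrams_dictionary(texts,
                  clean = FALSE)

# Restrict the dictionary to two-word n-grams
ngrams_dictionary(texts,
                  clean = TRUE,
                  ngram_min = 2,
                  ngram_max = 2)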
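The calls above do not exercise points_filter. A possible further example, assuming the NUSS package is installed, raises the threshold to 2 and inspects the structure of the result. The variable names example_texts and dict are illustrative, and no specific rows are claimed here.

```r
# Runs only if NUSS is available; otherwise the block is skipped.
if (requireNamespace("NUSS", quietly = TRUE)) {
  example_texts <- c("this is science",
                     "science is #fascinatingthing",
                     "this is a scientific approach",
                     "science is everywhere",
                     "the beauty of science")

  # Ask for n-grams with at least 2 points (occurrences).
  dict <- NUSS::ngrams_dictionary(example_texts,
                                  points_filter = 2)

  # Show the columns and their types.
  str(dict)
}
```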