nuss {NUSS} | R Documentation |
nuss
Returns a data.frame containing the
hashtag, its segmented version, the ids of the dictionary words,
the number of words it took to segment the hashtag,
the total number of points, and the computed score.
nuss(sequences, texts)
sequences |
character vector of sequences to be segmented (e.g., hashtags, with or without the leading #). Case-insensitive. |
texts |
character vector of texts used to create the n-gram and unigram dictionaries. Case-insensitive. |
This function is an arbitrary combination of ngrams_dictionary, unigram_dictionary, ngrams_segmentation, and unigram_sequence_segmentation, created to easily segment short texts based on a text corpus.
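To illustrate what "n-gram and unigram dictionaries" built from texts can look like, here is a minimal base R sketch. It is not the package's implementation: the tokenization rule (lowercase, split on non-letters) and the variable names are assumptions made for illustration only.

```r
# Illustrative only: count unigrams and bigrams in a small corpus.
# This does not reproduce the internals of the NUSS dictionary functions.
texts <- c("this is science", "science is everywhere")

# Assumed tokenization: lowercase, split on anything that is not a letter
tokens <- strsplit(tolower(texts), "[^a-z]+")
tokens <- lapply(tokens, function(x) x[nzchar(x)])

# Unigram dictionary: word frequencies across all texts
unigrams <- table(unlist(tokens))

# Bigram dictionary: pairs of adjacent words within each text
bigrams_per_text <- lapply(tokens, function(x) {
  if (length(x) < 2) return(character(0))
  paste(head(x, -1), tail(x, -1))
})
bigrams <- table(unlist(bigrams_per_text))
```

Frequent words and word pairs in such tables are the kind of evidence a dictionary-based segmenter can use when splitting a sequence such as "thisisscience".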
The output will always be a data.frame with the sequences that were
The output is not in the input order. If the input order is needed, call the function on each sequence separately with lapply.
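One way to keep results aligned with the input is to segment each sequence on its own and collect the results in a list. The sketch below assumes a hypothetical helper, segment_in_order, and a stand-in toy_segmenter so that it runs without the package; neither is part of NUSS.

```r
# Hypothetical helper (not part of NUSS): apply a segmentation function
# to each sequence separately, so the list follows the input order.
segment_in_order <- function(sequences, texts, segment_fn) {
  results <- lapply(sequences, function(s) segment_fn(s, texts))
  names(results) <- sequences
  results
}

# Stand-in segmenter used only to make this sketch self-contained;
# it performs no real segmentation.
toy_segmenter <- function(sequence, texts) {
  data.frame(sequence = sequence, n_texts = length(texts),
             stringsAsFactors = FALSE)
}

per_sequence <- segment_in_order(
  c("thisisscience", "scienceisscience"),
  c("this is science", "science is everywhere"),
  toy_segmenter
)
```

With the package installed, nuss could be passed as segment_fn in place of toy_segmenter, giving one data.frame per input sequence in the original order.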
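library(NUSS)

# Corpus used to build the n-gram and unigram dictionaries
texts <- c("this is science",
           "science is #fascinatingthing",
           "this is a scientific approach",
           "science is everywhere",
           "the beauty of science")

# Segment two sequences against the corpus
nuss(c("thisisscience", "scienceisscience"), texts)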