unigram_dictionary {NUSS}    R Documentation

Create unigram dictionary

Description

unigram_dictionary returns a data.frame containing a dictionary for use with unigram_sequence_segmentation.

Usage

unigram_dictionary(texts, points_filter = 1)

Arguments

texts

character vector of texts used to create the unigram dictionary. Matching is case-sensitive.

points_filter

numeric, the minimum number of points (occurrences) a unigram must have to be included in the dictionary. Defaults to 1.
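To illustrate what a minimum-occurrence filter like points_filter does, here is a minimal base-R sketch. It assumes that unigrams are whitespace-separated tokens and that a unigram's points are its raw occurrence count across all texts. count_unigrams_sketch is a hypothetical helper, not part of NUSS, and it returns only unigram and points columns rather than the four columns produced by unigram_dictionary.

```r
# Hypothetical helper, NOT part of NUSS. Assumption: unigrams are
# whitespace-separated tokens, and "points" = raw occurrence count.
count_unigrams_sketch <- function(texts, points_filter = 1) {
  # Split every text on runs of whitespace; case is preserved (case-sensitive).
  tokens <- unlist(strsplit(texts, "[[:space:]]+"))
  # Drop empty strings produced by leading whitespace.
  tokens <- tokens[nzchar(tokens)]
  # Tabulate occurrences ("points") of each distinct token.
  counts <- table(tokens)
  # Keep only tokens with at least `points_filter` occurrences.
  counts <- counts[counts >= points_filter]
  data.frame(unigram = as.character(names(counts)),
             points = as.integer(counts),
             stringsAsFactors = FALSE)
}

texts <- c("this is science",
           "science is #fascinatingthing",
           "this is a scientific approach",
           "science is everywhere",
           "the beauty of science")
print(count_unigrams_sketch(texts, points_filter = 2))
```

With points_filter = 2, only "is" (4), "science" (4) and "this" (2) survive from these texts; with the default of 1, all 11 distinct tokens are kept.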

Value

A data.frame with 4 columns: 1) to_search, 2) to_replace, 3) id, 4) points.

Examples

texts <- c("this is science",
           "science is #fascinatingthing",
           "this is a scientific approach",
           "science is everywhere",
           "the beauty of science")
unigram_dictionary(texts)


[Package NUSS version 0.1.0 Index]