dtmwrappers {ngramrr} | R Documentation |
Wrappers to DocumentTermMatrix and TermDocumentMatrix to use n-gram tokenization provided by ngramrr.
dtm2(x, char = FALSE, ngmin = 1, ngmax = 2, rmEOL = TRUE, ...)
tdm2(x, char = FALSE, ngmin = 1, ngmax = 2, rmEOL = TRUE, ...)
x | character vector, |
char | logical; if TRUE, use character n-grams. char = FALSE denotes word n-grams. |
ngmin | integer, minimum order of n-gram |
ngmax | integer, maximum order of n-gram |
rmEOL | logical, remove n-grams with EOL character |
... | Additional options for DocumentTermMatrix or TermDocumentMatrix |
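As a rough illustration of what char, ngmin and ngmax control, the base-R sketch below lists every n-gram of order ngmin to ngmax in a single string. The helper sketch_ngrams is hypothetical and is not part of ngramrr; the package's own tokenizer may normalize text and handle spaces, punctuation and EOL characters differently.

```r
# Hypothetical helper, not part of ngramrr: enumerate the n-grams of
# order ngmin..ngmax in one string, using base R only.
sketch_ngrams <- function(x, char = FALSE, ngmin = 1, ngmax = 2) {
  stopifnot(length(x) == 1, ngmin >= 1, ngmin <= ngmax)
  # Units are single characters (char = TRUE) or whitespace-separated words.
  units <- if (char) strsplit(x, "")[[1]] else strsplit(x, "[[:space:]]+")[[1]]
  sep <- if (char) "" else " "
  out <- character(0)
  for (n in ngmin:ngmax) {
    if (n > length(units)) break  # the string is too short for this order
    starts <- seq_len(length(units) - n + 1)
    grams <- vapply(
      starts,
      function(i) paste(units[i:(i + n - 1)], collapse = sep),
      character(1)
    )
    out <- c(out, grams)
  }
  out
}

# Word unigrams and bigrams
sketch_ngrams("here we are now", ngmin = 1, ngmax = 2)

# Character bigrams and trigrams
sketch_ngrams("yeah", char = TRUE, ngmin = 2, ngmax = 3)
```

With the defaults ngmin = 1 and ngmax = 2, both single units and adjacent pairs are produced, so the number of distinct terms grows quickly as ngmax increases.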
ngramrr, DocumentTermMatrix, TermDocumentMatrix
library(ngramrr)

# Lyric fragments, one document per element
nirvana <- c("hello hello hello how low", "hello hello hello how low",
             "hello hello hello how low", "hello hello hello",
             "with the lights out", "it's less dangerous", "here we are now", "entertain us",
             "i feel stupid", "and contagious", "here we are now", "entertain us",
             "a mulatto", "an albino", "a mosquito", "my libido", "yeah", "hey yay")

# Word n-grams of order 1 to 3; removePunctuation is passed through ... as an additional option
dtm2(nirvana, ngmax = 3, removePunctuation = TRUE)
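The difference between the two wrappers is the orientation of the resulting matrix. The base-R sketch below builds a small count matrix by hand to show this; it is an illustration only, uses plain word unigrams on a subset of the lyrics, and produces ordinary matrices rather than the tm objects that dtm2 and tdm2 are assumed to return. The names docs, doc_term and term_doc are made up for the example.

```r
# Base-R illustration only; no ngramrr or tm calls are used here.
docs <- c("here we are now", "entertain us", "here we are now")
tokens <- strsplit(docs, " ", fixed = TRUE)
terms <- sort(unique(unlist(tokens)))

# Count each term per document; vapply returns one column per document,
# so transpose to get documents in rows (document-term orientation).
counts <- vapply(
  tokens,
  function(tk) as.integer(table(factor(tk, levels = terms))),
  integer(length(terms))
)
doc_term <- t(counts)
dimnames(doc_term) <- list(docs = as.character(seq_along(docs)), terms = terms)

# The term-document orientation is simply the transpose.
term_doc <- t(doc_term)

dim(doc_term)
dim(term_doc)
```

Which orientation is more convenient depends on the downstream code: some functions expect documents as rows, others expect terms as rows.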