nlp_melt_tokens {textpress}    R Documentation
Tokenize Data Frame by Specified Column(s)
Description
This function collects the values of a token column in a data frame into a list, with one element per group defined by one or more parent columns.
Usage
nlp_melt_tokens(
df,
melt_col = "token",
parent_cols = c("doc_id", "sentence_id")
)
Arguments
df
A data frame containing the token column and the grouping columns.
melt_col
The name of the column in 'df' that contains the tokens. Defaults to "token".
parent_cols
A character vector naming the column(s) in 'df' by which to group the tokens. Defaults to c("doc_id", "sentence_id").
Value
A list of vectors, each containing the tokens of one group defined by the 'parent_cols' argument.
Examples
dtm <- data.frame(
  doc_id = as.character(c(1, 1, 1, 1, 1, 1, 1, 1)),
  sentence_id = as.character(c(1, 1, 1, 2, 2, 2, 2, 2)),
  token = c("Hello", "world", ".", "This", "is", "an", "example", ".")
)
tokens <- nlp_melt_tokens(dtm, melt_col = "token", parent_cols = c("doc_id", "sentence_id"))
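For readers who want to see the shape of the result without installing the package, the grouping described under Value can be approximated in base R. This is only a sketch under stated assumptions: `melt_tokens_sketch` is a hypothetical helper, not the textpress implementation, and the way it names list elements (parent values joined with ".") is an assumption rather than documented behavior.

```r
# Hypothetical base R stand-in for illustration only; this is NOT the
# textpress implementation of nlp_melt_tokens().
melt_tokens_sketch <- function(df, melt_col = "token",
                               parent_cols = c("doc_id", "sentence_id")) {
  # Build one grouping key per row by pasting the parent columns together.
  # The "." separator is an assumption made for this sketch.
  key <- do.call(paste, c(as.list(df[parent_cols]), sep = "."))
  # Keep groups in order of first appearance instead of sorting them.
  key <- factor(key, levels = unique(key))
  # split() returns a named list with one token vector per group.
  split(df[[melt_col]], key)
}

df <- data.frame(
  doc_id = c("1", "1", "1", "1", "1", "1", "1", "1"),
  sentence_id = c("1", "1", "1", "2", "2", "2", "2", "2"),
  token = c("Hello", "world", ".", "This", "is", "an", "example", "."),
  stringsAsFactors = FALSE
)

tokens <- melt_tokens_sketch(df)
str(tokens)
```

In this sketch the result is a list of two character vectors, one per sentence, with three and five tokens respectively. Compare it against the output of `nlp_melt_tokens()` before relying on the element names.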
[Package textpress version 1.0.0 Index]