get_document_frequencies {dhlabR}    R Documentation
Description

Retrieves token frequencies for the specified documents.
Usage

get_document_frequencies(pids, cutoff = 0, words = NULL)
Arguments

pids      A vector or data frame of document IDs (for example, URNs).
cutoff    A numeric value giving the frequency cutoff applied to tokens.
words     A vector of words (tokens) whose frequencies should be retrieved.
Value

A list containing the following elements for each document:

  Document ID
  Token
  Token frequency in the document
  Total number of tokens in the document
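Examples

library(dhlabR)
# Two digitized books, identified by their National Library of Norway URNs
document_ids <- c("URN:NBN:no-nb_digibok_2008051404065", "URN:NBN:no-nb_digibok_2010092120011")
frequency_cutoff <- 10
# Tokens of interest: full stop, comma and the Norwegian word "men" ("but")
tokens <- c(".", ",", "men")
result <- get_document_frequencies(document_ids, cutoff = frequency_cutoff, words = tokens)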