tfidf {textir} | R Documentation |
term frequency, inverse document frequency
tfidf(x,normalize=TRUE)
x |
A |
normalize |
Whether to normalize term frequency by document totals. |
A matrix of the same type as x, with values replaced by the tf-idf
f_{ij} * \log[n/(d_j+1)],
where f_{ij} is x_{ij}/m_i or x_{ij}, depending on normalize, and d_j is the number of documents containing token j. Here n is the number of documents and m_i is the total count in document i.
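To make the formula concrete, the following is a minimal base-R sketch that computes the same quantity by hand for the normalize=TRUE case. The toy matrix and the names x, n, m, d, f, idf and manual_tfidf are hypothetical and chosen only for illustration; this is not the package's implementation, and it assumes a dense count matrix with documents in rows and tokens in columns.

```r
# Hypothetical toy count matrix: 3 documents (rows) by 3 tokens (columns).
x <- matrix(c(2, 0, 1,
              0, 3, 0,
              1, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("doc", 1:3), c("a", "b", "c")))

n <- nrow(x)             # n: number of documents
m <- rowSums(x)          # m_i: total count in document i
d <- colSums(x > 0)      # d_j: number of documents containing token j

f   <- x / m             # f_ij = x_ij / m_i (row i is divided by m[i])
idf <- log(n / (d + 1))  # log[n / (d_j + 1)], one value per token

# Multiply each column j of f by idf[j] to get f_ij * log[n / (d_j + 1)].
manual_tfidf <- sweep(f, 2, idf, `*`)
manual_tfidf
```

In this toy example token "a" appears in two of the three documents, so its weight is log(3/3) = 0 in every row. Under the formula as stated, a token that appears in all n documents would receive a negative weight, since log[n/(n+1)] < 0.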
Matt Taddy taddy@chicagobooth.edu
pls, we8there
library(textir)
data(we8there)
## 20 high-variance tf-idf terms
colnames(we8thereCounts)[
  order(-sdev(tfidf(we8thereCounts)))[1:20]]