new_tidylda {tidylda} | R Documentation |
tidylda
Since all three of tidylda, refit.tidylda, and predict.tidylda call fit_lda_c,
we need a way to format the resulting posteriors and other user-facing
objects consistently. This function does that.
new_tidylda(
lda,
dtm,
burnin,
is_prediction = FALSE,
alpha = NULL,
eta = NULL,
optimize_alpha = NULL,
calc_r2 = NULL,
calc_likelihood = NULL,
call = NULL,
threads
)
lda | list output of |
dtm | a document term matrix or term co-occurrence matrix of class |
burnin | integer number of burnin iterations. |
is_prediction | is this for a prediction (as opposed to initial fitting, or update)? Defaults to FALSE. |
alpha | output of |
eta | output of |
optimize_alpha | did you optimize |
calc_r2 | did the user want to calculate R-squared when fitting the model? If |
calc_likelihood | did you calculate the log likelihood when making a call to |
call | the result of calling |
threads | number of parallel threads |
Returns an S3 object of class tidylda with the following slots:

beta is a numeric matrix whose rows are the posterior estimates of
P(token|topic).

theta is a numeric matrix whose rows are the posterior estimates of
P(topic|document).

lambda is a numeric matrix whose rows are the posterior estimates of
P(topic|token), calculated using Bayes's rule. See calc_lambda.

alpha is the prior for topics over documents. If optimize_alpha is FALSE,
alpha is what the user passed when calling tidylda. If optimize_alpha is TRUE,
alpha is a numeric vector returned in the alpha slot from a call to fit_lda_c.

eta is the prior for tokens over topics. This is what the user passed
when calling tidylda.

summary is the result of a call to summarize_topics.

call is the result of match.call called at the top of tidylda.

log_likelihood is a tibble whose columns are the iteration and the log
likelihood at that iteration. This slot is only populated if
calc_likelihood = TRUE.

r2 is a numeric scalar resulting from a call to calc_rsquared. This slot is
only populated if calc_r2 = TRUE.
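
The Bayes's rule step behind lambda can be sketched in base R. This is an illustration only and is not taken from calc_lambda: the toy matrices, the doc_lengths vector, and the choice to weight documents by their length when estimating P(topic) are all assumptions made for the example.

```r
# Toy posteriors (made-up numbers): 2 topics, 3 tokens, 2 documents.
beta <- matrix(c(0.7, 0.2, 0.1,
                 0.1, 0.3, 0.6),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("t1", "t2"), c("a", "b", "c")))
theta <- matrix(c(0.9, 0.1,
                  0.4, 0.6),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("d1", "d2"), c("t1", "t2")))

# Assumption: documents are weighted by their token counts.
doc_lengths <- c(d1 = 10, d2 = 30)
p_docs <- doc_lengths / sum(doc_lengths)

# P(topic) = sum over documents of P(topic|document) * P(document).
p_topics <- as.vector(p_docs %*% theta)

# Bayes's rule: P(topic|token) is proportional to P(token|topic) * P(topic).
joint <- beta * p_topics                        # row k is scaled by P(topic k)
lambda <- sweep(joint, 2, colSums(joint), "/")  # normalize over topics per token
```

In this sketch lambda has the same shape as beta (topics by tokens), and the values for each token sum to 1 across topics.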
In general, the arguments of this function should be what the user passed
when calling tidylda.
burnin is used only to determine whether or not burn-in iterations
were used when fitting the model. If burnin > -1, posteriors
are calculated using lda$Cd_mean and lda$Cv_mean, respectively.
Otherwise, posteriors are calculated using lda$Cd and lda$Cv.
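
As a rough illustration of how count matrices become posteriors, the sketch below adds the priors to the counts and normalizes each row. It is not the package's implementation: posterior_sketch is a hypothetical helper, the lda list is a hand-made stand-in with invented counts, and a scalar eta is assumed for simplicity.

```r
# Hypothetical helper: turn count matrices plus priors into row-normalized
# posterior estimates.
posterior_sketch <- function(Cd, Cv, alpha, eta) {
  theta <- t(t(Cd) + alpha)        # add alpha[k] to topic column k
  theta <- theta / rowSums(theta)  # each document row sums to 1
  beta <- Cv + eta                 # scalar eta assumed here
  beta <- beta / rowSums(beta)     # each topic row sums to 1
  list(theta = theta, beta = beta)
}

# Hand-made stand-in for the sampler output (invented counts).
lda <- list(
  Cd_mean = matrix(c(4, 1, 2, 3), nrow = 2, byrow = TRUE),       # docs x topics
  Cv_mean = matrix(c(3, 2, 1, 0, 1, 3), nrow = 2, byrow = TRUE)  # topics x tokens
)
alpha <- c(0.1, 0.1)
eta <- 0.05
burnin <- 5

# With burn-in in effect, the averaged counts are the ones used.
if (burnin > -1) {
  post <- posterior_sketch(lda$Cd_mean, lda$Cv_mean, alpha, eta)
}
```

Here post$theta plays the role of P(topic|document) and post$beta the role of P(token|topic).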
The class of call isn't checked; it is passed through unchanged to the
object returned by this function. This can be useful when calling this
function directly, for example while troubleshooting.
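
The pass-through behaviour can be pictured with a small stand-alone sketch. Both fit_sketch and build_sketch are hypothetical names invented for this example; they only mimic the pattern of capturing a call with match.call and storing it without checking its class.

```r
# Hypothetical constructor: stores `call` without inspecting its class.
build_sketch <- function(call = NULL) {
  structure(list(call = call), class = "sketch")
}

# Hypothetical top-level function: captures the user's call and hands it on.
fit_sketch <- function(x, k, ...) {
  cl <- match.call()        # the call as the user wrote it
  build_sketch(call = cl)   # passed through untouched
}

m <- fit_sketch(x = 1:3, k = 2)
m$call                      # the captured call object

# Because nothing is checked, any value can be supplied by hand,
# which is convenient when exercising the constructor directly.
dbg <- build_sketch(call = "manual test")
dbg$call
```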