heldoutLikelihood {sts} | R Documentation |
Compute the held-out log-likelihood of the STS model
heldoutLikelihood(mv, kappa, alpha, missing)
mv | the baseline log-transformed occurrence rate of each word in the corpus |
kappa | the estimated kappa coefficients |
alpha | the estimated alpha values for the corpus |
missing | list identifying which words and documents are in the held-out set |
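As a rough illustration of the shape of the missing argument, the sketch below builds a small hand-made list with index and docs components, the two components the example on this page reads from out$missing. The object names (missing_toy, keep, subset_toy) and all values are hypothetical, and the two-row layout of each docs entry (vocabulary indices over token counts) is assumed to follow the stm document format.

```r
# Hypothetical held-out set; values are made up for illustration.
# `index` holds document positions, `docs` holds the matching held-out tokens
# (assumed layout: row 1 = vocabulary indices, row 2 = token counts).
missing_toy <- list(
  index = c(2L, 5L, 9L),
  docs = list(
    matrix(c(4L, 7L, 1L, 2L), nrow = 2, byrow = TRUE),
    matrix(c(3L, 1L), nrow = 2, byrow = TRUE),
    matrix(c(1L, 8L, 1L, 1L), nrow = 2, byrow = TRUE)
  )
)

# Subset both components with the same positions so they stay aligned.
keep <- c(1L, 3L)
subset_toy <- list(index = missing_toy$index[keep],
                   docs  = missing_toy$docs[keep])
```

Indexing index and docs with the same positions keeps each held-out document paired with its document number.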
The returned object includes expected.heldout, the mean across documents of the per-document held-out log-likelihood.
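The averaging can be sketched on toy numbers. This is not the package's internal code: word_probs is a made-up matrix of per-document word probabilities standing in for whatever the model derives from mv, kappa and alpha, and the per-document score is assumed here to be the mean log probability of the held-out tokens.

```r
# Hypothetical held-out tokens for two documents
# (assumed layout: row 1 = vocabulary indices, row 2 = token counts).
heldout_docs <- list(
  matrix(c(1L, 3L, 2L, 1L), nrow = 2, byrow = TRUE),
  matrix(c(2L, 1L), nrow = 2, byrow = TRUE)
)

# Hypothetical word probabilities: rows = documents, columns = vocabulary.
word_probs <- rbind(c(0.5, 0.3, 0.2),
                    c(0.1, 0.6, 0.3))

# Per-document score: mean log probability over held-out tokens.
doc_heldout <- vapply(seq_along(heldout_docs), function(i) {
  doc <- heldout_docs[[i]]
  # Repeat each word's probability once per held-out token.
  p <- rep(word_probs[i, doc[1, ]], times = doc[2, ])
  mean(log(p))
}, numeric(1))

# Average the per-document scores across documents.
expected_heldout <- mean(doc_heldout)
```

Averaging per document before averaging across documents gives each document equal weight regardless of how many tokens were held out from it.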
library("tm"); library("stm"); library("sts")
temp <- textProcessor(documents = gadarian$open.ended.response,
                      metadata = gadarian, verbose = FALSE)
out <- prepDocuments(temp$documents, temp$vocab, temp$meta, verbose = FALSE)
X <- model.matrix(~1 + out$meta$treatment + out$meta$pid_rep +
                    out$meta$treatment * out$meta$pid_rep)[, -1]
X_seed <- as.matrix(out$meta$treatment)
out <- make.heldout(out$documents, out$vocab)
## maxIter is kept low so the example runs quickly
sts_estimate <- sts(X, X_seed, out, numTopics = 3, verbose = FALSE,
                    parallelize = FALSE, maxIter = 3, initialization = 'anchor')
## resample 80% of the held-out documents, with replacement
sm <- sample(x = seq_along(out$missing$index),
             size = floor(length(out$missing$index) * 0.8), replace = TRUE)
d.h <- list(index = out$missing$index[sm], docs = out$missing$docs[sm])
heldoutLikelihood(mv = sts_estimate$mv, kappa = sts_estimate$kappa,
                  alpha = sts_estimate$alpha, missing = d.h)$expected.heldout