sts {sts} | R Documentation |
Estimation of the STS Model using variational EM.
The function takes a sparse representation of a document-term matrix, covariates
for each document, and an integer number of topics, and returns fitted model
parameters. See an overview of functions in the package here:
sts-package
sts(
X,
X_seed,
corpus,
numTopics,
maxIter = 100,
initialization = "stm",
estimation = "lasso",
verbose = TRUE,
parallelize = FALSE,
stmSeed = NULL
)
X |
Data frame of document-specific content covariates that affect how much a topic is discussed (prevalence) and the way in which it is discussed (sentiment-discourse). |
X_seed |
A vector of length equal to the corpus size. This is the key experimental variable (e.g., a review rating or a binary indicator of experiment/control group). |
corpus |
The document-term matrix to be modeled, supplied as a sparse term-count matrix with one row
per document and one column per term. The object must be a list, with each element
corresponding to a document. Each document is represented
as an integer matrix with two rows, and columns equal to the number of unique
vocabulary words in the document. The first row contains the 1-indexed
vocabulary entry and the second row contains the number of times that term
appears. This is the same format in the |
numTopics |
A positive integer (2 or greater) representing the desired number of topics. |
maxIter |
A positive integer representing the maximum number of VEM iterations allowed. |
initialization |
Character argument that allows the user to specify an initialization
method. The default choice, |
estimation |
A character input specifying how kappa should be estimated. |
verbose |
A logical flag indicating whether information should be printed to the screen. |
parallelize |
A logical flag indicating whether to parallelize the estimation using all but one of the CPU cores on your local machine. |
stmSeed |
A prefit STM model object used to initialize the STS model. Note that this is ignored unless initialization = "stm". |
This is the main function for estimating the Structural Topic and Sentiment-Discourse (STS) Model. Users provide a corpus of documents and a number of topics. Each word in a document comes from exactly one topic and each document is represented by the proportion of its words that come from each of the topics. The document-specific content covariates affect how much (prevalence) and the way in which a topic is discussed (sentiment-discourse).
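The per-document format described under the corpus argument (a two-row integer matrix holding 1-indexed vocabulary positions and their counts) can be illustrated with a small base R sketch. The vocabulary, the counts, and the helper to_sparse_doc() are hypothetical and serve only to show the layout; they are not part of the sts package.

```r
# Toy vocabulary and a dense count vector for one document (illustrative values only).
vocab <- c("economy", "health", "immigration", "jobs", "tax")
dense_counts <- c(2L, 0L, 1L, 0L, 3L)

# Hypothetical helper (not an sts function): keep only the terms that occur
# and stack their 1-indexed vocabulary positions (row 1) on their counts (row 2).
to_sparse_doc <- function(counts) {
  present <- which(counts > 0L)
  rbind(as.integer(present), as.integer(counts[present]))
}

doc <- to_sparse_doc(dense_counts)

# One list element per document, as described for the corpus argument.
documents <- list(doc)

# The terms of a document can be recovered by indexing the vocabulary with row 1.
terms_in_doc <- vocab[doc[1, ]]
```

In practice this structure is normally produced by a text-processing step, as in the example at the end of this page, rather than built by hand.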
An object of class sts
alpha |
Estimated prevalence and sentiment-discourse values for each document and topic |
gamma |
Estimated regression coefficients that determine prevalence and sentiment/discourse for each topic |
kappa |
Estimated kappa coefficients that determine sentiment-discourse and the topic-word distributions |
sigma_inv |
Inverse of the covariance matrix for the alpha parameters |
sigma |
Covariance matrix for the alpha parameters |
elbo |
The ELBO at each iteration of the estimation algorithm |
mv |
The baseline log-transformed occurrence rate of each word in the corpus |
runtime |
Time elapsed in seconds |
vocab |
Vocabulary vector used |
mu |
Mean (fitted) values for alpha, computed for each document as the product of its document-level variables and the estimated gamma
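Because elbo records the ELBO at each iteration, one way to judge whether a run has stabilized is to look at the relative change between consecutive iterations. The sketch below uses a made-up placeholder vector in place of a fitted model's elbo component, and the tolerance is an assumption chosen for illustration; sts() itself may apply a different stopping rule.

```r
# Placeholder ELBO trace standing in for the elbo component of a fitted
# model (values are made up for illustration).
elbo <- c(-1520.4, -1498.7, -1495.2, -1494.9)

# Relative change in the ELBO between consecutive iterations.
rel_change <- abs(diff(elbo)) / abs(elbo[-length(elbo)])

# Hypothetical tolerance, not taken from the sts package.
tol <- 1e-3
converged <- tail(rel_change, 1) < tol
```

If the last relative change is still large when maxIter is reached, refitting with a larger maxIter is a reasonable next step.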
Roberts, M., Stewart, B., Tingley, D., and Airoldi, E. (2013) "The structural topic model and applied social science." In Advances in Neural Information Processing Systems Workshop on Topic Models: Computation, Application, and Evaluation.
Roberts, M., Stewart, B., and Airoldi, E. (2016) "A model of text for experimentation in the social sciences." Journal of the American Statistical Association.
Chen, L. and Mankad, S. (forthcoming) "A Structural Topic and Sentiment-Discourse Model for Text Analysis." Management Science.
# An example using the Gadarian data from the stm package: from raw text to a
# fitted model, using textProcessor(), which leverages the tm package.
library("tm"); library("stm"); library("sts")
temp <- textProcessor(documents = gadarian$open.ended.response,
                      metadata = gadarian, verbose = FALSE)
out <- prepDocuments(temp$documents, temp$vocab, temp$meta, verbose = FALSE)
X <- model.matrix(~1 + out$meta$treatment + out$meta$pid_rep +
                    out$meta$treatment * out$meta$pid_rep)[, -1]
X_seed <- as.matrix(out$meta$treatment)
## low max iteration number just for testing
sts_estimate <- sts(X, X_seed, out, numTopics = 3, verbose = FALSE,
                    parallelize = FALSE, maxIter = 3, initialization = 'anchor')