predict.BayesECM {ezECM}        R Documentation

New Event Categorization With Bayesian Inference

Description

Categorizes new, unlabeled events by computing expected category probabilities from a trained Bayesian ECM ("BayesECM") model.

Usage

## S3 method for class 'BayesECM'
predict(object, Ytilde, thinning = 1, mixture_weights = "training", ...)

Arguments

object

an object of class "BayesECM", the trained model returned by the BayesECM() function.

Ytilde

data.frame of unlabeled observations to be categorized. Must contain the same discriminant names as the training data used to fit the provided "BayesECM" object. Each row is an individual observation. Missing data is specified with NA.

thinning

integer, scalar. Values greater than one can be supplied to reduce computation time. See Details.

mixture_weights

character string describing the weights of the distributions in the mixture used for prediction. The default, "training", uses weights implied by the likelihood and prior specifications, while supplying the string "equal" assumes the marginal predictive distribution of each category is independent of the training data and uses equal weights.

...

not used

Details

The data in Ytilde should be p-values \in (0,1]. The transformation applied to the data used to generate object is automatically applied to Ytilde within the predict.BayesECM() function.
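As an illustrative sketch of the expected input format, a Ytilde can be assembled by hand; the discriminant names "depth" and "polarity" below are hypothetical placeholders, not names taken from the package data, and must be replaced with the names used in the training data.

# Hypothetical sketch: column names must match the discriminants in the training data
Ytilde_example <- data.frame(depth = c(0.62, NA, 0.04),
                             polarity = c(0.91, 0.33, NA))
# Each entry is a p-value in (0, 1]; NA marks a missing discriminant for that event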

For a given event with an unknown category, a Bayesian ECM model seeks to predict the expected value of the latent variable \tilde{\mathbf{z}}_K, where \tilde{\mathbf{z}}_K is a vector of length K, and K is the number of event categories. A single observation of \tilde{\mathbf{z}}_K is a draw from a categorical distribution.

The expected probabilities stipulated within the categorical distribution of \tilde{\mathbf{z}}_K are conditioned on any imputed missing data, the prior hyperparameters, and each individual row of Ytilde. The output from predict.BayesECM() consists of draws from the distribution of \mathbf{E}[\tilde{\mathbf{z}}_K|\tilde{\mathbf{y}}_{\tilde{p}}, \mathbf{Y}^{+}, \mathbf{\eta}, \mathbf{\Psi}, \mathbf{\nu}, \mathbf{\alpha}] = p(\tilde{\mathbf{z}}_K|\tilde{\mathbf{y}}_{\tilde{p}}, \mathbf{Y}^{+}, \mathbf{\eta}, \mathbf{\Psi}, \mathbf{\nu}, \mathbf{\alpha}), where \mathbf{Y}^{+} represents the observed values within the training data.

The argument mixture_weights controls the value of p(\tilde{\mathbf{z}}_K|\mathbf{Y}_{N \times p}, \mathbf{\alpha}), the probability of each \tilde{z}_k = 1 before \tilde{\mathbf{y}}_{\tilde{p}} is observed. The standard result is obtained from the prior hyperparameter values in \mathbf{\alpha} and the number of unique events in each \mathbf{Y}_{N_k \times p}. Setting mixture_weights = "training" uses this standard result in prediction. If the relative number of events used for each category in training is thought to be problematic, providing the argument mixture_weights = "equal" sets p(\tilde{z}_1 = 1|\mathbf{Y}_{N \times p}) = \dots = p(\tilde{z}_K = 1|\mathbf{Y}_{N \times p}) = 1/K. If the user wants to use a set of p(\tilde{z}_k = 1|\mathbf{Y}_{N \times p}) which are not equal but also not informed by the data, we suggest setting the elements of the hyperparameter vector \mathbf{\alpha} to values with a large magnitude, in the desired ratios for each category. However, this can cause undesirable results in prediction if some elements of \mathbf{\alpha} are orders of magnitude larger than others.
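As a sketch, assuming trained_model and new_data are defined as in the Examples below, the two weighting schemes can be compared directly:

# Weights informed by the training data and prior specification (the default)
pred_training <- predict(trained_model, Ytilde = new_data, mixture_weights = "training")
# Equal prior weight on each category
pred_equal <- predict(trained_model, Ytilde = new_data, mixture_weights = "equal")
# Differences between pred_training$epz and pred_equal$epz reflect the
# influence of the category frequencies in the training data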

To save computation time, the user can specify an integer value of thinning greater than one. Every thinning-th Markov chain Monte Carlo (MCMC) sample is then used for prediction. This lets the user take a large number of samples during the training step, allowing for better mixing, without paying the full computational cost at prediction. See further details in the package vignette by running vignette("syn-data-code", package = "ezECM").
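For example, assuming trained_model and new_data are defined as in the Examples below:

# Use every 10th retained MCMC sample for prediction, reducing computation time
bayes_pred_thin <- predict(trained_model, Ytilde = new_data, thinning = 10)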

Value

Returns a list. The list element epz is a matrix with nrow(Ytilde) rows, corresponding to each event used for prediction, and K named columns. Each column of epz gives the expected probability that the event in the corresponding row belongs to that category. The remaining list elements hold data including Ytilde, information about additional arguments passed to predict.BayesECM, and data related to the previous BayesECM() fit.
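For instance, a point categorization of each event can be read off of epz; the sketch below assumes bayes_pred is the prediction object produced in the Examples.

# Most probable category for each new event, taken as the column of epz
# with the largest expected probability in each row
colnames(bayes_pred$epz)[apply(bayes_pred$epz, 1, which.max)]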

Examples


# import the labeled training data included with the package
csv_use <- "good_training.csv"
file_path <- system.file("extdata", csv_use, package = "ezECM")
training_data <- import_pvals(file = file_path, header = TRUE, sep = ",", training = TRUE)

# fit the Bayesian ECM model to the training data
trained_model <- BayesECM(Y = training_data, BT = c(10,1000))

# import p-values for new events to be categorized
csv_use <- "good_newdata.csv"
file_path <- system.file("extdata", csv_use, package = "ezECM")
new_data <- import_pvals(file = file_path, header = TRUE, sep = ",", training = TRUE)

# draw expected category probabilities for each new event
bayes_pred <- predict(trained_model, Ytilde = new_data)



[Package ezECM version 1.0.0]