posteriorpredictive.pep {PEPBVS} | R Documentation |
Posterior predictive distribution under Bayesian model averaging
Description
Simulates values from the posterior predictive distribution under Bayesian model averaging.
Usage
posteriorpredictive.pep(
object,
xnew,
ssize = 10000,
estimator = "BMA",
n.models = NULL,
cumul.prob = 0.99
)
Arguments
object |
An object of class pep (e.g., output of |
xnew |
An optional data frame of numeric, the new data on the explanatory variables to be used for prediction. The data frame needs to contain information about all explanatory variables available in the full model; if not an error message is output. If omitted, the data frame employed for fitting the full model is used. |
ssize |
Positive integer, the number of values to be simulated from each posterior predictive distribution. Default value=10000. |
estimator |
A character, the type of prediction. One of
“BMA” (Bayesian model averaging, default),
“MAP” (maximum a posteriori model) or “MPM” (median probability model).
Default value= |
n.models |
Positive integer, the number of (top) models where
the average is based on or |
cumul.prob |
Numeric between zero and one, cumulative probability of
top models to be used for computing the average. Relevant for |
Details
For the computations, Equation 11 of Garcia–Donato and Forte (2018) is used. That (simplified) formula arises when changing the prior on the model parameters to the reference prior. This change of prior is justified in Garcia–Donato and Forte (2018). The resulting formula is a mixture distribution and the simulation is implemented as follows: firstly the model (component) based on its posterior probability is chosen and subsequently the value for the response is drawn from the corresponding Student distribution.
The case of missing data (i.e., presence of NA’s) and non–quantitative data
in the new data frame
xnew
is not currently supported.
Let k
be the number of models with cumulative posterior probability up
to the given value of cumul.prob
. Then, for Bayesian model averaging
the prediction is based on the top (k+1)
models if they exist, otherwise
on the top k
models.
When both n.models
and cumul.prob
are provided — once
specifying the number of models for the given cumulative probability as
described above — the minimum between the two numbers is used for prediction.
Value
posteriorpredictive.pep
returns a matrix (of dimension
ssize
\times
nrow(xnew)
) — containing the
simulated data. More specifically, column i
contains the simulated
values from the posterior predictive corresponding to the i
–th new
observation (i.e., i
–th row of xnew
).
References
Garcia–Donato, G. and Forte, A. (2018) Bayesian Testing, Variable Selection and Model Averaging in Linear Models using R with BayesVarSel. The R Journal, 10(1): 155–174. doi:10.32614/RJ-2018-021
Examples
data(UScrime_data)
X <- UScrime_data[,-15]
set.seed(123)
res <- pep.lm(y~.,data=UScrime_data[1:45,],intrinsic=TRUE,
algorithmic.choice="MC3",itermc3=4000)
resf <- posteriorpredictive.pep(res,ssize=2000,n.models=5)
resf2 <- posteriorpredictive.pep(res,ssize=2000,estimator="MPM")
resp <- posteriorpredictive.pep(res,xnew=X[46:47,],ssize=2000,n.models=5)