kfold {jagshelper} | R Documentation |
Automated K-fold or Leave One Out Cross Validation
Description
Runs k-fold or Leave One Out Cross Validation for a specified component of a JAGS data object, for a specified JAGS model.
JAGS is run internally k times (or alternatively, a number of times equal to the size of the dataset), each time withholding one of k "folds" of the input data and drawing posterior predictive samples corresponding to the withheld data, which can then be compared to the input data to assess model predictive power.
Global measures of predictive power are provided in output: Root Mean Square (Prediction) Error and Mean Absolute (Prediction) Error. However, these measures are unlikely to be meaningful by themselves; rather, they are intended as a metric for scoring a set of candidate models.
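As a rough illustration of how the two global measures relate to the predicted and observed values, the following sketch computes them by hand for two hypothetical candidate models. The numeric values and the helper name pred_scores are invented for illustration, and the formulas are the conventional definitions of RMSE and MAE rather than code taken from the package.

```r
# Hypothetical withheld observations (analogous to $data_y)
data_y <- c(2.0, 4.0, 6.0, 8.0)

# Hypothetical posterior predictive medians (analogous to $pred_y)
# from two candidate models
pred_y_model1 <- c(2.5, 3.5, 6.0, 9.0)
pred_y_model2 <- c(3.0, 3.0, 7.0, 10.0)

# Illustrative helper (not part of jagshelper): conventional RMSE and MAE
pred_scores <- function(pred, obs) {
  err <- pred - obs
  c(rmse_pred = sqrt(mean(err^2, na.rm = TRUE)),
    mae_pred  = mean(abs(err), na.rm = TRUE))
}

scores1 <- pred_scores(pred_y_model1, data_y)
scores2 <- pred_scores(pred_y_model2, data_y)

# Side-by-side comparison; smaller values indicate better prediction
rbind(model1 = scores1, model2 = scores2)
```

Neither number says much in isolation; the comparison across rows is what ranks the candidate models.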
Usage
kfold(
  model.file,
  data,
  p,
  addl_p = NULL,
  save_postpred = FALSE,
  k = 10,
  loocv = FALSE,
  fold_dims = NULL,
  ...
)
Arguments
model.file |
Path to file containing the model written in BUGS code, passed directly to jags. |
data |
The named list of data objects, passed directly to jags. |
p |
The name of the data object to use for K-fold or LOO CV. |
addl_p |
Names of additional parameters to save from JAGS output,
if a metric such as Log Pointwise Predictive Density is to be calculated from
cross-validation results. Defaults to NULL. |
save_postpred |
Whether to save all posterior predictive samples,
in addition to posterior medians. Defaults to FALSE. |
k |
How many folds to use for cross-validation. Defaults to 10. |
loocv |
Whether to perform Leave One Out (rather than k-fold) Cross
Validation. Setting this to |
fold_dims |
A vector of margins to use for selecting folds, if the data
object used for cross validation is a matrix or array. For example, if the
data consists of a two-dimensional matrix, setting |
... |
additional arguments to jags. These may (or must)
include |
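To make the idea of folds and margins more concrete, the sketch below shows one way fold labels could be assigned to a matrix, either at random over all elements or by row (margin 1). This is an assumption-laden illustration, not the package's internal code: the object names are invented, and replacing withheld elements with NA is assumed here only because it is the usual way to ask JAGS for posterior predictive draws of missing data.

```r
set.seed(1)
y <- matrix(rnorm(60), nrow = 20, ncol = 3)  # hypothetical data matrix
k <- 5

# Random assignment: every element receives one of k fold labels,
# with folds of equal size (60 / 5 = 12 elements each)
fold_random <- array(sample(rep_len(1:k, length(y))), dim = dim(y))

# Assignment by row (margin 1): all elements in a row share a fold label
row_fold <- sample(rep_len(1:k, nrow(y)))
fold_byrow <- matrix(row_fold, nrow = nrow(y), ncol = ncol(y))

# Leave One Out by row would simply use one fold per row
fold_loo_byrow <- matrix(seq_len(nrow(y)), nrow = nrow(y), ncol = ncol(y))

# Withholding fold 1 (assumed mechanism): set its elements to NA before fitting
y_withheld <- y
y_withheld[fold_byrow == 1] <- NA
```

The fold matrices here have the same shape as the data, which matches the description of the $fold output element.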
Value
A named list, which may consist of the following:
- $pred_y: Point estimates of predicted values corresponding to each data element, calculated as the posterior predictive median value
- $data_y: Original data used for cross validation
- $postpred_y: All posterior predictive samples corresponding to each data element, if save_postpred=TRUE
- $rmse_pred: Root Mean Square (Prediction) Error
- $mae_pred: Mean Absolute (Prediction) Error
- $addl_p: A list with length equal to k (or the number of folds), with each list element containing all posterior samples for additional parameters, if these are supplied in argument addl_p=.
- $fold: A vector, matrix, or array corresponding to the original data, giving the numerical values of the corresponding fold used
Author(s)
Matt Tyers
See Also
qq_postpred, plot_postpred, plotRhats, traceworstRhat
Examples
#### test case where y is a matrix
asdf_jags <- tempfile()
cat('model {
  for(i in 1:n) {
    for(j in 1:ngrp) {
      y[i,j] ~ dnorm(mu[i,j], tau)
      mu[i,j] <- b0 + b1*x[i,j] + a[j]
    }
  }
  for(j in 1:ngrp) {
    a[j] ~ dnorm(0, tau_a)
  }
  tau <- pow(sig, -2)
  sig ~ dunif(0, 10)
  b0 ~ dnorm(0, 0.001)
  b1 ~ dnorm(0, 0.001)
  tau_a <- pow(sig_a, -2)
  sig_a ~ dunif(0, 10)
}', file=asdf_jags)
# simulate data to go with the example model
n <- 60   # 20 rows x 3 columns
x <- matrix(rnorm(n, sd=3),
            nrow=20, ncol=3)
y <- matrix(rnorm(n, mean=rep(1:3, each=20)-x),
            nrow=20, ncol=3)
asdf_data <- list(x=x,
                  y=y,
                  n=nrow(x),
                  ngrp=ncol(x))
# JAGS controls
niter <- 1000
ncores <- 2
# ncores <- min(10, parallel::detectCores()-1)
## random assignment of folds
kfold1 <- kfold(p="y",
                k=5,
                model.file=asdf_jags, data=asdf_data,
                n.chains=ncores, n.iter=niter,
                n.burnin=niter/2, n.thin=niter/1000,
                parallel=FALSE)
str(kfold1)
kfold1$fold
## Performing LOOCV, but assigning folds by row of input data
kfold2 <- kfold(p="y",
                loocv=TRUE, fold_dims=1,
                model.file=asdf_jags, data=asdf_data,
                n.chains=ncores, n.iter=niter,
                n.burnin=niter/2, n.thin=niter/1000,
                parallel=FALSE)
str(kfold2)
kfold2$fold