cv {easy.glmnet} | R Documentation |
Function to easily cross-validate a model (including fold assignment, merging of fold outputs, etc.).
cv(x, y, family = c("binomial", "cox", "gaussian"), fit_fun, predict_fun, site = NULL,
covar = NULL, nfolds = 10, pred.format = NA, verbose = TRUE, ...)
x |
input matrix for glmnet of dimension nobs x nvars; each row is an observation vector. It can be easily obtained with |
y |
response to be predicted. A binary vector for "binomial", a "Surv" object for "cox", or a numeric vector for "gaussian". |
family |
distribution of y: "binomial", "cox", or "gaussian". |
fit_fun |
function to create the prediction model using the training subsets. It can have between two and four arguments (the first two are compulsory): |
predict_fun |
function to apply the prediction model to the test sets. It can have between two and four arguments (the first two are compulsory): |
site |
vector with the sites' names, or NULL for studies conducted in a single site. |
covar |
other covariates that can be passed to fit_fun and predict_fun. |
... |
other arguments that can be passed to fit_fun and predict_fun. |
nfolds |
number of folds, only used if |
pred.format |
format of the predictions returned by each fold. E.g., if the prediction is an array, use NA. |
verbose |
(optional) logical, whether to print some messages during execution. |
This function iteratively divides the dataset into a training dataset, with which it fits the model using the function fit_fun, and a test dataset, to which it applies the model using the function predict_fun. It saves the models fitted with the training datasets and the predictions obtained in the test datasets. The folds are assigned automatically using assign.folds, accounting for the site if this is not NULL.
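The procedure described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the fold assignment here is a plain random split (an assumption; assign.folds may work differently and also accounts for site), stats::glm stands in for a glmnet model, and the names folds, models, predictions and res_sketch are hypothetical.

```r
# Illustrative k-fold loop; not the internals of easy.glmnet::cv
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 3), ncol = 3)
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2]))

# Stand-ins for fit_fun and predict_fun, using stats::glm instead of glmnet
fit_fun <- function(x_training, y_training) {
  glm(y ~ ., data = data.frame(y = y_training, x_training), family = binomial)
}
predict_fun <- function(m, x_test) {
  as.numeric(predict(m, newdata = data.frame(x_test), type = "response"))
}

# Simple random fold assignment (assumption: ignores site)
nfolds <- 5
folds <- sample(rep(seq_len(nfolds), length.out = n))

# For each fold: fit on the other folds, predict on the held-out fold
models <- vector("list", nfolds)
predictions <- data.frame(fold = folds, y = y, y.pred = NA_real_)
for (k in seq_len(nfolds)) {
  test <- folds == k
  models[[k]] <- fit_fun(x[!test, , drop = FALSE], y[!test])
  predictions$y.pred[test] <- predict_fun(models[[k]], x[test, , drop = FALSE])
}

# Merge the fold outputs into a single list
res_sketch <- list(predictions = predictions, models = models)
```

Each observation is predicted exactly once, by a model that was fitted without it, so the merged predictions can be compared against y to estimate out-of-sample performance.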
A list with the predictions and the models used.
Joaquim Radua
glmnet_predict for obtaining predictions.
# Create random x (predictors) and y (binary)
x = matrix(rnorm(25000), ncol = 50)
y = 1 * (plogis(apply(x[,1:5], 1, sum) + rnorm(500, 0, 0.1)) > 0.5)
# Predict y via cross-validation
fit_fun = function (x_training, y_training) {
  list(
    lasso = glmnet_fit(x_training, y_training, family = "binomial")
  )
}
predict_fun = function (m, x_test) {
  glmnet_predict(m$lasso, x_test)
}
# Only 2 folds to ensure the example runs quickly
res = cv(x, y, family = "binomial", fit_fun = fit_fun, predict_fun = predict_fun, nfolds = 2)
# Show accuracy
se = mean(res$predictions$y.pred[res$predictions$y == 1] > 0.5)
sp = mean(res$predictions$y.pred[res$predictions$y == 0] < 0.5)
bac = (se + sp) / 2
cat("Sensitivity:", round(se, 2), "\n")
cat("Specificity:", round(sp, 2), "\n")
cat("Balanced accuracy:", round(bac, 2), "\n")