cv.customizedGlmnet {customizedTraining}    R Documentation
Cross validation for customizedGlmnet
Description
Does k-fold cross-validation for customizedGlmnet and returns values for the tuning parameters G
and lambda
Usage
cv.customizedGlmnet(
xTrain,
yTrain,
xTest = NULL,
groupid = NULL,
Gs = NULL,
dendrogram = NULL,
dendrogramCV = NULL,
lambda = NULL,
nfolds = 10,
foldid = NULL,
keep = FALSE,
family = c("gaussian", "binomial", "multinomial"),
verbose = FALSE
)
Arguments
xTrain
an n-by-p matrix of training covariates
yTrain
a length-n vector of training responses. Numeric for family = "gaussian"; a factor or character vector for family = "binomial" or "multinomial"
xTest
an m-by-p matrix of test covariates. May be left NULL, in which case cross-validation predictions are made internally on the training set and no test predictions are returned.
groupid
an optional length-m vector of group memberships for the test set. If specified, customized training subsets are identified using the union of nearest-neighbor sets for each test group, in which case cross-validation is used only to select the regularization parameter lambda
Gs
a vector of positive integers indicating the numbers of clusters over which to perform cross-validation to determine the best number. Ignored if groupid is specified
dendrogram
optional output from hclust on the joint training and test covariate data, used to define the customized training clusters. If left NULL, the clustering is computed within the function (see the sketch following this list)
dendrogramCV
optional output from hclust on the training covariates alone, used when clustering within the cross-validation folds. If left NULL, it is computed within the function
lambda
sequence of values to use for the regularization parameter lambda. Recommended to leave as NULL and allow glmnet to choose the sequence automatically
nfolds
number of folds – default is 10. Ignored if foldid is specified
foldid
an optional length-n vector of fold memberships used for cross-validation
keep
Should fitted values on the training set from cross-validation be included in the output? Default is FALSE.
family
response type: "gaussian", "binomial" or "multinomial"
verbose
Should progress be printed to the console as folds are evaluated during cross-validation? Default is FALSE.
Value
an object of class cv.customizedGlmnet containing the following components (a brief access sketch follows this list):
- call: the call that produced this object
- G.min: unless groupid is specified, the number of clusters minimizing CV error
- lambda: the sequence of values of the regularization parameter lambda considered
- lambda.min: the value of the regularization parameter lambda minimizing CV error
- error: a matrix containing the CV error for each G and lambda
- fit: a customizedGlmnet object fit using G.min and lambda.min. Only returned if xTest is not NULL.
- prediction: a length-m vector of predictions for the test set, using the tuning parameters which minimize cross-validation error. Only returned if xTest is not NULL.
- selected: a list of nonzero variables for each customized training set, using G.min and lambda.min. Only returned if xTest is not NULL.
- cv.fit: an array containing fitted values on the training set from cross-validation. Only returned if keep is TRUE.
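These components can be inspected directly from the returned object. A short sketch using fit1 from Example 1 below, with component names as listed above:
fit1$error            # matrix of CV error, one entry per (G, lambda) pair
fit1$G.min            # number of clusters minimizing CV error
fit1$lambda.min       # value of lambda minimizing CV error
fit1$selected         # nonzero variables for each customized training set
head(fit1$prediction) # test-set predictions at the CV-optimal tuning parameters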
Examples
require(glmnet)
# Simulate synthetic data
n = m = 150
p = 50
q = 5
K = 3
sigmaC = 10
sigmaX = sigmaY = 1
set.seed(5914)
beta = matrix(0, nrow = p, ncol = K)
for (k in 1:K) beta[sample(1:p, q), k] = 1
c = matrix(rnorm(K*p, 0, sigmaC), K, p)
eta = rnorm(K)
pi = (exp(eta)+1)/sum(exp(eta)+1)
z = t(rmultinom(m + n, 1, pi))
x = crossprod(t(z), c) + matrix(rnorm((m + n)*p, 0, sigmaX), m + n, p)
y = rowSums(z*(crossprod(t(x), beta))) + rnorm(m + n, 0, sigmaY)
x.train = x[1:n, ]
y.train = y[1:n]
x.test = x[n + 1:m, ]
y.test = y[n + 1:m]
foldid = sample(rep(1:10, length = nrow(x.train)))
# Example 1: Use clustering to fit the customized training model to training
# and test data with no predefined test-set blocks
fit1 = cv.customizedGlmnet(x.train, y.train, x.test, Gs = c(1, 2, 3, 5),
family = "gaussian", foldid = foldid)
# Print the optimal number of groups and value of lambda:
fit1$G.min
fit1$lambda.min
# Print the customized training model fit:
fit1
# Compute test error using the predict function:
mean((y[n + 1:m] - predict(fit1))^2)
# Plot nonzero coefficients by group:
plot(fit1)
# Example 2: If the test set has predefined blocks, use these blocks to define
# the customized training sets, instead of using clustering.
foldid = apply(z == 1, 1, which)[1:n]
group.id = apply(z == 1, 1, which)[n + 1:m]
fit2 = cv.customizedGlmnet(x.train, y.train, x.test, group.id, foldid = foldid)
# Print the optimal value of lambda:
fit2$lambda.min
# Print the customized training model fit:
fit2
# Compute test error using the predict function:
mean((y[n + 1:m] - predict(fit2))^2)
# Plot nonzero coefficients by group:
plot(fit2)
# Example 3: If there is no test set, but the training set is organized into
# blocks, you can do cross validation with these blocks as the basis for the
# customized training sets.
fit3 = cv.customizedGlmnet(x.train, y.train, foldid = foldid)
# Print the optimal value of lambda:
fit3$lambda.min
# Print the customized training model fit:
fit3
# Compute test error using the predict function:
mean((y[n + 1:m] - predict(fit3))^2)
# Plot nonzero coefficients by group:
plot(fit3)
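# The keep and verbose arguments are not exercised above. A minimal sketch,
# reusing the training data and fold assignment from Example 3, of retaining
# the cross-validated fitted values described under Value:
fit4 = cv.customizedGlmnet(x.train, y.train, foldid = foldid,
                           keep = TRUE, verbose = TRUE)
dim(fit4$cv.fit)  # array of fitted values on the training set from cross-validation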