StaPLR {mvs}    R Documentation
Stacked Penalized Logistic Regression
Description
Fit a two-level stacked penalized (logistic) regression model with a single base-learner and a single meta-learner. Stacked penalized regression models with a Gaussian or Poisson outcome can be fitted using the family argument.
Usage
StaPLR(
  x,
  y,
  view,
  view.names = NULL,
  family = "binomial",
  correct.for = NULL,
  alpha1 = 0,
  alpha2 = 1,
  relax = FALSE,
  nfolds = 10,
  na.action = "fail",
  na.arguments = NULL,
  seed = NULL,
  std.base = FALSE,
  std.meta = FALSE,
  ll1 = -Inf,
  ul1 = Inf,
  ll2 = 0,
  ul2 = Inf,
  cvloss = "deviance",
  metadat = "response",
  cvlambda = "lambda.min",
  cvparallel = FALSE,
  lambda.ratio = 1e-04,
  fdev = 0,
  penalty.weights.meta = NULL,
  penalty.weights.base = NULL,
  gamma.seq = c(0.5, 1, 2),
  parallel = FALSE,
  skip.version = TRUE,
  skip.meta = FALSE,
  skip.cv = FALSE,
  progress = TRUE,
  relax.base = FALSE,
  relax.meta = FALSE
)

staplr(
  x,
  y,
  view,
  view.names = NULL,
  family = "binomial",
  correct.for = NULL,
  alpha1 = 0,
  alpha2 = 1,
  relax = FALSE,
  nfolds = 10,
  na.action = "fail",
  na.arguments = NULL,
  seed = NULL,
  std.base = FALSE,
  std.meta = FALSE,
  ll1 = -Inf,
  ul1 = Inf,
  ll2 = 0,
  ul2 = Inf,
  cvloss = "deviance",
  metadat = "response",
  cvlambda = "lambda.min",
  cvparallel = FALSE,
  lambda.ratio = 1e-04,
  fdev = 0,
  penalty.weights.meta = NULL,
  penalty.weights.base = NULL,
  gamma.seq = c(0.5, 1, 2),
  parallel = FALSE,
  skip.version = TRUE,
  skip.meta = FALSE,
  skip.cv = FALSE,
  progress = TRUE,
  relax.base = FALSE,
  relax.meta = FALSE
)
Arguments
x
    input matrix of dimension nobs x nvars
y
    outcome vector of length nobs
view
    a vector of length nvars, where each entry is an integer describing to which view each feature corresponds.
view.names
    (optional) a character vector of length nviews specifying a name for each view.
family
    Either a character string representing one of the built-in families (e.g. "binomial", "gaussian" or "poisson"), or else a glm() family object.
correct.for
    (optional) a matrix with nrow = nobs, where each column is a feature which should be included directly in the meta-learner. By default these features are not penalized (see penalty.weights.meta) and appear at the top of the coefficient list; a short sketch follows this argument list.
alpha1
    (base) alpha parameter for glmnet: lasso (1) / ridge (0)
alpha2
    (meta) alpha parameter for glmnet: lasso (1) / ridge (0)
relax
    logical, whether relaxed lasso should be used at base and meta level.
nfolds
    number of folds to use for all cross-validation.
na.action
    character specifying what to do with missing values (NA). Options are "pass", "fail", "mean", "mice", and "missForest". Options "mice" and "missForest" require the respective R package to be installed. Defaults to "fail". A sketch of the imputation options follows this argument list.
na.arguments
    (optional) a named list of arguments to pass to the imputation function (e.g. to mice or missForest).
seed
    (optional) numeric value specifying the seed. Setting the seed this way ensures the results are reproducible even when the computations are performed in parallel.
std.base
    should features be standardized at the base level?
std.meta
    should cross-validated predictions be standardized at the meta level?
ll1
    lower limit(s) for each coefficient at the base-level. Defaults to -Inf.
ul1
    upper limit(s) for each coefficient at the base-level. Defaults to Inf.
ll2
    lower limit(s) for each coefficient at the meta-level. Defaults to 0 (non-negativity constraints). Does not apply to correct.for features. A sketch of lifting this constraint follows this argument list.
ul2
    upper limit(s) for each coefficient at the meta-level. Defaults to Inf. Does not apply to correct.for features.
cvloss
    loss to use for cross-validation.
metadat
    which attribute of the base learners should be used as input for the meta-learner? Allowed values are "response", "link", and "class".
cvlambda
    value of lambda at which cross-validated predictions are made. Defaults to the value giving minimum internal cross-validation error.
cvparallel
    whether to use 'foreach' to fit each CV fold (DO NOT USE, USE OPTION parallel INSTEAD).
lambda.ratio
    the ratio between the largest and smallest lambda value.
fdev
    sets the minimum fractional change in deviance for stopping the path to the specified value, ignoring the value of fdev set through glmnet.control. Setting fdev = NULL will use the value set through glmnet.control instead. It is strongly recommended to use the default value of zero.
penalty.weights.meta
    (optional) either a vector of length nviews containing different penalty factors for the meta-learner, or "adaptive" to calculate the weights from the data. The default value NULL implies an equal penalty for each view. The penalty factor is set to 0 for correct.for features.
penalty.weights.base
    (optional) either a list of length nviews, where each entry is a vector containing different penalty factors for each feature in that view, or "adaptive" to calculate the weights from the data. The default value NULL implies an equal penalty for each view. Note that using adaptive weights at the base level is generally only sensible if alpha1 > 0 (i.e. when a lasso-type penalty is applied at the base level).
gamma.seq
    a sequence of gamma values over which to optimize the adaptive weights. Only used when penalty.weights.meta = "adaptive" or penalty.weights.base = "adaptive".
parallel
    whether to use foreach to fit the base-learners and obtain the cross-validated predictions in parallel. Executes sequentially unless a parallel backend is registered beforehand; a sketch follows this argument list.
skip.version
    whether to skip checking the version of the glmnet package.
skip.meta
    whether to skip training the metalearner.
skip.cv
    whether to skip generating the cross-validated predictions.
progress
    whether to show a progress bar (only supported when parallel = FALSE).
relax.base
    logical indicating whether relaxed lasso should be employed for fitting the base learners (see also relax).
relax.meta
    logical indicating whether relaxed lasso should be employed for fitting the meta-learner (see also relax).
Value
An object with S3 class "StaPLR".
Author(s)
Wouter van Loon <w.s.van.loon@fsw.leidenuniv.nl>
Examples
set.seed(012)
n <- 1000
cors <- seq(0.1,0.7,0.1)
X <- matrix(NA, nrow=n, ncol=length(cors)+1)
X[,1] <- rnorm(n)
for(i in 1:length(cors)){
X[,i+1] <- X[,1]*cors[i] + rnorm(n, 0, sqrt(1-cors[i]^2))
}
beta <- c(1,0,0,0,0,0,0,0)
eta <- X %*% beta
p <- exp(eta)/(1+exp(eta))
y <- rbinom(n, 1, p) ## create binary response
view_index <- rep(1:(ncol(X)/2), each=2)
# Stacked penalized logistic regression
fit <- StaPLR(X, y, view_index)
coef(fit)$meta
new_X <- matrix(rnorm(16), nrow=2)
predict(fit, new_X)
# Stacked penalized linear regression
y <- eta + rnorm(n) ## create continuous response
fit <- StaPLR(X, y, view_index, family = "gaussian")
coef(fit)$meta
coef(fit)$base
new_X <- matrix(rnorm(16), nrow=2)
predict(fit, new_X)
# Stacked penalized Poisson regression
y <- ceiling(eta + 4) ## create count response
fit <- StaPLR(X, y, view_index, family = "poisson")
coef(fit)$meta
coef(fit)$base
new_X <- matrix(rnorm(16), nrow=2)
predict(fit, new_X)
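A further illustrative sketch (not part of the package examples), reusing X, p and view_index from above: linear predictors as meta-level input (metadat = "link") combined with adaptive penalty weights at the meta level.

y <- rbinom(n, 1, p) ## binary response again
fit <- StaPLR(X, y, view_index,
              metadat = "link",
              penalty.weights.meta = "adaptive",
              gamma.seq = c(0.5, 1, 2))
coef(fit)$meta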