sparseR {sparseR} | R Documentation |
Fit a ranked-sparsity model with regularized regression
sparseR(
formula,
data,
family = c("gaussian", "binomial", "poisson", "coxph"),
penalty = c("lasso", "MCP", "SCAD"),
alpha = 1,
ncvgamma = 3,
lambda.min = 0.005,
k = 1,
poly = 1,
gamma = 0.5,
cumulative_k = FALSE,
cumulative_poly = TRUE,
pool = FALSE,
ia_formula = NULL,
pre_process = TRUE,
model_matrix = NULL,
y = NULL,
poly_prefix = "_poly_",
int_sep = "\\:",
pre_proc_opts = c("knnImpute", "scale", "center", "otherbin", "none"),
filter = c("nzv", "zv"),
extra_opts = list(),
...
)
formula |
Names of the terms |
data |
Data |
family |
The family of the model |
penalty |
What penalty should be used (lasso, MCP, or SCAD) |
alpha |
The mix of L1 penalty (lower values introduce more L2 ridge penalty) |
ncvgamma |
The tuning parameter for ncvreg (for MCP or SCAD) |
lambda.min |
The minimum value to be used for lambda (as ratio of max, see ?ncvreg) |
k |
The maximum order of interactions to consider |
poly |
The maximum order of polynomials to consider |
gamma |
The degree of extremity of sparsity rankings (see details) |
cumulative_k |
Should penalties be increased cumulatively as interaction order increases? |
cumulative_poly |
Should penalties be increased cumulatively as polynomial order increases? |
pool |
Should interactions of order k and polynomials of order k+1 be pooled together for calculating the penalty? |
ia_formula |
formula to be passed to step_interact (for interactions, see details) |
pre_process |
Should the data be preprocessed (if FALSE, must provide model_matrix) |
model_matrix |
A data frame or matrix specifying the full model matrix (used if !pre_process) |
y |
A vector of responses (used if !pre_process) |
poly_prefix |
If model_matrix is specified, what is the prefix for polynomial terms? |
int_sep |
If model_matrix is specified, what is the separator for interaction terms? |
pre_proc_opts |
List of preprocessing steps (see details) |
filter |
The type of filter applied to main effects + interactions |
extra_opts |
A list of options for all preprocess steps (see details) |
... |
Additional arguments (passed to fitting function) |
Selecting gamma: higher values of gamma will penalize "group" size more. By
default, this is set to 0.5, which yields equal contribution of prior
information across orders of interactions/polynomials (this is a good
default for most settings).
Additionally, setting cumulative_poly or cumulative_k to TRUE increases
the penalty cumulatively based on the order of either polynomial or
interaction.
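As a rough illustration of how gamma interacts with group size, the sketch below assumes that a term's penalty factor grows as the number of candidate terms in its group raised to the power gamma. That functional form is an assumption made for illustration only and is not taken from the package source; the group sizes and the helper penalty_weight are likewise hypothetical.

```r
# Hypothetical group sizes: 5 main effects, choose(5, 2) = 10 pairwise
# interactions, and 5 squared terms.
group_size <- c(main = 5, interaction = 10, poly2 = 5)

# Assumed illustrative form (NOT the package's documented formula):
# the penalty factor grows as group size raised to the power gamma.
penalty_weight <- function(size, gamma = 0.5) size^gamma

# One column per gamma value, one row per group.
weights <- sapply(c(0, 0.5, 1), function(g) penalty_weight(group_size, g))
colnames(weights) <- c("gamma=0", "gamma=0.5", "gamma=1")
print(round(weights, 2))
```

Under this assumed form, gamma = 0 treats every group identically, and larger values of gamma widen the gap between small and large groups.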
The options that can be passed to pre_proc_opts are:
- knnImpute (should missing data be imputed?)
- scale (should data be standardized?)
- center (should data be centered to the mean or another value?)
- otherbin (should factors with low prevalence be combined?)
- none (should no preprocessing be done? can also specify a null object)
The options that can be passed to extra_opts are:
- centers (named numeric vector which denotes where each covariate should be centered)
- center_fn (alternatively, a function can be specified to calculate the center, such as min or median)
- freq_cut, unique_cut (see ?step_nzv; these are used by the filtering steps)
- neighbors (the number of neighbors for knnImpute)
- one_hot (see ?step_dummy); this defaults to cell-means coding, which can be done in regularized regression (change at your own risk)
- raw (should polynomials not be orthogonal? defaults to TRUE because variables are centered and scaled already by this point by default)
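The following sketch shows one way these options might be combined in a call. It assumes the sparseR package is installed and uses the built-in iris data purely for illustration; the object name fit_med is hypothetical, and the choice of median centering is an example rather than a recommendation.

```r
# Sketch only: runs the fit when the sparseR package is available.
if (requireNamespace("sparseR", quietly = TRUE)) {
  library(sparseR)
  data(iris)
  set.seed(1)  # cross-validation folds are random

  # Center and scale only (no imputation or level pooling), and center
  # each covariate at its median rather than its mean.
  fit_med <- sparseR(
    Sepal.Width ~ ., data = iris,
    k = 1,
    pre_proc_opts = c("center", "scale"),
    extra_opts = list(center_fn = median)
  )
  print(class(fit_med))
}
```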
By default, ia_formula is NULL and all variables are interacted with each
other up to order k. If specified, ia_formula is passed as the terms
argument to recipes::step_interact, so the help documentation for that
function can be consulted for guidance on specifying particular
interactions.
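A minimal sketch of restricting the candidate interactions to a single pair is given below. It assumes the sparseR package is installed and that a one-sided formula in the style accepted by recipes::step_interact can be supplied directly; the iris data and the object name fit_ia are illustrative choices only.

```r
# Sketch only: runs the fit when the sparseR package is available.
if (requireNamespace("sparseR", quietly = TRUE)) {
  library(sparseR)
  data(iris)
  set.seed(1)  # cross-validation folds are random

  # Consider only the Petal.Length x Petal.Width interaction instead of
  # all pairwise interactions among the predictors.
  fit_ia <- sparseR(
    Sepal.Width ~ ., data = iris,
    k = 1,
    ia_formula = ~ Petal.Length:Petal.Width
  )
  print(fit_ia$results_summary)
}
```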
An object of class sparseR containing the following:
fit |
the fit object returned by ncvreg |
srprep |
a recipes object used to prep the data |
pen_factors |
the factor multiple on penalties for ranked sparsity |
results |
all coefficients and penalty factors at minimum CV lambda |
results_summary |
a tibble of summary results at minimum CV lambda |
results1se |
all coefficients and penalty factors at lambda_1se |
results1se_summary |
a tibble of summary results at lambda_1se |
data |
the (unprocessed) data |
family |
the family argument (for non-normal families, e.g. poisson) |
info |
a list containing meta-info about the procedure |
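The components listed above can be inspected directly on the returned object. The sketch below assumes the sparseR package is installed and fits a default (lasso, gaussian) model to the built-in iris data; the object name fit_sr is hypothetical.

```r
# Sketch only: runs the fit when the sparseR package is available.
if (requireNamespace("sparseR", quietly = TRUE)) {
  library(sparseR)
  data(iris)
  set.seed(1)  # cross-validation folds are random

  # Main effects plus all pairwise interactions (k = 1), default penalty.
  fit_sr <- sparseR(Sepal.Width ~ ., data = iris, k = 1)

  print(names(fit_sr))                # components of the returned object
  print(fit_sr$results_summary)       # summary at the minimum-CV lambda
  print(fit_sr$results1se_summary)    # summary at lambda_1se
  print(head(fit_sr$pen_factors))     # ranked-sparsity penalty multipliers
}
```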
For fitting functionality, the ncvreg package is used; see
Breheny, P. and Huang, J. (2011) Coordinate descent algorithms for nonconvex
penalized regression, with applications to biological feature selection. Ann.
Appl. Statist., 5: 232-253.