ssp.quantreg {subsampling}    R Documentation
Optimal Subsampling Methods for Quantile Regression Model
Description
Draws a subsample from the full dataset and fits a quantile regression model to it. For a quick start, refer to the vignette.
Usage
ssp.quantreg(
formula,
data,
subset = NULL,
tau = 0.5,
n.plt,
n.ssp,
B = 5,
boot = TRUE,
criterion = "optL",
sampling.method = "withReplacement",
likelihood = c("weighted"),
control = list(...),
contrasts = NULL,
...
)
Arguments
formula
A model formula object of class "formula" that describes the model to be fitted.
data
A data frame containing the variables in the model. Denote
subset
An optional vector specifying a subset of observations from
tau
The quantile of interest.
n.plt
The pilot subsample size (first-step subsample size). This subsample is used to compute the pilot estimator and estimate the optimal subsampling probabilities.
n.ssp
The expected size of the optimal subsample (second-step subsample). For
B
The number of subsamples for the iterative sampling algorithm. Each subsample contains
boot
If TRUE, perform the iterative sampling algorithm and estimate the covariance matrix. If FALSE, only one subsample with size
criterion
Determines how subsampling probabilities are computed. Choices include
sampling.method
The sampling method for drawing the optimal subsample. Choices include
likelihood
The type of the maximum likelihood function used to calculate the optimal subsampling estimator. Currently
control
The argument
contrasts
An optional list. It specifies how categorical variables are represented in the design matrix. For example,
...
A list of parameters which will be passed to
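For intuition about tau: quantile regression at level tau minimizes the check (pinball) loss rho_tau(u) = u * (tau - I(u < 0)) instead of squared error. The base-R sketch below does not call ssp.quantreg; it fits an intercept-only model by minimizing this loss and compares the minimizer with the empirical tau-quantile. The helper name check_loss is hypothetical.

```r
# Hypothetical helper: check (pinball) loss for quantile level tau,
#   rho_tau(u) = u * (tau - I(u < 0))
check_loss <- function(u, tau) {
  u * (tau - (u < 0))
}

set.seed(1)
y <- rnorm(1001)
tau <- 0.75

# Minimize the total check loss over a single constant q
# (an intercept-only quantile regression)
obj <- function(q) sum(check_loss(y - q, tau))
q.hat <- optimize(obj, interval = range(y))$minimum

# Compare with the empirical tau-quantile (type = 1 is the inverse-ECDF definition)
c(q.hat = q.hat, sample.quantile = unname(quantile(y, tau, type = 1)))
```

With n = 1001 and tau = 0.75, the loss is minimized at the 751st order statistic, which is exactly what quantile(..., type = 1) returns, so the two numbers agree up to the tolerance of optimize().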
Details
Most of the arguments and returned variables have the same meaning as in ssp.glm. Refer to the vignette for details.
A pilot estimator for the unknown parameter \beta is required because the optL subsampling probabilities depend on \beta. There is no "free lunch" when determining optimal subsampling probabilities. For quantile regression, the pilot estimator is obtained from a subsample of size n.plt drawn with replacement from the full dataset using uniform sampling probabilities.
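The two-step procedure can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal code: it assumes the L-optimal probabilities take the form pi_i proportional to |tau - I(y_i - x_i' beta < 0)| * ||x_i|| from Wang & Ma (2021), and, to stay free of extra dependencies, it plugs in the known true coefficients of the simulated model as a stand-in for the pilot estimate. The helper optL_probs and the variable prob.ssp are hypothetical names.

```r
# Hypothetical helper (not part of the subsampling package): L-optimal
# subsampling probabilities for quantile regression, assuming the form
#   pi_i proportional to |tau - I(y_i - x_i' beta < 0)| * ||x_i||
optL_probs <- function(X, y, beta.plt, tau) {
  resid <- drop(y - X %*% beta.plt)
  w <- abs(tau - (resid < 0)) * sqrt(rowSums(X^2))
  w / sum(w)
}

set.seed(1)
N <- 1e4
tau <- 0.75
X <- cbind(1, matrix(rnorm(N * 2), N, 2))  # design matrix with an intercept column
y <- drop(X %*% c(1, 1, 1)) + rnorm(N)

# Step 1: uniform pilot subsample of size n.plt, drawn with replacement
n.plt <- 200
index.plt <- sample.int(N, n.plt, replace = TRUE)

# Stand-in pilot estimate: the true tau-quantile coefficients of this simulated
# model. In practice it would be a quantile regression fitted to
# X[index.plt, ] and y[index.plt].
beta.plt <- c(1 + qnorm(tau), 1, 1)

# Step 2: probabilities for every row of the full data, then draw n.ssp rows
prob.ssp <- optL_probs(X, y, beta.plt, tau)
n.ssp <- 100
index.ssp <- sample.int(N, n.ssp, replace = TRUE, prob = prob.ssp)
```

Rows with large covariate norm receive more sampling weight, and the factor tau or 1 - tau reflects on which side of the fitted quantile plane a point falls.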
If boot = TRUE, the returned value subsample.size.expect equals B*n.ssp, and the covariance matrix for coef is calculated. If boot = FALSE, the returned value subsample.size.expect also equals B*n.ssp, but the covariance matrix is not estimated.
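Why several subsamples make a covariance estimate possible can be illustrated with a simplified sketch. The exact combination rule used by ssp.quantreg is not stated on this page; the code below assumes the B subsample estimates are averaged and approximates the variance of that average by their sample variance divided by B. It uses uniform sampling and an intercept-only model (sample quantiles) so it runs in base R.

```r
# Simplified illustration of iterative sampling with B subsamples
# (assumed combination rule: average the B estimates, variance = var / B)
set.seed(1)
N <- 1e4
tau <- 0.75
B <- 5
n.ssp <- 100
y <- rnorm(N)

# Intercept-only stand-in: each subsample estimate is a sample tau-quantile
est <- vapply(seq_len(B), function(b) {
  idx <- sample.int(N, n.ssp, replace = TRUE)  # uniform sampling for simplicity
  unname(quantile(y[idx], tau))
}, numeric(1))

coef.combined <- mean(est)     # combined estimate
var.combined <- var(est) / B   # assumed variance estimate of the average
total.size <- B * n.ssp        # corresponds to subsample.size.expect
```

The total number of sampled rows is B*n.ssp, matching the value of subsample.size.expect described above; with a single subsample there would be no spread across estimates to measure.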
Value
ssp.quantreg returns an object of class "ssp.quantreg" containing the following components (some are optional):
- model.call
The original function call.
- coef.plt
The pilot estimator. See Details for more information.
- coef
The estimator obtained from the optimal subsample.
- cov
The covariance matrix of coef.
- index.plt
Row indices of the pilot subsample in the full dataset.
- index.ssp
Row indices of the optimal subsample in the full dataset.
- N
The number of observations in the full dataset.
- subsample.size.expect
The expected subsample size (B*n.ssp; see Details).
- terms
The terms object for the fitted model.
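Because the returned object carries coef and, when boot = TRUE, cov, standard errors and approximate confidence intervals can be formed from those two components. The sketch below assumes approximate normality of coef and uses a hypothetical helper wald_ci together with a mock list standing in for a fitted object; it does not rely on any other part of the package.

```r
# Hypothetical helper: Wald-type intervals from an object that stores a
# coefficient vector `coef` and its covariance matrix `cov`
wald_ci <- function(fit, level = 0.95) {
  se <- sqrt(diag(fit$cov))
  z <- qnorm(1 - (1 - level) / 2)
  cbind(estimate = fit$coef, se = se,
        lower = fit$coef - z * se, upper = fit$coef + z * se)
}

# Mock object with the same two components, for illustration only
fit <- list(coef = c(Intercept = 1.02, V1 = 0.97),
            cov = diag(c(0.01, 0.04)))
ci <- wald_ci(fit)
ci
```

With a real fit, the same helper would be applied to the object returned by ssp.quantreg(..., boot = TRUE); when boot = FALSE, cov is not estimated and the helper does not apply.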
References
Wang, H., & Ma, Y. (2021). Optimal subsampling for quantile regression in big data. Biometrika, 108(1), 99-112.
Examples
library(subsampling)

# quantile regression
set.seed(1)
N <- 1e4
B <- 5
tau <- 0.75
beta.true <- rep(1, 7)
d <- length(beta.true) - 1
corr <- 0.5
sigmax <- matrix(0, d, d)
for (i in 1:d) for (j in 1:d) sigmax[i, j] <- corr^(abs(i-j))
X <- MASS::mvrnorm(N, rep(0, d), sigmax)
err <- rnorm(N, 0, 1) - qnorm(tau)
Y <- beta.true[1] + X %*% beta.true[-1] +
err * rowMeans(abs(X))
data <- as.data.frame(cbind(Y, X))
colnames(data) <- c("Y", paste0("V", 1:ncol(X)))
formula <- Y ~ .
n.plt <- 200
n.ssp <- 100
optL.results <- ssp.quantreg(formula, data, tau = tau, n.plt = n.plt,
                             n.ssp = n.ssp, B = B, boot = TRUE,
                             criterion = 'optL',
                             sampling.method = 'withReplacement',
                             likelihood = 'weighted')
summary(optL.results)
uni.results <- ssp.quantreg(formula, data, tau = tau, n.plt = n.plt,
                            n.ssp = n.ssp, B = B, boot = TRUE,
                            criterion = 'uniform',
                            sampling.method = 'withReplacement',
                            likelihood = 'weighted')
summary(uni.results)