sketch_leverage {sketching} | R Documentation
Provides a subsample of the data using leverage score sampling.
sketch_leverage(data, m, method = "leverage")
data | An n-by-d matrix of data. The first column must be the dependent variable (Y).
m | Subsample size; must be less than n.
method | Sketching method: "leverage" (default), leverage score sampling using X; "root_leverage", square-root leverage score sampling using X. Here X denotes the remaining columns of data (all but the first).
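The two methods differ only in how leverage scores are turned into sampling probabilities. The following base-R sketch illustrates the idea under stated assumptions: `leverage_probs` is a hypothetical helper, not a function of the sketching package, and sampling with replacement is assumed rather than taken from the package source.

```r
# Hypothetical helper (NOT part of the sketching package): sampling
# probabilities derived from the leverage scores of a covariate matrix X.
leverage_probs <- function(X, method = c("leverage", "root_leverage")) {
  method <- match.arg(method)
  # Leverage scores are the diagonal of the hat matrix X (X'X)^{-1} X',
  # computed here as squared row norms of Q from a thin QR decomposition.
  Q <- qr.Q(qr(X))
  h <- rowSums(Q^2)
  # The square-root variant flattens the distribution toward uniform.
  score <- if (method == "leverage") h else sqrt(h)
  score / sum(score)
}

set.seed(1)
X <- cbind(1, matrix(stats::rnorm(200 * 3), nrow = 200))
p_lev  <- leverage_probs(X, "leverage")
p_root <- leverage_probs(X, "root_leverage")

# Assumed sampling scheme: draw m row indices with replacement.
m <- 20
idx <- sample.int(nrow(X), size = m, replace = TRUE, prob = p_lev)
subsample <- X[idx, , drop = FALSE]
```

Rows with unusual covariate values receive higher leverage scores and are therefore more likely to be drawn; taking square roots narrows the gap between the most and least likely rows.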
An S3 object with the following elements:
subsample | An m-by-d matrix of data
prob | An m-dimensional vector of probabilities
Ma, P., Zhang, X., Xing, X., Ma, J. and Mahoney, M. (2020). Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:1026-1035.
## Least squares: sketch and solve
# setup
n <- 1e+6 # full sample size
d <- 5 # dimension of covariates
m <- 1e+3 # sketch size
# generate pseudo-data
X <- matrix(stats::rnorm(n*d), nrow = n, ncol = d)
beta <- matrix(rep(1,d), nrow = d, ncol = 1)
eps <- matrix(stats::rnorm(n), nrow = n, ncol = 1)
Y <- X %*% beta + eps
intercept <- matrix(rep(1,n), nrow = n, ncol = 1)
# full sample including the intercept term
fullsample <- cbind(Y,intercept,X)
# generate a sketch using leverage score sampling
s_lev <- sketch_leverage(fullsample, m, "leverage")
# fit by weighted least squares on the sketch; "- 1" suppresses lm()'s own intercept (column 2 is the intercept term)
ls_lev <- lm(s_lev$subsample[,1] ~ s_lev$subsample[,2] - 1, weights = s_lev$prob)
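The call above regresses the response on a single column of the sketch. A minimal, self-contained sketch of the same sketch-and-solve idea using every regressor column might look as follows. It does not call sketch_leverage(): the sampling step is re-implemented in base R, sampling with replacement is assumed, and the inverse-probability weights are an assumption borrowed from the leverage-sampling literature rather than a statement about how the package's `prob` element is meant to be used.

```r
set.seed(42)
n <- 1e4   # full sample size (smaller than above, for speed)
d <- 5     # number of covariates
m <- 1e3   # sketch size

# Design matrix with an explicit intercept column, and a vector response.
X <- cbind(1, matrix(stats::rnorm(n * d), nrow = n, ncol = d))
beta_true <- c(0, rep(1, d))
Y <- drop(X %*% beta_true) + stats::rnorm(n)

# Leverage-score sampling probabilities (assumed construction).
h <- rowSums(qr.Q(qr(X))^2)
p <- h / sum(h)

# Draw the sketch: m rows sampled with replacement.
idx <- sample.int(n, size = m, replace = TRUE, prob = p)
Xs <- X[idx, , drop = FALSE]
Ys <- Y[idx]

# Assumption: weight each sampled row by the inverse of its sampling probability.
w <- 1 / p[idx]

# "- 1" drops lm()'s automatic intercept; Xs already contains one.
fit_sketch <- stats::lm(Ys ~ Xs - 1, weights = w)
fit_full   <- stats::lm(Y ~ X - 1)

# Compare full-sample and sketched coefficient estimates side by side.
round(cbind(full = coef(fit_full), sketch = coef(fit_sketch)), 3)
```

With the sketch holding a tenth of the rows, the sketched coefficients should land close to the full-sample ones, at a fraction of the fitting cost.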