bayeslm {bayeslm}                                        R Documentation

Efficient sampling for Gaussian linear regression with arbitrary priors
Description

This package implements an efficient sampler for Gaussian Bayesian linear regression. It uses an elliptical slice sampler instead of a regular Gibbs sampler. The function provides several built-in priors, and users can also supply their own prior function (written as an R function).
Usage

## Default S3 method:
bayeslm(Y, X = FALSE, prior = "horseshoe", penalize = NULL,
block_vec = NULL, sigma = NULL, s2 = 1, kap2 = 1, N = 20000L, burnin = 0L,
thinning = 1L, vglobal = 1, sampling_vglobal = TRUE, verb = FALSE, icept = TRUE,
standardize = TRUE, singular = FALSE, scale_sigma_prior = TRUE, prior_mean = NULL,
prob_vec = NULL, cc = NULL, lambda = NULL, ...)
## S3 method for class 'formula'
bayeslm(formula, data = list(), Y = FALSE, X = FALSE,
prior = "horseshoe", penalize = NULL, block_vec = NULL, sigma = NULL,
s2 = 1, kap2 = 1, N = 20000L, burnin = 0L, thinning = 1L, vglobal = 1,
sampling_vglobal = TRUE, verb = FALSE, standardize = TRUE, singular = FALSE,
scale_sigma_prior = TRUE, prior_mean = NULL,
prob_vec = NULL, cc = NULL, lambda = NULL, ...)
Arguments

formula
    a symbolic description of the model to be fitted.

data
    an optional data frame containing the variables in the model.
    By default the variables are taken from the environment from
    which bayeslm is called.

Y
    the vector of responses, used when calling bayeslm with matrix
    inputs instead of a formula.

X
    the matrix of regressors, used together with Y in the matrix
    interface.
prior
    A string indicating which shrinkage prior to use, e.g.
    "horseshoe" (the default), "laplace", "ridge", "sharkfin" or
    "nonlocal".
block_vec
    A vector giving the number of regressors in each block; its
    entries must sum to the number of regressors. The default value
    is NULL.
penalize
    A vector indicating whether or not to shrink each regressor. Its
    length should equal the number of regressors.
sigma
    Initial value of the residual standard error. The default value
    is half the standard error of the response.
s2, kap2
    Parameters of the prior over sigma, an inverse Gamma prior with
    rate s2 and shape kap2.
N
    Number of posterior samples to return (after burn-in).

burnin
    Number of burn-in samples. If burnin > 0, the function draws
    N + burnin samples and returns only the last N.

thinning
    Thinning interval: one draw is kept out of every thinning draws.

vglobal
    Initial value of the global shrinkage parameter. The default
    value is 1.
sampling_vglobal
    Bool, if TRUE, sample the global shrinkage parameter; otherwise
    it is kept fixed at its initial value vglobal.

verb
    Bool, if TRUE, print the progress of the sampling to the screen.

icept
    Bool, if the inputs are matrices Y and X, indicates whether to
    estimate an intercept. Not used by the formula interface.

standardize
    Bool, if TRUE, standardize X and Y before sampling.

singular
    Bool, if TRUE, allow a singular (rank-deficient) design matrix.

scale_sigma_prior
    Bool, if TRUE, the prior of the regression coefficients is
    scaled by the residual standard error sigma.
prior_mean
    An optional vector of prior means of the regression
    coefficients. The default value is NULL.

prob_vec
    An optional vector of probabilities used by certain priors. The
    default value is NULL.

cc
    Only used by certain priors; an optional numeric vector. The
    default value is NULL.

lambda
    The shrinkage parameter for the Laplace prior only.

...
    optional parameters to be passed to the low-level sampling
    function.
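To make the shape requirements on block_vec and penalize concrete, here is a small base-R sanity check. The variable names mirror the arguments above, but the checks themselves are illustrative, not part of the package; the comments assume 0/1 indicators for penalize:

```r
p <- 20                                   # number of regressors
block_vec <- c(2, rep(1, p - 2))          # first two coefficients in one block
penalize  <- c(rep(0, 3), rep(1, p - 3))  # assume 0 = leave unshrunk, 1 = shrink

stopifnot(sum(block_vec) == p)            # blocks must cover every regressor
stopifnot(length(penalize) == p)          # one indicator per regressor
```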
Details

For details of the approach, please see Hahn, He and Lopes (2017).
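As a rough illustration of the technique named above, the following is a generic elliptical slice sampling update in the style of Murray, Adams and MacKay (2010) for a model with a Gaussian prior. It is a textbook sketch, not the package's internal implementation:

```r
# One elliptical slice sampling update for a target with a Gaussian
# prior: beta is the current state, log_lik evaluates the
# log-likelihood, prior_draw samples from the (zero-mean) prior.
ess_update <- function(beta, log_lik, prior_draw) {
  nu <- prior_draw()                       # auxiliary draw from the prior
  log_y <- log_lik(beta) + log(runif(1))   # log slice threshold
  theta <- runif(1, 0, 2 * pi)             # initial angle on the ellipse
  lo <- theta - 2 * pi
  hi <- theta
  repeat {
    prop <- beta * cos(theta) + nu * sin(theta)
    if (log_lik(prop) > log_y) return(prop)      # point is on the slice
    if (theta < 0) lo <- theta else hi <- theta  # shrink the bracket
    theta <- runif(1, lo, hi)
  }
}

# Toy usage: posterior of a 2-d mean with N(0, I) prior and one
# observation per coordinate; the posterior mean is c(0.5, 1).
set.seed(1)
loglik <- function(b) sum(dnorm(c(1, 2), mean = b, sd = 1, log = TRUE))
draws <- matrix(0, 500, 2)
b <- c(0, 0)
for (i in 1:500) {
  b <- ess_update(b, loglik, function() rnorm(2))
  draws[i, ] <- b
}
colMeans(draws)  # close to the posterior mean c(0.5, 1)
```

The shrinking bracket guarantees termination: as theta approaches 0 the proposal approaches the current state, which always satisfies the slice threshold.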
Value

loops
    A vector of the number of elliptical slice sampler loops used
    for each posterior draw.

sigma
    A vector of posterior samples of the residual standard error.

vglobal
    A vector of posterior samples of the global shrinkage parameter.

beta
    A matrix of posterior samples of the regression coefficients.

fitted.values
    Fitted values of the regression model, computed from the
    posterior mean of the coefficients after discarding the first
    20% of samples as burn-in.

residuals
    Residuals of the regression model, equal to the response minus
    the fitted values.
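The fitted values and residuals described above can be reproduced from the coefficient draws. A self-contained sketch, with simulated draws standing in for a fitted model's beta matrix and the 20% burn-in fraction taken from the description above:

```r
set.seed(42)
n <- 50; p <- 3; N <- 1000
X <- matrix(rnorm(n * p), n, p)
y <- X %*% rep(1, p) + rnorm(n)
# simulated coefficient draws, standing in for the beta matrix
# returned by the sampler
beta_draws <- matrix(rnorm(N * p, mean = 1, sd = 0.1), N, p)

keep <- (floor(0.2 * N) + 1):N            # discard the first 20% as burn-in
beta_hat <- colMeans(beta_draws[keep, , drop = FALSE])
fitted_vals <- X %*% beta_hat             # posterior-mean fitted values
resid_vals  <- y - fitted_vals            # residuals = response - fitted
```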
The function horseshoe is essentially a call to bayeslm with
prior = "horseshoe"; the same holds for sharkfin, ridge, blasso and
nonlocal.
Author(s)

Jingyu He jingyu.he@chicagobooth.edu
References

Hahn, P. Richard, Jingyu He, and Hedibert Lopes. Efficient sampling
for Gaussian linear regression with arbitrary priors. (2017).
Examples

p = 20
n = 100
kappa = 1.25
beta_true = c(c(1,2,3),rnorm(p-3,0,0.01))
sig_true = kappa*sqrt(sum(beta_true^2))
x = matrix(rnorm(p*n),n,p)
y = x %*% beta_true + sig_true * rnorm(n)
x = as.matrix(x)
y = as.matrix(y)
data = data.frame(x = x, y = y)
block_vec = rep(1, p) # slice-within-Gibbs sampler, put every coefficient in its own block
fitOLS = lm(y~x-1)
# call the function using formulas
fita = bayeslm(y ~ x, prior = 'horseshoe',
block_vec = block_vec, N = 10000, burnin = 2000)
# summarize the results
summary(fita)
summary(fita$beta)
# put the first two coefficients in one elliptical sampling block
block_vec2 = c(2, rep(1, p-2))
fitb = bayeslm(y ~ x, data = data, prior = 'horseshoe',
block_vec = block_vec2, N = 10000, burnin = 2000)
# comparing several different priors
fit1 = bayeslm(y,x,prior = 'horseshoe', icept = FALSE,
block_vec = block_vec, N = 10000, burnin=2000)
beta_est1 = colMeans(fit1$beta)
fit2 = bayeslm(y,x,prior = 'laplace', icept = FALSE,
block_vec = block_vec, N = 10000, burnin=2000)
beta_est2 = colMeans(fit2$beta)
fit3 = bayeslm(y,x,prior = 'ridge', icept = FALSE,
block_vec = block_vec, N = 10000, burnin=2000)
beta_est3 = colMeans(fit3$beta)
fit4 = bayeslm(y,x,prior = 'sharkfin', icept = FALSE,
block_vec = block_vec, N = 10000, burnin=2000)
beta_est4 = colMeans(fit4$beta)
fit5 = bayeslm(y,x,prior = 'nonlocal', icept = FALSE,
block_vec = block_vec, N = 10000, burnin=2000)
beta_est5 = colMeans(fit5$beta)
plot(NULL,xlim=range(beta_true),ylim=range(beta_true),
xlab = "beta true", ylab = "estimation")
points(beta_true,beta_est1,pch=20)
points(beta_true,fitOLS$coef,col='red')
points(beta_true,beta_est2,pch=20,col='cyan')
points(beta_true,beta_est3,pch=20,col='orange')
points(beta_true,beta_est4,pch=20,col='pink')
points(beta_true,beta_est5,pch=20,col='lightgreen')
legend("topleft", c("OLS", "horseshoe", "laplace", "ridge", "sharkfin",
    "nonlocal"), col = c("red", "black", "cyan", "orange",
    "pink", "lightgreen"), pch = c(1, rep(20, 5)))
abline(0,1,col='red')
# root of the summed squared estimation errors for each fit
rmseOLS = sqrt(sum((fitOLS$coef-beta_true)^2))
rmse1 = sqrt(sum((beta_est1-beta_true)^2))
rmse2 = sqrt(sum((beta_est2-beta_true)^2))
rmse3 = sqrt(sum((beta_est3-beta_true)^2))
rmse4 = sqrt(sum((beta_est4-beta_true)^2))
rmse5 = sqrt(sum((beta_est5-beta_true)^2))
print(cbind(ols = rmseOLS, hs = rmse1,laplace = rmse2,
ridge = rmse3,sharkfin = rmse4,nonlocal = rmse5))