bplsr {bplsr}    R Documentation

Run the BPLS regression model

Description

Posterior inference for the Bayesian partial least squares (BPLS) regression model using a Gibbs sampler. Three model types are available, depending on the prior structure assumed for the model parameters (see Details).

Usage

bplsr(
  X,
  Y,
  Xtest = NULL,
  Prior = NULL,
  Qs = NULL,
  N_MCMC = 20000,
  BURN = ceiling(0.3 * N_MCMC),
  Thin = 1,
  model.type = "standard",
  scale. = TRUE,
  center. = TRUE,
  PredInterval = 0.95
)

Arguments

X

Matrix of predictor variables.

Y

Vector or matrix of responses.

Xtest

Matrix of predictor variables to predict for.

Prior

List of hyperparameters specifying the parameter prior distributions. If left NULL, a generic set of priors will be generated.

Qs

Upper limit on the number of latent components. If NULL it is chosen automatically.

N_MCMC

Number of iterations to run the Markov chain Monte Carlo algorithm.

BURN

Number of iterations to be discarded as burn-in.

Thin

Thinning interval for the Markov chain. Thin = 1 results in no thinning. Use thinning only for long chains, to reduce memory use.

model.type

Type of BPLS model to use; one of "standard", "ss" (spike-and-slab), or "LASSO" (see Details).

scale.

Logical; if TRUE then the data variables will be scaled to have unit variance.

center.

Logical; if TRUE then the data variables will be zero-centred.

PredInterval

Coverage of prediction intervals if Xtest is provided; 0.95 by default.
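
The interaction of N_MCMC, BURN and Thin can be checked with a little index arithmetic. The sketch below uses base R only and does not call bplsr; the retention rule (drop the first BURN iterations, then keep every Thin-th one) is an assumption about how these arguments are typically applied, not a statement about the package internals, and Thin = 5 is a hypothetical value.

```r
# Illustrative only: index arithmetic for burn-in and thinning.
# The retention rule below is an assumption, not taken from the bplsr source.
N_MCMC <- 20000
BURN   <- ceiling(0.3 * N_MCMC)   # default burn-in from the Usage section
Thin   <- 5                       # hypothetical thinning interval

# Iterations kept after discarding the burn-in and keeping every Thin-th draw
kept <- seq(from = BURN + 1, to = N_MCMC, by = Thin)

length(kept)   # number of retained posterior draws
range(kept)    # first and last retained iteration
```

Under this rule, memory use scales with length(kept) rather than with N_MCMC, which is why thinning mainly matters for long chains.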

Details

The number of latent variables is inferred using the multiplicative gamma process prior (Bhattacharya and Dunson, 2011). Posterior samples from the fitted model are stored as a list. There are three parameter prior structures, giving three model types that are selected through model.type: standard, ss (spike-and-slab), and LASSO.

Empirical comparisons in Urbas et al. (2024) suggest that, when applied to spectral data, the LASSO variant gives the best point predictions and prediction interval coverage.
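
To give a feel for how the multiplicative gamma process prior discourages unnecessary components, here is a minimal base R simulation of the generic construction in Bhattacharya and Dunson (2011). It does not use bplsr, and the hyperparameter values (a1, a2, nu) and dimensions (P, Q) are illustrative assumptions rather than the package defaults.

```r
# Sketch of the multiplicative gamma process of Bhattacharya and Dunson (2011).
# Hyperparameter values are illustrative, not bplsr defaults.
set.seed(1)
Q  <- 10             # upper limit on the number of components (cf. Qs)
P  <- 50             # hypothetical number of variables
a1 <- 2; a2 <- 3     # shape parameters of the gamma increments
nu <- 3              # degrees of freedom for the local precisions

# delta_1 ~ Gamma(a1, 1) and delta_h ~ Gamma(a2, 1) for h >= 2
delta <- c(rgamma(1, shape = a1, rate = 1),
           rgamma(Q - 1, shape = a2, rate = 1))

# Column-wise global precision: tau_h = delta_1 * ... * delta_h
tau <- cumprod(delta)

# Local precisions phi_jh ~ Gamma(nu / 2, nu / 2)
phi <- matrix(rgamma(P * Q, shape = nu / 2, rate = nu / 2), nrow = P, ncol = Q)

# Loadings lambda_jh ~ N(0, 1 / (phi_jh * tau_h))
sd_mat <- 1 / sqrt(sweep(phi, 2, tau, `*`))
Lambda <- matrix(rnorm(P * Q, mean = 0, sd = sd_mat), nrow = P, ncol = Q)

# Average squared loading per component
round(colMeans(Lambda^2), 4)
```

With a2 > 1 the precisions tau_h tend to grow with h, so later columns of the loading matrix are shrunk more strongly towards zero; this is the mechanism that lets an upper limit such as Qs be set generously.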

Value

A list of:

chain

A Markov chain of samples from the parameter posterior.

X

Original set of predictor variables.

Y

Original set of response variables.

Xtest

Original set of predictor variables to predict from; if Xtest is provided.

Ytest

Point predictions for new responses; if Xtest is provided.

Ytest_PI

Prediction intervals for new responses (by default 0.95 coverage); if Xtest is provided.

Ytest_dist

Posterior predictive distributions for new responses; if Xtest is provided.

diag

Additional diagnostics for assessing chain convergence.
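
The relationship between a posterior predictive distribution, a point prediction and a prediction interval can be illustrated on simulated draws. The base R sketch below assumes an equal-tailed quantile interval and a posterior-mean point prediction; both are assumptions made for illustration and may differ from how bplsr computes Ytest and Ytest_PI. The draws matrix is hypothetical and is not the layout of Ytest_dist.

```r
# Illustrative only: summarising predictive draws into point predictions
# and equal-tailed intervals. Not the bplsr implementation.
set.seed(1)
PredInterval <- 0.95
alpha <- 1 - PredInterval

# Hypothetical predictive draws: 3 test samples (rows), 4000 draws each (columns)
n_test  <- 3
n_draws <- 4000
draws <- matrix(rnorm(n_test * n_draws,
                      mean = rep(c(1, 2, 3), times = n_draws),
                      sd = 0.5),
                nrow = n_test)

# Point prediction: mean of the predictive draws for each test sample
point <- rowMeans(draws)

# Equal-tailed interval: alpha / 2 and 1 - alpha / 2 quantiles of the draws
PI <- t(apply(draws, 1, quantile, probs = c(alpha / 2, 1 - alpha / 2)))
colnames(PI) <- c("lower", "upper")

cbind(point, PI)
```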

References

Bhattacharya, A. and Dunson, D. B. (2011). Sparse Bayesian infinite factor models. Biometrika, 98(2):291–306.

Chun, H. and Keles, S. (2010). Sparse partial least squares regression for simultaneous dimension reduction and variable selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(1):3–25.

Trygg, J. and Wold, S. (2003). O2-PLS, a two-block (X–Y) latent variable regression (LVR) method with an integral OSC filter. Journal of Chemometrics, 17(1):53–64.

Urbas, S., Lovera, P., Daly, R., O'Riordan, A., Berry, D., and Gormley, I. C. (2024). Predicting milk traits from spectral data using Bayesian probabilistic partial least squares regression. The Annals of Applied Statistics, 18(4):3486–3506. <doi:10.1214/24-AOAS1947>

Wold, H. (1973). Nonlinear iterative partial least squares (NIPALS) modelling: some current developments. In Multivariate analysis–III, pages 383–407. Elsevier.

Examples


# data(milk_MIR)
X = milk_MIR$xMIR
Y = milk_MIR$yTraits[, c('Casein_content','Fat_content')]

set.seed(1)
# fit model to 25% of data and predict on remaining 75%
idx = sample(seq_len(nrow(X)), floor(nrow(X) * 0.25), replace = FALSE)

Xtrain = X[idx,];Ytrain = Y[idx,]
Xtest = X[-idx,];Ytest = Y[-idx,]

# fit the model (the default MCMC settings, N_MCMC = 20000, can take a while to run)
bplsr_Fit = bplsr(Xtrain,Ytrain)

# generate predictions
bplsr_pred = bplsr.predict(model = bplsr_Fit, newdata = Xtest)

# point predictions
head(bplsr_pred$Ytest)

# lower and upper limits of prediction interval
head(bplsr_pred$Ytest_PI)

# plot of predictive posterior distribution for single test sample
hist(bplsr_pred$Ytest_dist[1,'Casein_content',], freq = FALSE,
     main = 'Posterior predictive density', xlab = 'Casein_content')
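
The example above holds out Ytest but does not compare it with the predictions. One way to do so is a per-trait root mean squared error. The helper rmse below is hypothetical (it is not part of bplsr) and is shown on small made-up matrices so that it runs without the package or the milk_MIR data.

```r
# Hypothetical helper, not part of bplsr: per-column root mean squared error
rmse <- function(truth, pred) {
  truth <- as.matrix(truth)
  pred  <- as.matrix(pred)
  stopifnot(all(dim(truth) == dim(pred)))
  sqrt(colMeans((truth - pred)^2))
}

# Made-up values standing in for held-out responses and point predictions
truth <- cbind(Casein_content = c(2.5, 2.7, 2.9),
               Fat_content    = c(3.8, 4.1, 4.0))
pred  <- truth + 0.1

rmse(truth, pred)
```

In the example above, the analogous call would be rmse(Ytest, bplsr_pred$Ytest), assuming the point predictions are returned with one row per test sample and one column per trait.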

[Package bplsr version 1.0.1 Index]