lm_betaselect {betaselectr}    R Documentation

Betas-Select in a Regression Model

Description

Fit linear regression models with selected variables standardized. Product terms are handled correctly and categorical predictors are skipped in standardization.

Usage

lm_betaselect(
  ...,
  to_standardize = NULL,
  not_to_standardize = NULL,
  skip_response = FALSE,
  do_boot = TRUE,
  bootstrap = 100L,
  iseed = NULL,
  parallel = FALSE,
  ncpus = parallel::detectCores(logical = FALSE) - 1,
  progress = TRUE,
  load_balancing = FALSE,
  model_call = c("lm", "glm")
)

glm_betaselect(
  ...,
  to_standardize = NULL,
  not_to_standardize = NULL,
  skip_response = FALSE,
  do_boot = TRUE,
  bootstrap = 100L,
  iseed = NULL,
  parallel = FALSE,
  ncpus = parallel::detectCores(logical = FALSE) - 1,
  progress = TRUE,
  load_balancing = FALSE
)

## S3 method for class 'lm_betaselect'
print(
  x,
  digits = max(3L, getOption("digits") - 3L),
  type = c("beta", "standardized", "raw", "unstandardized"),
  ...
)

## S3 method for class 'glm_betaselect'
print(
  x,
  digits = max(3L, getOption("digits") - 3L),
  type = c("beta", "standardized", "raw", "unstandardized"),
  ...
)

raw_output(x)

Arguments

...

For lm_betaselect(), these arguments will be passed directly to lm(). For glm_betaselect(), these arguments will be passed to glm(). For the print-method of lm_betaselect or glm_betaselect objects, this will be passed to other methods.

to_standardize

A string vector, which should be the names of the variables to be standardized. Default is NULL, indicating all variables are to be standardized.

not_to_standardize

A string vector, which should be the names of the variables that should not be standardized. This argument is useful when most variables, except for a few, are to be standardized. This argument cannot be used with to_standardize at the same time. Default is NULL, and only to_standardize is used.

skip_response

Logical. If TRUE, will not standardize the response (outcome) variable even if it appears in to_standardize or to_standardize is not specified. Used for models such as logistic regression models in which there are some restrictions on the response variables (e.g., only 0 or 1 for logistic regression).
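
The restriction that motivates skip_response can be seen with stats::glm() alone. The following is a minimal sketch, assuming the built-in mtcars data (not part of this package) with am as a 0/1 outcome. The object names dat, fit_ok, and res are illustrative only.

```r
# Illustration with base R only; mtcars and all object names are assumptions
dat <- mtcars
dat$wt_z <- scale(dat$wt)[, 1]   # standardized predictor
dat$am_z <- scale(dat$am)[, 1]   # standardized 0/1 response: no longer 0 or 1

# Predictor standardized, response left as 0/1: the model can be fitted
fit_ok <- glm(am ~ wt_z, family = binomial, data = dat)

# Response also standardized: binomial() rejects values outside [0, 1]
res <- tryCatch(
  glm(am_z ~ wt_z, family = binomial, data = dat),
  error = function(e) conditionMessage(e)
)
res
```

Leaving the response on its original scale is therefore the sensible choice for such models.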

do_boot

Whether bootstrapping will be conducted. Default is TRUE.

bootstrap

If do_boot is TRUE, this argument is the number of bootstrap samples to draw. Default is 100. Should be set to 5000 or even 10000 for stable results.

iseed

If do_boot is TRUE and this argument is not NULL, it will be used by set.seed() to set the seed for the random number generator. Default is NULL.
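
The role of the seed in bootstrapping can be sketched with base R. This is not the package's implementation: the helper boot_beta(), its arguments, and the use of mtcars are hypothetical, and redoing the standardization within each bootstrap sample is an assumption made for this illustration.

```r
# Sketch only: mtcars, boot_beta(), and its arguments are hypothetical
boot_beta <- function(data, B = 200L, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # fix the random number stream
  n <- nrow(data)
  replicate(B, {
    idx <- sample.int(n, size = n, replace = TRUE)   # resample rows
    # Standardize the inputs again within the bootstrap sample
    d_z <- as.data.frame(scale(data[idx, c("mpg", "wt", "hp")]))
    unname(coef(lm(mpg ~ wt * hp, data = d_z))["wt:hp"])
  })
}

est1 <- boot_beta(mtcars, B = 200L, seed = 1234)
est2 <- boot_beta(mtcars, B = 200L, seed = 1234)
identical(est1, est2)              # same seed, same bootstrap estimates
quantile(est1, c(0.025, 0.975))    # percentile interval
```

With a fixed seed, the interval can be reproduced exactly. A larger B reduces the Monte Carlo variation between runs that use different seeds.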

parallel

If do_boot is TRUE and this argument is TRUE, parallel processing will be used to do bootstrapping. Default is FALSE because bootstrapping for models fitted by stats::lm() or stats::glm() is rarely slow. In fact, if both parallel and progress are set to TRUE, parallel processing may even be slower than serial processing.

ncpus

If do_boot is TRUE and parallel is also TRUE, this argument is the number of processes to be used in parallel processing. Default is parallel::detectCores(logical = FALSE) - 1.
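
Because parallel::detectCores() can return NA on some platforms, and a machine with one physical core would yield 0 from the default expression, users who set ncpus themselves may prefer a guarded value. A minimal sketch follows; the names n_physical and ncpus_safe are illustrative and not part of the package.

```r
# detectCores() may return NA, and on a single-core machine
# "cores - 1" would be 0; guard against both cases
n_physical <- parallel::detectCores(logical = FALSE)
ncpus_safe <- if (is.na(n_physical)) 1L else max(1L, n_physical - 1L)
ncpus_safe
```

The resulting value could then be supplied as ncpus when parallel is TRUE.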

progress

Logical. If TRUE, progress bars will be displayed for long processes. Default is TRUE.

load_balancing

Logical. If parallel is TRUE, this determines whether load balancing will be used. Default is FALSE because the gain in speed is usually minor.

model_call

The model function to be called. If "lm", the default, the model will be fitted by stats::lm(). If "glm", the model will be fitted by stats::glm(). Users should call the corresponding function directly rather than setting this argument manually.

x

An lm_betaselect or glm_betaselect object.

digits

The number of significant digits to be printed for the coefficients.

type

The coefficients to be printed. For "beta" or "standardized", the coefficients after the selected variables are standardized will be printed. For "raw" or "unstandardized", the coefficients before standardization will be printed.

Details

The functions lm_betaselect() and glm_betaselect() let users select which variables are standardized when computing the standardized solution. They have the following features:

Problems With Common Approaches

In some regression programs, users have limited control over which variables are standardized when requesting the so-called "betas". The solution may be uninterpretable or misleading in these conditions:

How The Functions Work

The functions standardize the original variables before these variables are used in the model. Therefore, strictly speaking, they do not standardize the predictors in the model, but standardize the input variables (Gelman et al., 2021).
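
The difference between standardizing the input variables and standardizing the terms in the model matters most for product terms. The following base R sketch uses the built-in mtcars data as a stand-in; the data and all object names are assumptions for illustration, and the code does not call the package.

```r
# Illustration with base R only; mtcars and all object names are assumptions
dat <- mtcars[, c("mpg", "wt", "hp")]

# (a) Standardize the inputs, then let the formula form the product term
dat_z <- as.data.frame(scale(dat))
fit_inputs <- lm(mpg ~ wt * hp, data = dat_z)

# (b) Form the product first, then standardize it as if it were a variable
dat$wt_hp <- dat$wt * dat$hp
dat_terms_z <- as.data.frame(scale(dat))
fit_terms <- lm(mpg ~ wt + hp + wt_hp, data = dat_terms_z)

b_inputs <- unname(coef(fit_inputs)["wt:hp"])
b_terms <- unname(coef(fit_terms)["wt_hp"])
c(inputs = b_inputs, terms = b_terms)   # the two coefficients differ

# (a) is the raw product coefficient rescaled by the SDs of the inputs
fit_raw <- lm(mpg ~ wt * hp, data = mtcars)
b_rescaled <- unname(coef(fit_raw)["wt:hp"]) *
  sd(mtcars$wt) * sd(mtcars$hp) / sd(mtcars$mpg)
all.equal(b_inputs, b_rescaled)
```

In (a), the product term is the product of two standardized variables, so its coefficient is on the scale of the standardized inputs. In (b), the product is rescaled by its own standard deviation, which does not correspond to a one-SD change in either input.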

The requested model is then fitted to the dataset with selected variables standardized. For the ease of follow-up analysis, both the results with selected variables standardized and the results without standardization are stored. If required, the results without standardization can be retrieved by raw_output().

Methods

The output of lm_betaselect() is an lm_betaselect-class object, and the output of glm_betaselect() is a glm_betaselect-class object. They have the following methods:

Most other methods for the output of stats::lm() and stats::glm() should also work on an lm_betaselect-class object or a glm_betaselect-class object, respectively. Some of them will give the same results regardless of the variables standardized. Examples are rstandard() and cooks.distance(). Some others should be used with caution if they make use of the variance-covariance matrix of the estimates.
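
That rstandard() and cooks.distance() do not depend on which variables are standardized can be checked with stats::lm() directly. A minimal sketch follows, assuming the built-in mtcars data and manual standardization of the numeric inputs. It does not call the package, and all object names are illustrative.

```r
# Illustration with base R only; mtcars and all object names are assumptions
raw <- mtcars[, c("mpg", "wt", "hp", "cyl")]
raw$cyl <- factor(raw$cyl)                 # categorical predictor, left as is

std <- raw
num <- c("mpg", "wt", "hp")
std[num] <- lapply(std[num], function(v) scale(v)[, 1])   # standardize inputs

fit_raw <- lm(mpg ~ wt * hp + cyl, data = raw)
fit_std <- lm(mpg ~ wt * hp + cyl, data = std)

# Case diagnostics agree between the two fits
all.equal(rstandard(fit_raw), rstandard(fit_std))
all.equal(cooks.distance(fit_raw), cooks.distance(fit_std))

# The coefficients themselves do not agree
isTRUE(all.equal(coef(fit_raw), coef(fit_std)))
```

The two fits span the same column space and differ only by a linear rescaling of the response, so diagnostics that are free of scale agree while the coefficients and their variance-covariance matrix change.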

To use the methods for lm objects or glm objects on the results without standardization, simply use raw_output(). For example, to get the fitted values without standardization, call fitted(raw_output(x)), where x is the output of lm_betaselect() or glm_betaselect().

The function raw_output() simply extracts the regression output by stats::lm() or stats::glm() on the variables without standardization.

Value

The function lm_betaselect() returns an object of the class lm_betaselect. The function glm_betaselect() returns an object of the class glm_betaselect. They are similar in structure to the output of stats::lm() and stats::glm(), with additional information stored.

The function raw_output() returns an object of the class lm or glm, which are the results of fitting the model to the data by stats::lm() or stats::glm() without standardization.

Author(s)

Shu Fai Cheung https://orcid.org/0000-0002-9871-9448

References

Cheung, S. F., Cheung, S.-H., Lau, E. Y. Y., Hui, C. H., & Vong, W. N. (2022). Improving an old way to measure moderation effect in standardized units. Health Psychology, 41(7), 502–505. doi:10.1037/hea0001188

Craig, C. C. (1936). On the frequency function of xy. The Annals of Mathematical Statistics, 7(1), 1–15. doi:10.1214/aoms/1177732541

Gelman, A., Hill, J., & Vehtari, A. (2021). Regression and other stories. Cambridge University Press. doi:10.1017/9781139161879

Jones, J. A., & Waller, N. G. (2013). Computing confidence intervals for standardized regression coefficients. Psychological Methods, 18(4), 435–453. doi:10.1037/a0033269

See Also

print.lm_betaselect() and print.glm_betaselect() for the print-methods.

Examples


data(data_test_mod_cat)

# Standardize only iv

lm_beta_x <- lm_betaselect(dv ~ iv*mod + cov1 + cat1,
                           data = data_test_mod_cat,
                           to_standardize = "iv")
lm_beta_x
summary(lm_beta_x)

# Manually standardize iv and call lm()

data_test_mod_cat$iv_z <- scale(data_test_mod_cat[, "iv"])[, 1]

lm_beta_x_manual <- lm(dv ~ iv_z*mod + cov1 + cat1,
                       data = data_test_mod_cat)

coef(lm_beta_x)
coef(lm_beta_x_manual)

# Standardize all numeric variables

lm_beta_all <- lm_betaselect(dv ~ iv*mod + cov1 + cat1,
                             data = data_test_mod_cat)
# Note that cat1 is not standardized
summary(lm_beta_all)


data(data_test_mod_cat)

data_test_mod_cat$p <- scale(data_test_mod_cat$dv)[, 1]
data_test_mod_cat$p <- ifelse(data_test_mod_cat$p > 0,
                              yes = 1,
                              no = 0)
# Standardize only iv
logistic_beta_x <- glm_betaselect(p ~ iv*mod + cov1 + cat1,
                                  family = binomial,
                                  data = data_test_mod_cat,
                                  to_standardize = "iv")

logistic_beta_x
summary(logistic_beta_x)

# Manually standardize iv and call glm()

data_test_mod_cat$iv_z <- scale(data_test_mod_cat[, "iv"])[, 1]

logistic_beta_x_manual <- glm(p ~ iv_z*mod + cov1 + cat1,
                              family = binomial,
                              data = data_test_mod_cat)

coef(logistic_beta_x)
coef(logistic_beta_x_manual)

# Standardize all numeric predictors

logistic_beta_allx <- glm_betaselect(p ~ iv*mod + cov1 + cat1,
                                     family = binomial,
                                     data = data_test_mod_cat,
                                     to_standardize = c("iv", "mod", "cov1"))
# Note that cat1 is not standardized
summary(logistic_beta_allx)


summary(raw_output(lm_beta_x))


[Package betaselectr version 0.1.0 Index]