bias {effectplots}    R Documentation

Bias / Average Residuals

Description

Calculates average residuals (= bias) over the values of one or multiple X variables.
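
As a rough illustration of what "average residuals over the values of X" means, the following base R sketch bins one numeric feature and averages the residuals per bin. It is not the package's implementation; the equal-width binning via cut() and the names res, bins, and avg_resid are assumptions made for this example only.

```r
# Base R sketch (not the effectplots implementation):
# mean residual per bin of a single numeric feature.
fit <- lm(Sepal.Length ~ ., data = iris)
res <- residuals(fit)                        # y - pred

# Split Petal.Length into 5 equal-width bins
bins <- cut(iris$Petal.Length, breaks = 5)

avg_resid <- tapply(res, bins, mean)         # bias per bin
n_obs <- as.vector(table(bins))              # bin sizes

print(data.frame(bin = names(avg_resid), N = n_obs, resid_mean = as.vector(avg_resid)))
```

Bins whose average residual is clearly different from zero point to regions of the feature where the model is systematically off.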

Usage

bias(
  X,
  resid,
  w = NULL,
  x_name = "x",
  breaks = "Sturges",
  right = TRUE,
  discrete_m = 5L,
  outlier_iqr = 2,
  seed = NULL,
  ...
)

Arguments

X

A vector, matrix, or data.frame with variable(s) to be shown on the x axis.

resid

A numeric vector of residuals, i.e., y - pred.

w

An optional numeric vector of weights.
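
A small sketch of how weights would plausibly enter the calculation, assuming that the per-group average becomes a weighted mean. That reading is an assumption, and the toy vectors g, r, and wt are made up for illustration.

```r
# Toy data: two groups, residuals, and case weights (all hypothetical)
g  <- c("a", "a", "b", "b")
r  <- c(1, -1, 2, 0)
wt <- c(3, 1, 1, 1)

# Weighted mean residual per group
wm <- sapply(split(seq_along(r), g), function(i) weighted.mean(r[i], wt[i]))
print(wm)
```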

x_name

Name of the variable if X is a vector. The default is "x".

breaks

An integer, vector, string or function specifying the bins of the numeric X variables as in graphics::hist(). The default is "Sturges". To allow different values of breaks across variables, it can be a list with one element per variable in X, or a named list with breaks for certain variables.
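
Because breaks follows the conventions of graphics::hist(), the resulting bin edges can be previewed with hist() itself. The sketch below only calls base R and assumes nothing about how bias() uses the value internally; the variable names are illustrative.

```r
x <- iris$Sepal.Width

# String: name of a rule that picks the number of bins
b_default <- hist(x, breaks = "Sturges", plot = FALSE)$breaks

# Integer: only a suggestion, hist() rounds to "pretty" break points
b_int <- hist(x, breaks = 5, plot = FALSE)$breaks

# Numeric vector: explicit break points covering the range of x
b_fixed <- hist(x, breaks = c(2, 3, 4, 5), plot = FALSE)$breaks

print(list(default = b_default, integer = b_int, fixed = b_fixed))
```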

right

Should bins be right-closed? The default is TRUE. Vectorized over the variables in X. Only relevant for numeric X.
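
The effect of right-closed versus left-closed bins can be seen with base::cut(), shown here as an illustration only. Values that fall exactly on a break point move to the neighboring bin when right is switched.

```r
x  <- c(1, 2, 3)
br <- c(1, 2, 3)

# Right-closed bins: [1,2] and (2,3], so the value 2 falls into the first bin
right_closed <- cut(x, breaks = br, right = TRUE, include.lowest = TRUE)

# Left-closed bins: [1,2) and [2,3], so the value 2 falls into the second bin
left_closed <- cut(x, breaks = br, right = FALSE, include.lowest = TRUE)

print(data.frame(x, right_closed, left_closed))
```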

discrete_m

Numeric X variables with up to this number of unique values are not binned but treated as a factor (after calculating partial dependence). The default is 5. Vectorized over the variables in X.

outlier_iqr

Outliers of a numeric X are capped via the boxplot rule, i.e., values lying more than outlier_iqr * IQR outside the quartiles. The default of 2 is more conservative than the usual rule (1.5) to account for right-skewed distributions. Set to 0 or Inf for no capping. Note that at most 10k observations are sampled to calculate quartiles. Vectorized over the variables in X.
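
A minimal sketch of the boxplot rule described above, assuming that "capped" means values beyond the limits are set to the limits. The helper cap_iqr() is hypothetical and not part of effectplots; it also skips the subsampling step.

```r
# Hypothetical helper: cap values outside quartile -/+ m * IQR
cap_iqr <- function(x, m = 2) {
  if (m == 0 || is.infinite(m)) {
    return(x)                                # no capping requested
  }
  q <- quantile(x, c(0.25, 0.75), names = FALSE, na.rm = TRUE)
  iqr <- q[2] - q[1]
  lower <- q[1] - m * iqr
  upper <- q[2] + m * iqr
  pmin(pmax(x, lower), upper)                # clamp to [lower, upper]
}

x <- c(1:9, 100)                             # one clear outlier
capped <- cap_iqr(x, m = 2)
print(capped)
```

With these values, the quartiles are 3.25 and 7.75, so the upper limit is 7.75 + 2 * 4.5 = 16.75 and only the value 100 is changed.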

seed

Optional random seed (an integer) used for capping X based on quantiles calculated from a subsample of 10k observations.
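
The seed matters because quartiles computed on a random subsample vary slightly from run to run. The base R sketch below shows that fixing the seed makes such subsample quartiles reproducible; the data and the subsample size of 10000 mirror the description above, but the code itself is illustrative only.

```r
set.seed(0)
x <- rexp(50000)                             # right-skewed toy data

# Same seed, same subsample, same quartiles
set.seed(1)
q1 <- quantile(sample(x, 10000), c(0.25, 0.75))
set.seed(1)
q2 <- quantile(sample(x, 10000), c(0.25, 0.75))

print(identical(q1, q2))
```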

...

Currently unused.

Details

The function is a convenience wrapper around feature_effects().

Value

A list (of class "EffectData") with a data.frame of statistics per feature. Use single bracket subsetting to select part of the output.

See Also

feature_effects()

Examples

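# Fit a linear model and compute average residuals per feature
fit <- lm(Sepal.Length ~ ., data = iris)
M <- bias(iris[2:5], resid = fit$residuals, breaks = 5)
# Sort features by "resid_mean" and plot with a shared y axis
M |> update(sort_by = "resid_mean") |> plot(share_y = "all")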

[Package effectplots version 0.1.0 Index]