average_observed {effectplots}    R Documentation

Average Observed

Description

Calculates average observed y values over the values of one or multiple X variables. This describes the statistical association between y and potential model features.
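For a discrete feature, the statistic described above amounts to a (weighted) mean of y per level. The following is a minimal base R sketch of that idea, not the package's implementation; the unit weights `w` are a hypothetical stand-in for `w = NULL`.

```r
# Sketch: weighted mean of y per level of a discrete x, using base R only
x <- iris$Species
y <- iris$Sepal.Length
w <- rep(1, length(y))  # assumption: unit weights, mimicking w = NULL

# Sum of w * y per level, divided by the sum of w per level
avg_obs <- tapply(w * y, x, sum) / tapply(w, x, sum)
avg_obs
```

With unit weights this reduces to the plain group means of y.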

Usage

average_observed(
  X,
  y,
  w = NULL,
  x_name = "x",
  breaks = "Sturges",
  right = TRUE,
  discrete_m = 5L,
  outlier_iqr = 2,
  seed = NULL,
  ...
)

Arguments

X

A vector, matrix, or data.frame with variable(s) to be shown on the x axis.

y

A numeric vector of observed responses.

w

An optional numeric vector of weights.

x_name

Name of the variable if X is a vector. The default is "x".

breaks

An integer, vector, string, or function specifying the bins of the numeric X variables, as in graphics::hist(). The default is "Sturges". To use different breaks per variable, pass a list with one element per variable in X, or a named list with breaks for selected variables only.
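Since breaks follows graphics::hist(), the different specifications can be explored with hist() directly. This is only a sketch of the hist() semantics; the list `breaks_list` and its variable names are illustrative assumptions about how a per-variable specification could look.

```r
x <- iris$Sepal.Width

# A string selects an algorithm; an integer is only a suggestion to hist()
b_sturges <- hist(x, breaks = "Sturges", plot = FALSE)$breaks
b_five <- hist(x, breaks = 5, plot = FALSE)$breaks

# A numeric vector fixes the break points (they must span the range of x)
b_fixed <- hist(x, breaks = c(2, 3, 4, 5), plot = FALSE)$breaks

# Hypothetical named list: breaks for selected variables only
breaks_list <- list(Sepal.Width = 5, Petal.Length = c(1, 3, 5, 7))
```

Such a named list would then be passed as the breaks argument.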

right

Should bins be right-closed? The default is TRUE. Vectorized over the variables in X. Only relevant for numeric X.
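The effect of right-closed versus left-closed bins can be illustrated with base::cut(). Whether the package bins via cut() internally is an assumption here; the sketch only shows where values sitting exactly on a break end up.

```r
x <- c(1, 2, 3, 4)
breaks <- c(1, 2, 3, 4)

# Right-closed bins (a, b]; include.lowest = TRUE keeps the minimum
r_closed <- cut(x, breaks, right = TRUE, include.lowest = TRUE)

# Left-closed bins [a, b); include.lowest = TRUE keeps the maximum
l_closed <- cut(x, breaks, right = FALSE, include.lowest = TRUE)

# Compare the bin assignment of each value
data.frame(x, r_closed, l_closed)
```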

discrete_m

Numeric X variables with up to this number of unique values are not binned but treated as a factor (after calculating partial dependence). The default is 5. Vectorized over the variables in X.
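The rule can be pictured as a simple check on the number of unique values. The helper `is_discrete()` below is hypothetical and not part of effectplots; it merely sketches the decision under that assumption.

```r
# Hypothetical helper: TRUE if x would be treated as discrete (not binned)
is_discrete <- function(x, m = 5L) {
  !is.numeric(x) || length(unique(x)) <= m
}

is_discrete(mtcars$cyl)    # few unique values: treated like a factor
is_discrete(mtcars$mpg)    # many unique values: would be binned
is_discrete(iris$Species)  # non-numeric: never binned
```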

outlier_iqr

Outliers of a numeric X are capped via the boxplot rule, i.e., values further than outlier_iqr * IQR from the quartiles are capped. The default of 2 is more conservative than the usual rule of 1.5 to account for right-skewed distributions. Set to 0 or Inf for no capping. Note that at most 10k observations are sampled to calculate quartiles. Vectorized over the variables in X.
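A minimal sketch of the boxplot rule, assuming that capping means clipping values to the fences at outlier_iqr * IQR beyond the quartiles. The function `cap_outliers()` is hypothetical, and the package's exact capping logic may differ.

```r
# Hypothetical helper illustrating the boxplot rule
cap_outliers <- function(x, outlier_iqr = 2) {
  # 0 or Inf switches capping off
  if (outlier_iqr <= 0 || !is.finite(outlier_iqr)) {
    return(x)
  }
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  r <- outlier_iqr * (q[2] - q[1])  # allowed distance from the quartiles
  pmin(pmax(x, q[1] - r), q[2] + r)
}

x <- c(1:9, 100)
capped <- cap_outliers(x)
capped
```

Here the quartiles are 3.25 and 7.75, so the upper fence is 7.75 + 2 * 4.5 = 16.75 and the value 100 is capped to it.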

seed

Optional random seed (an integer) used for capping X based on quantiles calculated from a subsample of 10k observations.
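Because the quartiles come from a random subsample, results on large data are reproducible only with a fixed seed. The sketch below illustrates this with a hypothetical helper `sub_quartiles()`; the names and the sampling details are assumptions, not the package's code.

```r
# Hypothetical helper: quartiles from a subsample of at most n_max values
sub_quartiles <- function(x, n_max = 10000L, seed = NULL) {
  if (!is.null(seed)) {
    set.seed(seed)
  }
  if (length(x) > n_max) {
    x <- x[sample.int(length(x), n_max)]
  }
  quantile(x, c(0.25, 0.75), names = FALSE)
}

x <- sqrt(seq_len(50000))  # deterministic toy data with more than 10k values
q1 <- sub_quartiles(x, seed = 1)
q2 <- sub_quartiles(x, seed = 1)
identical(q1, q2)  # same seed, same subsample, same quartiles
```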

...

Currently unused.

Details

The function is a convenience wrapper around feature_effects().

Value

A list (of class "EffectData") with a data.frame of statistics per feature. Use single bracket subsetting to select part of the output.

See Also

feature_effects()

Examples

# Single feature: average Sepal.Length per level of Species
M <- average_observed(iris$Species, y = iris$Sepal.Length)
M
M |> plot()

# Or multiple potential features X
average_observed(iris[2:5], y = iris[, 1], breaks = 5) |>
  plot()

[Package effectplots version 0.1.0 Index]