partial_dependence {effectplots} | R Documentation |
Calculates partial dependence (PD) for one or multiple X variables.
PD was introduced by Friedman (2001) to study the (main) effects of an ML model.
The PD of a model f and variable X at a certain value g is derived by replacing the X values in a reference dataset by g, and then calculating the average prediction of f over this modified data.
This is done for different g to see how the average prediction of f changes in X, keeping all other feature values constant (ceteris paribus).
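The definition above can be sketched in a few lines of base R. This is only an illustration of the idea, not the package's implementation: the helper `pd_by_hand()` and the choice of three quantiles as grid values are assumptions made for this example.

```r
# Hand-rolled partial dependence for one feature (illustrative only).
fit <- lm(Sepal.Length ~ ., data = iris)

# Hypothetical helper, not part of effectplots
pd_by_hand <- function(model, data, x, grid) {
  vapply(grid, function(g) {
    modified <- data
    modified[[x]] <- g              # replace every value of X by g
    mean(predict(model, modified))  # average prediction over the modified data
  }, numeric(1))
}

# Assumed grid: three quantiles of the feature
grid <- quantile(iris$Petal.Width, probs = c(0.1, 0.5, 0.9), names = FALSE)
pd <- pd_by_hand(fit, iris, "Petal.Width", grid)
data.frame(grid, pd)
```

Because the linear model here is additive, the resulting PD curve is a straight line whose slope equals the fitted coefficient of Petal.Width.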
This function is a convenience wrapper around feature_effects(), which calls the barebone implementation .pd() to calculate PD.
As grid points, it uses the arithmetic mean of X per bin (specified by breaks), optionally weighted by w.
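To make the grid construction concrete, the following base R sketch computes per-bin (weighted) means for one numeric feature. It is a rough approximation under stated assumptions: the unit weights `w` are made up for the example, and the package may additionally cap outliers or treat discrete variables differently before binning.

```r
x <- iris$Petal.Length
w <- rep(1, length(x))  # hypothetical case weights; all equal here

# "Sturges" breaks as computed by hist(); bins are right-closed
breaks <- hist(x, breaks = "Sturges", plot = FALSE)$breaks
bin <- cut(x, breaks = breaks, right = TRUE, include.lowest = TRUE)

# Weighted mean of x within each non-empty bin gives one grid point per bin
grid <- vapply(
  split(seq_along(x), bin, drop = TRUE),
  function(i) weighted.mean(x[i], w[i]),
  numeric(1)
)
grid
```

Using bin means rather than bin midpoints keeps each grid point inside the region where the data of that bin actually lie.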
partial_dependence(object, ...)
## Default S3 method:
partial_dependence(
object,
v,
data,
pred_fun = stats::predict,
trafo = NULL,
which_pred = NULL,
w = NULL,
breaks = "Sturges",
right = TRUE,
discrete_m = 5L,
outlier_iqr = 2,
pd_n = 500L,
seed = NULL,
...
)
## S3 method for class 'ranger'
partial_dependence(
object,
v,
data,
pred_fun = NULL,
trafo = NULL,
which_pred = NULL,
w = NULL,
breaks = "Sturges",
right = TRUE,
discrete_m = 5L,
outlier_iqr = 2,
pd_n = 500L,
seed = NULL,
...
)
## S3 method for class 'explainer'
partial_dependence(
object,
v = colnames(data),
data = object$data,
pred_fun = object$predict_function,
trafo = NULL,
which_pred = NULL,
w = object$weights,
breaks = "Sturges",
right = TRUE,
discrete_m = 5L,
outlier_iqr = 2,
pd_n = 500L,
seed = NULL,
...
)
object |
Fitted model. |
... |
Further arguments passed to |
v |
Vector of variable names to calculate statistics. |
data |
Matrix or data.frame. |
pred_fun |
Prediction function, by default |
trafo |
How should predictions be transformed?
A function or |
which_pred |
If the predictions are multivariate: which column to pick
(integer or column name). By default |
w |
Optional vector with case weights. Can also be a column name in |
breaks |
An integer, vector, string or function specifying the bins
of the numeric X variables as in |
right |
Should bins be right-closed? The default is |
discrete_m |
Numeric X variables with up to this number of unique values
are not binned but treated as a factor (after calculating partial dependence).
The default is 5. Vectorized over |
outlier_iqr |
Outliers of a numeric X are capped via the boxplot rule, i.e.,
outside |
pd_n |
Size of the data used for calculating partial dependence.
The default is 500. For larger |
seed |
Optional random seed (an integer) used for:
|
A list (of class "EffectData") with a data.frame of statistics per feature. Use single bracket subsetting to select part of the output.
partial_dependence(default): Default method.
partial_dependence(ranger): Method for objects of class 'ranger'.
partial_dependence(explainer): Method for objects of class 'explainer'.
Friedman, Jerome H. 2001. Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics 29 (5): 1189-1232. doi:10.1214/aos/1013203451.
feature_effects(), .pd(), ale().
fit <- lm(Sepal.Length ~ ., data = iris)
M <- partial_dependence(fit, v = "Species", data = iris)
M |> plot()
M2 <- partial_dependence(fit, v = colnames(iris)[-1], data = iris)
plot(M2, share_y = "all")