calculate_variable_splits {ceterisParibus} | R Documentation |
This function calculates candidate splits for each selected variable. For numerical variables, splits are calculated as percentiles (in general, uniform quantiles, with grid_points giving the number of quantiles). For all other variables, splits are the unique values.
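The rule described above can be illustrated with a minimal sketch. The helper `sketch_splits` and the `toy` data frame are hypothetical names introduced only for illustration; this is not the package's implementation, and details such as the quantile type or the handling of duplicated quantiles are assumptions.

```r
# Hypothetical helper, not part of ceterisParibus: numeric columns get
# equally spaced quantiles, every other column gets its unique values.
sketch_splits <- function(data, variables = colnames(data), grid_points = 101) {
  # grid_points probabilities spread uniformly over [0, 1]
  probs <- seq(0, 1, length.out = grid_points)
  splits <- lapply(variables, function(v) {
    x <- data[[v]]
    if (is.numeric(x)) {
      quantile(x, probs = probs, names = FALSE)
    } else {
      unique(x)
    }
  })
  names(splits) <- variables
  splits
}

# Toy data, made up for this sketch
toy <- data.frame(
  surface  = c(20, 35, 50, 65, 80),
  district = c("A", "B", "A", "C", "B"),
  stringsAsFactors = FALSE
)

res <- sketch_splits(toy, grid_points = 5)
res$surface   # quantiles at 0%, 25%, 50%, 75% and 100%
res$district  # unique values of the column
```

With grid_points = 101 the probabilities are 0, 0.01, ..., 1, which is why the splits for numerical variables are percentiles by default.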
calculate_variable_splits(data, variables = colnames(data), grid_points = 101)
data |
validation dataset, used to determine the distribution of observations |
variables |
names of variables for which splits shall be calculated |
grid_points |
number of points used for the response path; for numerical variables, the number of quantiles used as splits |
Note that the calculate_variable_splits function is an S3 generic.
If you want to work with non-standard data sources (like H2O ddf, external databases),
you should define a method for the class of your data source.
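A minimal sketch of such a method follows. The class `column_store` and its `columns` element are hypothetical stand-ins for a non-standard data source, and the stand-in generic is defined only so that the sketch runs without ceterisParibus installed; with the package loaded, the package's own generic is used instead.

```r
# Stand-in generic, defined only if the package's generic is not available.
if (!exists("calculate_variable_splits")) {
  calculate_variable_splits <- function(data, variables = colnames(data),
                                        grid_points = 101) {
    UseMethod("calculate_variable_splits")
  }
}

# Method for the hypothetical class "column_store", which keeps its
# columns in a named list instead of a data frame.
calculate_variable_splits.column_store <- function(data,
                                                   variables = names(data$columns),
                                                   grid_points = 101) {
  probs <- seq(0, 1, length.out = grid_points)
  splits <- lapply(variables, function(v) {
    x <- data$columns[[v]]  # replace with a query against the real source
    if (is.numeric(x)) {
      unique(quantile(x, probs = probs, na.rm = TRUE, names = FALSE))
    } else {
      unique(x)
    }
  })
  names(splits) <- variables
  splits
}

# Made-up object of the hypothetical class
src <- structure(
  list(columns = list(floor    = c(1, 2, 3, 4, 5),
                      district = c("A", "B", "A", "C", "B"))),
  class = "column_store"
)

out <- calculate_variable_splits(src, grid_points = 3)
str(out)
```

The method returns the same shape as the generic promises, a named list with one element per variable, so code that consumes the splits does not need to know where the data came from.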
A named list with splits for selected variables
library("DALEX")           # provides the apartments dataset
library("ceterisParibus")  # provides calculate_variable_splits()
## Not run:
library("randomForest")
set.seed(59)
apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor +
                                      no.rooms + district, data = apartments)
vars <- c("construction.year", "surface", "floor", "no.rooms", "district")
calculate_variable_splits(apartments, vars)
## End(Not run)