IDEA {FACT} | R Documentation |
IDEA
with a soft label predictor (sIDEA)
tracks changes in the soft label of being assigned to each existing cluster
throughout a (multidimensional) feature space
IDEA
with a hard label predictor (hIDEA)
tracks changes in the hard label, i.e. the cluster assignment,
throughout a (multidimensional) feature space
IDEA for soft labeling algorithms (sIDEA) indicates the soft label with which an
observation \textbf{x} with replaced values \tilde{\textbf{x}}_S is assigned to
the k-th cluster. IDEA for hard labeling algorithms (hIDEA) indicates
the cluster assignment of an observation \textbf{x} with replaced values
\tilde{\textbf{x}}_S.
The global IDEA is indexed by the corresponding data set X. The global sIDEA corresponds to:
\text{sIDEA}_X(\tilde{\textbf{x}}_S) = \left(\frac{1}{n} \sum_{i = 1}^n
\text{sIDEA}^{(1)}_{\textbf{x}^{(i)}}(\tilde{\textbf{x}}_S), \dots, \frac{1}{n}
\sum_{i = 1}^n \text{sIDEA}^{(k)}_{\textbf{x}^{(i)}}(\tilde{\textbf{x}}_S) \right)
where the c-th vector element is the average c-th vector element of local sIDEA functions. The global hIDEA corresponds to:
\text{hIDEA}_X(\tilde{\textbf{x}}_S) = \left(\frac{1}{n}\sum_{i = 1}^n
\mathbb{1}_{1}(\text{hIDEA}_{\textbf{x}^{(i)}}(\tilde{\textbf{x}}_S)), \dots,
\frac{1}{n}\sum_{i = 1}^n \mathbb{1}_{k}(\text{hIDEA}_{\textbf{x}^{(i)}}(\tilde{\textbf{x}}_S))\right)
where the c-th vector element is the fraction of hard label reassignments to the c-th cluster.
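The two global definitions above can be illustrated with a small base R sketch. It does not use the package's classes: the toy data, the cluster centers and the soft_predict() helper are hypothetical stand-ins for a fitted soft labeling algorithm, chosen only to show how local values are averaged into the global sIDEA and hIDEA at one replaced value.

```r
# Toy data: n = 4 observations with two features (hypothetical values).
X <- data.frame(x = c(0, 1, 4, 5), y = c(0, 1, 4, 5))

# Hypothetical centers of k = 2 clusters.
centers <- rbind(c(0.5, 0.5), c(4.5, 4.5))

# Hypothetical soft label predictor: softmax of negative squared distances.
# Returns an n x k matrix whose rows sum to 1.
soft_predict <- function(newdata) {
  m <- as.matrix(newdata)
  d2 <- matrix(0, nrow(m), nrow(centers))
  for (j in seq_len(nrow(centers))) {
    d2[, j] <- rowSums(sweep(m, 2, centers[j, ])^2)
  }
  w <- exp(-d2)
  w / rowSums(w)
}

# Replace the feature subset S = {x} by the value x_tilde for every observation.
x_tilde <- 4.5
X_rep <- X
X_rep$x <- x_tilde

# Local sIDEA: one soft label vector per observation (rows) and cluster (columns).
local_s <- soft_predict(X_rep)

# Global sIDEA: the c-th element is the average of the c-th local elements.
global_s <- colMeans(local_s)

# Local hIDEA: hard label taken here as the cluster with the largest soft label.
local_h <- max.col(local_s, ties.method = "first")

# Global hIDEA: fraction of observations assigned to each cluster.
global_h <- tabulate(local_h, nbins = ncol(local_s)) / nrow(X)
```

Repeating this for every grid value of the feature would trace out one curve per cluster.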
predictor
ClustPredictor
The object (created with ClustPredictor$new()) holding
the cluster algorithm and the data.
feature
(character or list)
Features or feature sets for which the effect curves are calculated.
method
character(1)
The IDEA
method to be used.
mg
DataGenerator
A MarginalGenerator
object to sample and generate
the pseudo instances.
results
data.table
The IDEA
results.
noise.out
any
Indicator for the noise variable.
type
function
Detects the type in the predictor.
new()
Create an IDEA object.
IDEA$new(predictor, feature, method = "g+l", grid.size = 20L, noise.out = NULL)
predictor
ClustPredictor
The object (created with ClustPredictor$new()) holding
the cluster algorithm and the data.
feature
(character or list)
For which features do you want importance scores calculated. The default
value of NULL
implies all features. Use a named list of character vectors
to define groups of features for which joint importance will be calculated.
method
character(1)
The IDEA method to be used. Possible choices for the method are:
"g+l" (default): store global and local IDEA results.
"local": store only local IDEA results.
"global": store only global IDEA results.
"init_local": store only local IDEA results and an
additional reference to each observation's initially
assigned cluster.
"init_g+l": store global and local IDEA results and an
additional reference to each observation's initially
assigned cluster.
grid.size
(numeric(1) or NULL)
Size of the grid used to replace values. If a grid size is
given, an equidistant grid is created. If NULL, values
are calculated at all present combinations of feature values.
noise.out
any
Indicator for the noise variable. If not NULL, noise will
be excluded from the effect estimation.
(data.frame)
Values for the effect curves:
One row per grid point per instance for each local IDEA
estimation. If method includes global estimation, one
additional row per grid point.
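The effect of the grid.size argument described above can be pictured with a short base R sketch for a single numeric feature. The helper make_grid() is hypothetical and only illustrates the two documented cases (an equidistant grid versus all observed values); the package's internal grid construction may differ in detail.

```r
# Hypothetical helper: build the replacement values for one numeric feature.
make_grid <- function(x, grid.size = 20L) {
  if (is.null(grid.size)) {
    # No grid size given: use every value present in the data.
    sort(unique(x))
  } else {
    # Grid size given: equidistant points between the observed minimum and maximum.
    seq(min(x), max(x), length.out = grid.size)
  }
}

x <- c(1, 3, 2, 3)
grid_equidistant <- make_grid(x, grid.size = 5L)
grid_observed <- make_grid(x, grid.size = NULL)
```

A smaller grid keeps the number of predictions low, while NULL evaluates the curves exactly at the observed values.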
plot()
Plot an IDEA object.
IDEA$plot(c = NULL)
c
Indicator for the cluster to plot. If NULL,
all clusters are plotted.
(ggplot)
A ggplot object that depends on the method
chosen.
plot_globals()
Plot the global sIDEA curves of all clusters.
IDEA$plot_globals(mass = NULL)
mass
A value between 0 and 1. The share of local IDEA
curves used to plot a certainty interval.
(ggplot)
A ggplot object.
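One way to read the mass argument is as a pointwise central band around the global curve that contains the given share of local curves. The base R sketch below works under that assumption; it uses randomly generated local curves and is not taken from the package's plotting code, which may compute the interval differently.

```r
set.seed(1)

# Hypothetical local sIDEA curves for one cluster:
# 10 observations (rows) evaluated at 5 grid points (columns).
local_curves <- matrix(runif(50), nrow = 10, ncol = 5)

# Share of local curves the band should contain (assumed interpretation of mass).
mass <- 0.5
probs <- c((1 - mass) / 2, 1 - (1 - mass) / 2)

# Lower and upper band limits at each grid point (2 x 5 matrix).
band <- apply(local_curves, 2, quantile, probs = probs)

# Global curve: pointwise average of the local curves.
global_curve <- colMeans(local_curves)
```

With mass = 0.5 the band spans the pointwise 25 percent to 75 percent quantiles of the local curves.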
clone()
The objects of this class are cloneable with this method.
IDEA$clone(deep = FALSE)
deep
Whether to make a deep clone.
iml::FeatureEffects
# load data and packages
library(FACT)
library(factoextra)
library(FuzzyDBScan)
multishapes = as.data.frame(multishapes[, 1:2])
# Set up and train FuzzyDBScan
eps = c(0, 0.2)
pts = c(3, 15)
res = FuzzyDBScan$new(multishapes, eps, pts)
res$plot("x", "y")
# create soft label predictor
predict_prob = function(model, newdata) model$predict(new_data = newdata)
predictor = ClustPredictor$new(res, as.data.frame(multishapes), y = res$results,
predict.function = predict_prob, type = "prob")
# Calculate `IDEA` global and local for feature "x"
idea_x = IDEA$new(predictor = predictor, feature = "x", grid.size = 5)
idea_x$plot_globals(0.5) # plot global effect of all clusters with 50 percent of local mass.