step_kmedoids {MachineShop} | R Documentation |
Creates a specification of a recipe step that will partition numeric variables according to k-medoids clustering and select the cluster medoids.
step_kmedoids(
  recipe,
  ...,
  k = 5,
  center = TRUE,
  scale = TRUE,
  method = c("pam", "clara"),
  metric = "euclidean",
  optimize = FALSE,
  num_samp = 50,
  samp_size = 40 + 2 * k,
  replace = TRUE,
  prefix = "KMedoids",
  role = "predictor",
  skip = FALSE,
  id = recipes::rand_id("kmedoids")
)

tunable.step_kmedoids(x, ...)
recipe | recipe object to which the step will be added. |
... | one or more selector functions to choose the variables to be clustered. See … |
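k | number of clusters into which the variables are partitioned, which is also the number of medoid variables selected. The value of k must be at least 1 and less than the number of selected variables. |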
center, scale | logicals indicating whether to mean center and median absolute deviation (MAD) scale the original variables before cluster partitioning, or functions or names of functions to use for the centering and scaling. These transformations are not applied to the selected variables that the step returns. |
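method | character string specifying one of the clustering methods provided by the cluster package: "pam" (partitioning around medoids) or "clara" (clustering large applications), an extension of pam intended for large datasets. |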
metric | character string specifying the distance metric for calculating dissimilarities between the variables being clustered, as accepted by the chosen method (for example, "euclidean" or "manhattan"). |
optimize | logical indicator or 0:5 integer level specifying optimization for the pam method. |
num_samp | number of sub-datasets to sample for the clara method. |
samp_size | number of cases to include in each sub-dataset. |
replace | logical indicating whether to replace the original variables. |
prefix | if the original variables are not replaced, the selected variables are added to the dataset with the character string prefix added to their names; otherwise, the original variable names are retained. |
role | analysis role that added step variables should be assigned. By default, they are designated as model predictors. |
skip | logical indicating whether to skip the step when the recipe is baked. While all operations are baked when prep is run, some operations may not be appropriate to apply to new data. |
id | unique character string to identify the step. |
x | step_kmedoids object. |
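As a rough illustration of how the center, scale, method, metric, num_samp, and samp_size arguments fit together, the sketch below standardizes the attitude predictors and clusters them with cluster::clara. It assumes that center = TRUE and scale = TRUE correspond to mean centering and stats::mad scaling, and that num_samp and samp_size play the roles of clara's samples and sampsize arguments; it is not MachineShop's implementation. Because only six variables are clustered here, the default samp_size of 40 + 2 * k is capped at the number of variables, which clara requires.

```r
library(cluster)

# Candidate variables: every attitude column except the outcome, rating
X <- as.matrix(attitude[, -1])

# Assumed equivalent of center = TRUE, scale = TRUE: mean center, then divide by the MAD
Xs <- scale(X, center = TRUE, scale = apply(X, 2, mad))

# The variables are the objects being clustered, so use one row per variable
V <- t(Xs)

k <- 2
num_samp <- 50
# Default samp_size is 40 + 2 * k; clara will not accept more than nrow(V) objects
samp_size <- min(nrow(V), 40 + 2 * k)

# Assumed mapping: num_samp -> samples, samp_size -> sampsize
fit <- clara(V, k = k, metric = "euclidean",
             samples = num_samp, sampsize = samp_size)

# i.med holds the row indices of the medoids, i.e. the selected variables
selected <- rownames(V)[fit$i.med]
selected
```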
K-medoids clustering partitions the selected variables into k groups so that the total dissimilarity between the variables and the medoids of their assigned clusters is minimized. Because each medoid is itself one of the original variables, the k cluster medoids are returned as the reduced set of variables.
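A minimal sketch of the idea using cluster::pam directly on the attitude data from the example: the variables, not the observations, are the objects to cluster, so the standardized data matrix is transposed before clustering. Scaling by stats::mad is an assumption about how scale = TRUE is carried out, and the code illustrates the technique rather than MachineShop's internal code.

```r
library(cluster)

# Outcome is rating; the remaining six columns are the candidate variables
X <- as.matrix(attitude[, -1])

# Standardize each variable (mean center, MAD scale; assumed), then transpose
V <- t(scale(X, center = TRUE, scale = apply(X, 2, mad)))

# Partition the six variables into k = 3 clusters around medoids
fit <- pam(V, k = 3, metric = "euclidean")

# Each medoid is an actual variable; keep those columns on their original scale
selected <- rownames(V)[fit$id.med]
reduced <- attitude[, c("rating", selected)]

# Variables grouped by cluster
split(rownames(V), fit$clustering)
```

Keeping the medoids, rather than cluster averages, means the reduced data still consists of interpretable original variables.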
Function step_kmedoids creates a new step whose class is of the same name and inherits from step_sbf, adds it to the sequence of existing steps (if any) in the recipe, and returns the updated recipe. The tidy method returns a tibble with columns terms (selectors or variables selected), cluster (cluster assignments), selected (logical indicator of selected cluster medoids), silhouette (silhouette values), and name (names of the selected variables).
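To see where the cluster, selected, and silhouette columns come from, the sketch below computes comparable per-variable values from a pam fit, using cluster::silhouette for the silhouette widths. It assumes Euclidean dissimilarities between mean-centered, MAD-scaled variables; the resulting data frame only approximates the tidy output, and the name column is omitted because its exact contents are not specified here.

```r
library(cluster)

X <- as.matrix(attitude[, -1])
# One row per standardized variable (mean centering and MAD scaling assumed)
V <- t(scale(X, center = TRUE, scale = apply(X, 2, mad)))

# Euclidean dissimilarities between variables
d <- dist(V)
fit <- pam(d, k = 3)

# Silhouette widths, one row per variable in its original order
sil <- silhouette(fit$clustering, d)

summary_tbl <- data.frame(
  terms = rownames(V),
  cluster = fit$clustering,
  selected = seq_len(nrow(V)) %in% fit$id.med,  # TRUE for medoid variables
  silhouette = sil[, "sil_width"],
  row.names = NULL
)
summary_tbl
```

Silhouette widths near 1 indicate a variable that sits well inside its cluster; values near 0 or below suggest it is about as close to a neighboring cluster.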
See also: pam, clara, recipe, prep, bake
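library(MachineShop)
library(recipes)

# Use all attitude variables except rating as candidate predictors
rec <- recipe(rating ~ ., data = attitude)

# Cluster the predictors into k = 3 groups and keep the medoid variables
kmedoids_rec <- rec %>%
  step_kmedoids(all_predictors(), k = 3)

kmedoids_prep <- prep(kmedoids_rec, training = attitude)
kmedoids_data <- bake(kmedoids_prep, attitude)

pairs(kmedoids_data, lower.panel = NULL)

# Step summaries before and after preparation
tidy(kmedoids_rec, number = 1)
tidy(kmedoids_prep, number = 1)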