mselect_adproclus {adproclus}    R Documentation

Model selection helper for ADPROCLUS

Description

Runs ADPROCLUS once for each number of clusters from min_nclusters to max_nclusters. This removes the need to estimate each model manually when choosing the number of clusters. With return_models = FALSE (the default), the fit scores are returned in a format that can be passed to plot_scree_adpc to obtain a scree plot, or to select_by_CHull to select a suitable number of clusters automatically. With return_models = TRUE, the output is not compatible with either function.

Usage

mselect_adproclus(
  data,
  min_nclusters,
  max_nclusters,
  return_models = FALSE,
  unexplvar = TRUE,
  start_allocation = NULL,
  nrandomstart = 1,
  nsemirandomstart = 1,
  algorithm = "ALS2",
  save_all_starts = FALSE,
  seed = NULL
)

Arguments

data

Object-by-variable data matrix of class matrix or data.frame.

min_nclusters

Minimum number of clusters to estimate.

max_nclusters

Maximum number of clusters to estimate.

return_models

Logical. If FALSE (default), a one-column matrix of model fit scores is returned, which is compatible with the plot_scree_adpc function. If TRUE, the list of estimated models is returned instead.

unexplvar

Logical. If TRUE (default), model fit is reported as unexplained variance. Otherwise it is reported as the sum of squared errors (SSE). The choice carries through to the scree plots.

start_allocation

Optional starting cluster membership matrix to be passed to the ADPROCLUS procedure. See get_rational for more information.

nrandomstart

Number of random starts computed for each model.

nsemirandomstart

Number of semi-random starts computed for each model.

algorithm

Character string "ALS1" or "ALS2" (default), denoting the type of alternating least squares algorithm. Can be abbreviated as "1" or "2".

save_all_starts

Logical. If TRUE and return_models = TRUE, the results of all algorithm starts are returned. By default, only the best solution is retained.

seed

Integer. Seed for the random number generator. Default: NULL, in which case no seed is set and results are not reproducible across runs.

Value

Depends on the choice of return_models. If FALSE (default), a matrix with one column of SSE or unexplained variance scores (see unexplvar) for all estimated models. Row names are the value of the cluster parameter for the relevant model. If TRUE, a list of the estimated models.
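
To make the two fit measures concrete, the following base R sketch computes both for a stand-in fit. It assumes that unexplained variance is the SSE divided by the total sum of squares around the grand mean of the data; this normalization is an assumption for illustration and is not stated on this page. The "fitted" matrix is a hypothetical placeholder (column means), not an ADPROCLUS model.

```r
# Illustration only: SSE versus unexplained variance for a stand-in fit.
x <- as.matrix(stackloss)

# Hypothetical "model": predict every entry by its column mean.
fitted <- matrix(colMeans(x), nrow = nrow(x), ncol = ncol(x), byrow = TRUE)

# Sum of squared errors between data and fitted values.
sse <- sum((x - fitted)^2)

# Total sum of squares around the grand mean (assumed normalization).
sst <- sum((x - mean(x))^2)

# Unexplained variance as a proportion between 0 and 1.
unexplained <- sse / sst
print(c(SSE = sse, unexplained = unexplained))
```

Under this assumption both measures rank models identically for a fixed dataset; the proportion is simply easier to compare across datasets of different scale.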

See Also

adproclus

for the actual ADPROCLUS procedure

plot_scree_adpc

for plotting the model fits

select_by_CHull

for automatic model selection via CHull method

Examples

# Loading a test dataset into the global environment
x <- stackloss

# Estimating models with cluster parameter values ranging from 1 to 4
model_fits <- mselect_adproclus(data = x, min_nclusters = 1, max_nclusters = 4, seed = 10)

# Plot the results as a scree plot to select the appropriate number of clusters
plot_scree_adpc(model_fits)
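
The description above also mentions select_by_CHull and return_models = TRUE. The sketch below shows how those two paths might be used. It assumes that select_by_CHull accepts the fit matrix as its first argument and that any additional packages it depends on are installed; the structure of the returned objects is inspected rather than assumed.

```r
library(adproclus)

x <- stackloss

# Fit scores for 1 to 4 clusters (return_models = FALSE is the default)
model_fits <- mselect_adproclus(data = x, min_nclusters = 1,
                                max_nclusters = 4, seed = 10)

# Automatic selection of the number of clusters via the CHull method
selection <- select_by_CHull(model_fits)
print(selection)

# Alternatively, keep the estimated models themselves.
# This output is not meant for plot_scree_adpc or select_by_CHull.
models <- mselect_adproclus(data = x, min_nclusters = 1,
                            max_nclusters = 4, return_models = TRUE,
                            seed = 10)
str(models, max.level = 1)
```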


[Package adproclus version 2.0.0 Index]