mselect_adproclus_low_dim {adproclus}    R Documentation

Model selection helper for low dimensional ADPROCLUS

Description

Performs low dimensional ADPROCLUS for every number of clusters from min_nclusters to max_nclusters and every number of components from min_ncomponents to max_ncomponents. This removes the need to estimate multiple models manually when selecting the number of clusters and components. The results are returned in a format compatible with plot_scree_adpc, which produces one or more scree plots, and with select_by_CHull, which automatically selects a suitable number of components for each number of clusters. Both functions require return_models = FALSE.
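The grid search described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: fit_grid and fit_fun are hypothetical names, fit_fun stands in for a single low dimensional ADPROCLUS fit that returns one fit score, and skipping combinations with more components than clusters (leaving them NA) is an assumption.

```r
# Hypothetical sketch of the model-selection grid: one fit per
# (number of clusters, number of components) pair, collected in a
# clusters-by-components matrix. `fit_fun` is a placeholder for a single
# low dimensional ADPROCLUS fit and must return one numeric fit score.
fit_grid <- function(data, min_nclusters, max_nclusters,
                     min_ncomponents, max_ncomponents, fit_fun) {
  clusters <- min_nclusters:max_nclusters
  components <- min_ncomponents:max_ncomponents
  scores <- matrix(NA_real_,
                   nrow = length(clusters), ncol = length(components),
                   dimnames = list(as.character(clusters),
                                   as.character(components)))
  for (i in seq_along(clusters)) {
    for (j in seq_along(components)) {
      k <- clusters[i]
      s <- components[j]
      # ASSUMPTION: pairs with more components than clusters are skipped
      # and stay NA
      if (s <= k) {
        scores[i, j] <- fit_fun(data, k, s)
      }
    }
  }
  scores
}

# Toy usage with a dummy score function instead of a real model fit
toy_scores <- fit_grid(data = NULL, 1, 3, 1, 3,
                       fit_fun = function(data, k, s) 1 / (k + s))
toy_scores
```

Rows index the number of clusters and columns the number of components, which is the orientation described under Value.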

Usage

mselect_adproclus_low_dim(
  data,
  min_nclusters,
  max_nclusters,
  min_ncomponents,
  max_ncomponents,
  return_models = FALSE,
  unexplvar = TRUE,
  start_allocation = NULL,
  nrandomstart = 1,
  nsemirandomstart = 1,
  save_all_starts = FALSE,
  seed = NULL
)

Arguments

data

Object-by-variable data matrix of class matrix or data.frame.

min_nclusters

Minimum number of clusters to estimate.

max_nclusters

Maximum number of clusters to estimate.

min_ncomponents

Minimum number of components to estimate. Must be less than or equal to min_nclusters.

max_ncomponents

Maximum number of components to estimate. Must be less than or equal to max_nclusters.
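The constraints on the component range can be expressed as a simple argument check. The helper check_ranges is hypothetical; its last two conditions restate the documented requirements, while the two min <= max conditions are assumptions added for completeness.

```r
# Hypothetical argument check. The last two conditions restate the
# documented requirements; the first two are ASSUMED for completeness.
check_ranges <- function(min_nclusters, max_nclusters,
                         min_ncomponents, max_ncomponents) {
  stopifnot(
    min_nclusters <= max_nclusters,
    min_ncomponents <= max_ncomponents,
    min_ncomponents <= min_nclusters,
    max_ncomponents <= max_nclusters
  )
  invisible(TRUE)
}

check_ranges(1, 4, 1, 4)     # passes silently
# check_ranges(2, 4, 3, 4)   # would fail: min_ncomponents > min_nclusters
```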

return_models

Logical. If FALSE (default), a matrix of model fit scores is returned, which is compatible with the plot_scree_adpc function. If TRUE, the list of estimated models is returned.

unexplvar

Logical. If TRUE (default), model fit is expressed as unexplained variance; otherwise it is expressed as the sum of squared errors (SSE). The chosen measure is also the one displayed in the scree plots.
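Converting between the two fit measures might look like the sketch below. It assumes that unexplained variance is the SSE divided by the total sum of squares of the data matrix; the exact definition used by the package may differ (for example in whether the data are centered). unexplained_variance is a hypothetical helper.

```r
# Hypothetical helper: express an SSE as a proportion of unexplained
# variance, ASSUMING total variance = sum of squared data entries.
unexplained_variance <- function(sse, data) {
  total_ss <- sum(as.matrix(data)^2)
  sse / total_ss
}

# Toy example: a 2 x 2 matrix with total sum of squares 1 + 4 + 9 + 16 = 30
toy <- matrix(c(1, 2, 3, 4), nrow = 2)
unexplained_variance(sse = 3, data = toy)  # 0.1
```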

start_allocation

Optional starting cluster membership matrix to be passed to the low dimensional ADPROCLUS procedure. See get_rational for more information.

nrandomstart

Number of random starts computed for each model.

nsemirandomstart

Number of semi-random starts computed for each model.

save_all_starts

Logical. If TRUE and return_models = TRUE, the results of all algorithm starts are returned. By default, only the best solution is retained.
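The effect of multiple starts can be sketched as follows: every start is fitted, and only the one with the lowest SSE is kept unless all starts are requested. best_of_starts and fit_once are hypothetical placeholders, and pooling random and semi-random starts into a single count is an assumption made for illustration.

```r
# Hypothetical sketch of multi-start optimisation: run a stochastic fit
# several times and keep only the start with the lowest loss (SSE).
# `fit_once` is a placeholder for one ADPROCLUS start; it must return a
# list with a numeric element `sse`.
best_of_starts <- function(fit_once, nstarts, save_all_starts = FALSE) {
  starts <- lapply(seq_len(nstarts), function(i) fit_once())
  losses <- vapply(starts, function(m) m$sse, numeric(1))
  best <- starts[[which.min(losses)]]
  if (save_all_starts) list(best = best, all_starts = starts) else best
}

# Toy usage: each "start" returns a random loss
set.seed(1)
toy_fit <- function() list(sse = runif(1))
best <- best_of_starts(toy_fit, nstarts = 5)
best$sse
```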

seed

Integer. Seed for the random number generator. Default: NULL, in which case results are not reproducible across runs.

Value

If return_models = FALSE (default), a matrix with one row per number of clusters and one column per number of components, containing the SSE or unexplained variance of every estimated model. Row names give the number of clusters and column names the number of components of the corresponding model. If return_models = TRUE, a list of the estimated models is returned instead.
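To illustrate the layout of the returned matrix, the sketch below builds a toy matrix with the same orientation (rows = clusters, columns = components) from an arbitrary formula, not from real model fits, and computes how the score changes when one more component is added, which is what a scree plot makes visible. Storing NA for combinations with more components than clusters is an assumption.

```r
# Toy clusters-by-components matrix; values come from an arbitrary formula
# (1 / (k * s)), not from ADPROCLUS fits. Cells with more components than
# clusters are ASSUMED to be NA.
k_values <- 1:3
s_values <- 1:3
scores <- outer(k_values, s_values,
                function(k, s) ifelse(s <= k, 1 / (k * s), NA))
dimnames(scores) <- list(as.character(k_values), as.character(s_values))

# Scores for 3 clusters across all numbers of components
scores["3", ]

# Change in score when adding one component, per number of clusters
# (negative values mean the score decreased)
change <- t(apply(scores, 1, diff))
change
```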

See Also

adproclus_low_dim for the actual low dimensional ADPROCLUS procedure

plot_scree_adpc for plotting the model fits

select_by_CHull for automatic model selection via the CHull method

Examples

# Use the built-in stackloss dataset as test data
x <- stackloss

# Estimating models with cluster parameter values ranging from 1 to 4
# and component parameter values also ranging from 1 to 4
model_fits <- mselect_adproclus_low_dim(
  data = x,
  min_nclusters = 1, max_nclusters = 4,
  min_ncomponents = 1, max_ncomponents = 4,
  seed = 1
)

# Plot the results as a scree plot to select the appropriate number of clusters
plot_scree_adpc(model_fits)


[Package adproclus version 2.0.0 Index]