cluster_means {adproclus}    R Documentation

Cluster Means based on Original Variables

Description

Obtain a cluster-by-variable dataframe whose values are the cluster means of the given variables. Takes as input a (low dimensional) ADPROCLUS model of class adpc and a dataset. The dataset must have the same number of rows as the cluster membership matrix A of the model, but its variables can differ from the ones the model was trained on. The function uses the cluster membership matrix of the model to compute, per cluster, the mean of each variable in the dataset; in effect, it computes the column means of the dataset separately for each cluster. If any observations were not assigned to a cluster, the last row of the output, Cl0, corresponds to the baseline cluster consisting of those observations.
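The computation described above can be illustrated in base R. This is only a rough sketch under stated assumptions, not the package's implementation: the function name sketch_cluster_means, the toy matrices X and A, and the assumption that the membership matrix is binary with named columns are all hypothetical and chosen for illustration.

```r
# Toy data: 5 observations, 2 variables (made-up values).
X <- matrix(c(1, 2,
              3, 4,
              5, 6,
              7, 8,
              9, 10),
            nrow = 5, byrow = TRUE,
            dimnames = list(NULL, c("v1", "v2")))

# Toy binary membership matrix with 2 overlapping clusters.
# Observation 2 belongs to both clusters; observation 5 belongs to none.
A <- matrix(c(1, 0,
              1, 1,
              0, 1,
              0, 1,
              0, 0),
            nrow = 5, byrow = TRUE,
            dimnames = list(NULL, c("Cl1", "Cl2")))

# Hypothetical helper (not part of adproclus): column means per cluster.
sketch_cluster_means <- function(X, A, digits = 3) {
  stopifnot(nrow(X) == nrow(A))
  # One row of column means for the members of each cluster.
  means <- do.call(rbind, lapply(seq_len(ncol(A)), function(k) {
    colMeans(X[A[, k] == 1, , drop = FALSE])
  }))
  rownames(means) <- colnames(A)
  # Baseline cluster: observations that belong to no cluster at all.
  unassigned <- rowSums(A) == 0
  if (any(unassigned)) {
    means <- rbind(means, Cl0 = colMeans(X[unassigned, , drop = FALSE]))
  }
  as.data.frame(round(means, digits))
}

res <- sketch_cluster_means(X, A)
print(res)
```

Because clusters may overlap, an observation such as the second one here contributes to the means of every cluster it belongs to. The Cl0 row is appended only when at least one observation is unassigned.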

Usage

cluster_means(data, model, digits = 3)

Arguments

data

Object-by-variable matrix. May contain variables other than those the ADPROCLUS model was fitted on. IMPORTANT: The number of rows must equal the number of observations in the ADPROCLUS model.

model

ADPROCLUS solution (class: adpc). Low dimensional model possible.

digits

Integer. The number of decimal places to which all decimal numbers are rounded.

Details

Note that the output of this function differs from the last output matrix of the summary() method applied to an ADPROCLUS model: this function computes the means over the original variable values, whereas summary() computes them over the approximated model variable values.
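To make the distinction concrete, here is a small base R sketch. It assumes a full dimensional model in which the approximated values are the product of the membership matrix A and a profile matrix P. The matrices X, A and P and the helper means_by_cluster are hypothetical toy objects, and the sketch does not reproduce the package's summary() code.

```r
# Toy binary membership matrix: 3 observations, 2 overlapping clusters.
A <- matrix(c(1, 0,
              1, 1,
              0, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("Cl1", "Cl2")))

# Toy cluster profiles (cluster-by-variable), made-up values.
P <- matrix(c(1, 2,
              3, 4),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("Cl1", "Cl2"), c("v1", "v2")))

# Toy original data; close to, but not equal to, A %*% P.
X <- matrix(c(1.5, 2,
              4,   5.5,
              3.5, 4.5),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("v1", "v2")))

# Approximated model values under the assumption stated above.
M <- A %*% P

# Hypothetical helper: column means of `values` for the members of each cluster.
means_by_cluster <- function(values, A) {
  out <- do.call(rbind, lapply(seq_len(ncol(A)), function(k) {
    colMeans(values[A[, k] == 1, , drop = FALSE])
  }))
  rownames(out) <- colnames(A)
  out
}

means_original <- means_by_cluster(X, A)  # means over original values
means_model    <- means_by_cluster(M, A)  # means over approximated values

print(means_original)
print(means_model)
```

The two matrices differ whenever the model does not reproduce the data exactly, which is why the two outputs should not be used interchangeably.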

Value

Cluster-by-variable dataframe where the values are the cluster means for the given variables.

Examples

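# Load the package that provides adproclus(), cluster_means() and CGdata
library(adproclus)

# Obtain data, compute a model with 3 clusters, report cluster means
x <- CGdata
model <- adproclus(x, 3)
cluster_means(data = x, model = model)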

[Package adproclus version 2.0.0 Index]