nomprox {nomclust} | R Documentation |
The nomprox() function performs hierarchical cluster analysis when the proximity (dissimilarity) matrix has been calculated externally, for instance in a different R package, in a user-defined function, or in other software.
It offers three linkage methods that can be used for categorical data. The obtained clusters can be evaluated by seven evaluation indices; see Sulc et al. (2018).
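As an illustration of the "calculated externally" case, the following minimal sketch computes a simple matching dissimilarity matrix by hand in base R and passes it on. The toy data, the helper simple_matching(), and the object names are hypothetical and not part of nomclust. The nomprox() call uses only arguments listed in the usage above and runs only when the package is installed.

```r
# Toy categorical data (hypothetical values, for illustration only)
toy <- data.frame(
  colour = c("red", "red", "blue", "blue", "green", "green"),
  shape  = c("round", "round", "square", "round", "square", "square"),
  size   = c("small", "large", "small", "small", "large", "large"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: simple matching dissimilarity, i.e. the share of
# variables on which two cases take different categories
simple_matching <- function(df) {
  m <- as.matrix(df)                 # character matrix of categories
  n <- nrow(m)
  d <- matrix(0, nrow = n, ncol = n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- mean(m[i, ] != m[j, ])
    }
  }
  d
}

# Dissimilarity matrix computed outside of nomclust
diss.ext <- simple_matching(toy)

# Hand the external matrix to nomprox(); evaluation is switched off here,
# so the data argument is left at its default
if (requireNamespace("nomclust", quietly = TRUE)) {
  hca.ext <- nomclust::nomprox(diss = diss.ext, method = "average",
                               clu.high = 3, eval = FALSE)
}
```

Any other source of a square, symmetric dissimilarity matrix (or a dist object) could take the place of simple_matching() in this sketch.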
nomprox(diss, data = NULL, method = "average", clu.high = 6, eval = TRUE)
diss |
A proximity matrix or a dist object calculated from the dataset defined in the data parameter. |
data |
A data.frame or a matrix with cases in rows and variables in columns. |
method |
A character string defining the clustering method. The following methods can be used: |
clu.high |
A numeric value expressing the maximum number of clusters for which the cluster membership variables are produced. |
eval |
A logical value; if TRUE, the clustering results are evaluated. |
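To make the roles of method and clu.high more concrete, here is a rough base R sketch of hierarchical clustering on a precomputed dissimilarity matrix. It uses stats::hclust() and stats::cutree() as stand-ins; this page does not state which routines nomprox() uses internally, so the sketch should not be read as the package's implementation. The matrix values are made up.

```r
# Hypothetical 5 x 5 dissimilarity matrix (made-up values for illustration)
diss <- matrix(c(
  0.0, 0.2, 0.8, 0.9, 0.7,
  0.2, 0.0, 0.7, 0.8, 0.9,
  0.8, 0.7, 0.0, 0.1, 0.6,
  0.9, 0.8, 0.1, 0.0, 0.5,
  0.7, 0.9, 0.6, 0.5, 0.0
), nrow = 5, byrow = TRUE)

# Largest number of clusters for which memberships are wanted
clu.high <- 3

# Agglomerative clustering with complete linkage on the given dissimilarities
hc <- stats::hclust(stats::as.dist(diss), method = "complete")

# One membership column per cluster count, from 2 up to clu.high
mem <- stats::cutree(hc, k = 2:clu.high)
print(mem)
```

Each column of mem holds one partition of the five cases, which is analogous in spirit to the membership variables described for clu.high.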
The function returns a list with up to three components:
The mem component contains cluster membership partitions for the selected numbers of clusters in the form of a list.
The eval component contains seven evaluation criteria as vectors in a list: the within-cluster mutability coefficient (WCM), the within-cluster entropy coefficient (WCE), the pseudo F indices based on the mutability (PSFM) and the entropy (PSFE), the Bayesian (BIC) and Akaike (AIC) information criteria for categorical data, and the BK index.
To see them all at once, it is more convenient to convert the list to a data.frame.
The opt component is present in the output together with the eval component. It displays the optimal number of clusters for the evaluation criteria from the eval component, except for WCM and WCE, where the optimal number of clusters is based on the elbow method.
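The returned list can be inspected along the following lines. The two helpers, eval_table() and cluster_sizes(), are hypothetical conveniences and not part of nomclust; they assume only what is described above, namely that eval is a list of equal-length vectors and mem is a list of membership vectors. The nomclust calls mirror the example on this page and run only if the package is installed.

```r
# Hypothetical helper: bind the vectors of the eval component into one
# data.frame, with one column per evaluation criterion
eval_table <- function(res) as.data.frame(res$eval)

# Hypothetical helper: cluster sizes for every partition stored in mem
cluster_sizes <- function(res) lapply(res$mem, table)

if (requireNamespace("nomclust", quietly = TRUE)) {
  # Sample data shipped with the package
  data("data20", package = "nomclust")

  # Dissimilarity matrix based on the iof similarity measure
  diss.matrix <- nomclust::iof(data20)

  # Hierarchical clustering with evaluation switched on
  hca.object <- nomclust::nomprox(diss = diss.matrix, data = data20,
                                  method = "complete", clu.high = 5,
                                  eval = TRUE)

  print(eval_table(hca.object))     # all criteria side by side
  print(cluster_sizes(hca.object))  # cases per cluster in each partition
  print(hca.object$opt)             # suggested numbers of clusters
}
```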
Zdenek Sulc.
Contact: zdenek.sulc@vse.cz
Sulc Z., Cibulkova J., Prochazka J., Rezankova H. (2018). Internal Evaluation Criteria for Categorical Data in Hierarchical Clustering: Optimal Number of Clusters Determination, Metodoloski Zveski, 15(2), p. 1-20.
nomclust, evalclust, eval.plot.
# sample data
data(data20)

# computation of a dissimilarity matrix using the iof similarity measure
diss.matrix <- iof(data20)

# creating an object with results of hierarchical clustering
hca.object <- nomprox(diss = diss.matrix, data = data20, method = "complete",
                      clu.high = 5, eval = TRUE)