dendesc {mdendro} | R Documentation |
Descriptive measures for analyzing objects of class "dendrogram".
ntb(dendro)
ultrametric(dendro)
mae(prox, ultr)
sdr(prox, ultr)
dendro | Object of class "dendrogram". |
prox | Object of class "dist". |
ultr | Object of class "dist". |
This package provides functions to compute several descriptive measures for dendrograms, such as normalized tree balance, cophenetic correlation coefficient, normalized mean absolute error, and space distortion ratio.
For each node in a dendrogram, its entropy is calculated using the
concept of Shannon's entropy, which gives a maximum entropy of 1 to nodes
merging subdendrograms with the same number of leaves. The average entropy
over all nodes in a dendrogram is called its tree balance. Normalized
tree balance is computed by the ntb() function by relating the tree balance
of a dendrogram to the minimum tree balance of any dendrogram with the same
number of elements. Perfectly balanced dendrograms have a normalized tree
balance equal to 1, while binary dendrograms formed by chaining one new
element at a time have a normalized tree balance equal to 0.
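The entropy and averaging steps above can be sketched in base R on a "dendrogram" built with hclust(). This is a minimal illustration under stated assumptions, not mdendro's implementation: it assumes that a node whose k children contain n_1, ..., n_k leaves has entropy -sum(p_i * log_k(p_i)) with p_i = n_i / sum(n_j) (logarithm base k, so equal-sized children give exactly 1), and it averages over internal nodes only. The helpers node_entropy() and tree_balance() are hypothetical names, and the value returned is the raw tree balance, before the normalization that ntb() applies.

```r
## Hypothetical helper: normalized Shannon entropy of one node.
## Assumption: p_i = leaves in child i / leaves in the node, logarithm base k
## (k = number of children), so children of equal size give an entropy of 1.
node_entropy <- function(sizes) {
  p <- sizes / sum(sizes)
  -sum(p * log(p, base = length(sizes)))
}

## Hypothetical helper: raw tree balance = mean entropy over internal nodes.
tree_balance <- function(dendro) {
  entropies <- numeric(0)
  visit <- function(node) {
    if (is.leaf(node)) return(invisible(NULL))
    # number of leaves under each child, read from the "members" attribute
    sizes <- vapply(node, function(child) attr(child, "members"), numeric(1))
    entropies <<- c(entropies, node_entropy(sizes))
    for (i in seq_along(node)) visit(node[[i]])
  }
  visit(dendro)
  mean(entropies)
}

## Balanced: {0, 1} and {10, 11} merge first, then join at the root
balanced <- as.dendrogram(hclust(dist(c(0, 1, 10, 11)), method = "complete"))
## Chained: single linkage adds one new point at a time
chained <- as.dendrogram(hclust(dist(c(0, 1, 3, 7)), method = "single"))

tree_balance(balanced)  # 1
tree_balance(chained)   # approx. 0.9099
```

The chained dendrogram still has a raw tree balance well above 0, which illustrates why a normalization against the minimum tree balance for the same number of elements is needed before chained dendrograms can map to 0.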
To calculate the cophenetic correlation coefficient, the cor() function in
the stats package pairs its two arguments element by element, so the matrix
of ultrametric distances (also known as cophenetic distances) and the matrix
of proximity data used to build the corresponding dendrogram must both have
their rows and columns sorted in the same order. When the cophenetic()
function is used with objects of class "hclust", it returns ultrametric
matrices sorted in the appropriate order. However, when the cophenetic()
function is used with objects of class "dendrogram", it returns ultrametric
matrices sorted in the order of the dendrogram leaves. The ultrametric()
function in this package returns ultrametric matrices sorted in the
appropriate order to calculate the cophenetic correlation coefficient using
the cor() function.
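The ordering issue can be reproduced with the stats package alone. The sketch below is not the code of ultrametric(); it simply reorders the leaf-ordered cophenetic matrix of a "dendrogram" by the original labels of the proximity data before calling cor(). The object names (hc, coph_hclust, coph_dendro, coph_fixed, labs) are illustrative.

```r
## Base-R illustration only (stats package), not mdendro's ultrametric().
data(eurodist)
hc <- hclust(eurodist, method = "average")

coph_hclust <- cophenetic(hc)                  # original order of observations
coph_dendro <- cophenetic(as.dendrogram(hc))   # order of the dendrogram leaves

## Reorder rows and columns of the leaf-ordered matrix by the original labels
labs <- attr(eurodist, "Labels")
coph_fixed <- as.dist(as.matrix(coph_dendro)[labs, labs])

cor(eurodist, coph_hclust)   # cophenetic correlation coefficient
cor(eurodist, coph_fixed)    # same value after reordering
cor(eurodist, coph_dendro)   # generally different: rows and columns misaligned
```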
The space distortion ratio of a dendrogram is computed by the sdr()
function as the difference between the maximum and minimum ultrametric
distances, divided by the difference between the maximum and minimum
original distances used to build the dendrogram. Space dilation occurs when
the space distortion ratio is greater than 1, and space contraction when it
is less than 1.
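A direct transcription of this definition, written as a sketch rather than mdendro's sdr() code (sdr_sketch() is a hypothetical name). Because only maxima and minima are used, the order of the distances does not matter here, so plain cophenetic() output from hclust() is enough.

```r
## Hypothetical sketch of the space distortion ratio defined above:
## range of the ultrametric distances / range of the original distances.
sdr_sketch <- function(prox, ultr) {
  (max(ultr) - min(ultr)) / (max(prox) - min(prox))
}

data(eurodist)
ultr_complete <- cophenetic(hclust(eurodist, method = "complete"))
ultr_single   <- cophenetic(hclust(eurodist, method = "single"))

sdr_sketch(eurodist, ultr_complete)  # 1
sdr_sketch(eurodist, ultr_single)    # less than 1 (space contraction)
```

Complete linkage gives exactly 1 because its first merge happens at the minimum distance and its root height equals the maximum pairwise distance, whereas single linkage merges everything below the maximum distance and so compresses the range.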
ntb
: Returns a number between 0 and 1 representing the normalized tree balance of
the input dendrogram.
ultrametric
: Returns an object of class "dist" containing the ultrametric distance
matrix sorted in the same order as the proximity matrix used to build the
corresponding dendrogram.
mae
: Returns the normalized mean absolute error.
sdr
: Returns the space distortion ratio.
See also linkage() in this package, hclust() in the stats package, and
agnes() in the cluster package for building hierarchical trees.
library(mdendro)

## distances between 21 cities in Europe
data(eurodist)

## comparison of dendrograms in terms of the following descriptive measures:
## - normalized tree balance
## - cophenetic correlation coefficient
## - normalized mean absolute error
## - space distortion ratio

## single linkage (call to the mdendro package)
dendro1 <- linkage(eurodist, method="single")
ntb(dendro1)           # 0.2500664
ultr1 <- ultrametric(dendro1)
cor(eurodist, ultr1)   # 0.7842797
mae(eurodist, ultr1)   # 0.6352011
sdr(eurodist, ultr1)   # 0.150663

## complete linkage (call to the stats package)
dendro2 <- as.dendrogram(hclust(eurodist, method="complete"))
ntb(dendro2)           # 0.8112646
ultr2 <- ultrametric(dendro2)
cor(eurodist, ultr2)   # 0.735041
mae(eurodist, ultr2)   # 0.8469728
sdr(eurodist, ultr2)   # 1

## unweighted arithmetic linkage (UPGMA)
dendro3 <- linkage(eurodist, method="arithmetic", weighted=FALSE)
ntb(dendro3)           # 0.802202
ultr3 <- ultrametric(dendro3)
cor(eurodist, ultr3)   # 0.7279432
mae(eurodist, ultr3)   # 0.294578
sdr(eurodist, ultr3)   # 0.5066903

## unweighted geometric linkage
dendro4 <- linkage(eurodist, method="geometric", weighted=FALSE)
ntb(dendro4)           # 0.7531278
ultr4 <- ultrametric(dendro4)
cor(eurodist, ultr4)   # 0.7419569
mae(eurodist, ultr4)   # 0.2891692
sdr(eurodist, ultr4)   # 0.4548112