e_mcv {GFDmcv}    R Documentation
Calculates the estimators, with their respective (1-\alpha)-confidence intervals, of the four variants of the multivariate coefficient of variation (MCV) and of their reciprocals, as introduced by Reyment (1960), Van Valen (1974), Voinov and Nikulin (1996) and Albert and Zhang (2010).
e_mcv(x, conf_level = 0.95)
x | a matrix of data of size n\times d, with one d-dimensional observation per row. |
conf_level | a confidence level 1-\alpha. By default, it is equal to 0.95. |
The function e_mcv() calculates four variants of the multivariate coefficient of variation for d-dimensional data. These variants were introduced by Reyment (1960, RR), Van Valen (1974, VV), Voinov and Nikulin (1996, VN) and Albert and Zhang (2010, AZ):
{\widehat C}^{RR}=\sqrt{\frac{(\det\mathbf{\widehat\Sigma})^{1/d}}{\boldsymbol{\widehat\mu}^{\top}\boldsymbol{\widehat\mu}}},\
{\widehat C}^{VV}=\sqrt{\frac{\mathrm{tr}\mathbf{\widehat\Sigma}}{\boldsymbol{\widehat\mu}^{\top}\boldsymbol{\widehat\mu}}},\
{\widehat C}^{VN}=\sqrt{\frac{1}{\boldsymbol{\widehat\mu}^{\top}\mathbf{\widehat\Sigma}^{-1}\boldsymbol{\widehat\mu}}},\
{\widehat C}^{AZ}=\sqrt{\frac{\boldsymbol{\widehat\mu}^{\top}\mathbf{\widehat\Sigma}\boldsymbol{\widehat\mu}}{(\boldsymbol{\widehat\mu}^{\top}\boldsymbol{\widehat\mu})^2}},
where n is the sample size, \mathbf{X}_1,\dots,\mathbf{X}_n are the observations (the rows of x), \boldsymbol{\widehat\mu} is the empirical mean vector and \mathbf{\widehat \Sigma} is the empirical covariance matrix:
\boldsymbol{\widehat\mu} = \frac{1}{n}\sum_{j=1}^{n} \mathbf{X}_{j},\; \mathbf{\widehat \Sigma} =\frac{1}{n}\sum_{j=1}^{n} (\mathbf{X}_{j} - \boldsymbol{\widehat \mu})(\mathbf{X}_{j} - \boldsymbol{\widehat \mu})^{\top}.
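The four point estimators can be reproduced directly from the formulas above. The following is a minimal base-R sketch, not the GFDmcv implementation; the helper mcv_sketch() is hypothetical. It follows the 1/n convention for \mathbf{\widehat \Sigma} stated above, so the output of stats::cov() (which divides by n-1) is rescaled by (n-1)/n.

```r
# Hypothetical helper (not part of GFDmcv): plug-in estimators of the four MCVs
mcv_sketch <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- ncol(x)
  mu <- colMeans(x)                       # empirical mean vector
  Sigma <- cov(x) * (n - 1) / n           # cov() divides by n - 1; rescale to 1/n
  mm <- sum(mu^2)                         # mu^T mu
  c(RR = sqrt(det(Sigma)^(1 / d) / mm),
    VV = sqrt(sum(diag(Sigma)) / mm),
    VN = sqrt(1 / drop(t(mu) %*% solve(Sigma, mu))),
    AZ = sqrt(drop(t(mu) %*% Sigma %*% mu) / mm^2))
}

# Example: first three iris measurements of the setosa group
x <- as.matrix(iris[iris$Species == "setosa", 1:3])
mcv_sketch(x)
```

Using solve(Sigma, mu) instead of forming the inverse explicitly is the usual numerically safer way to evaluate \boldsymbol{\widehat\mu}^{\top}\mathbf{\widehat\Sigma}^{-1}\boldsymbol{\widehat\mu}.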
In the univariate case (d=1), all four variants reduce to the classical coefficient of variation. Furthermore, the function computes their reciprocals, the so-called standardized means:
{\widehat B}^{RR}=\sqrt{\frac{\boldsymbol{\widehat\mu}^{\top}\boldsymbol{\widehat\mu}}{(\det\mathbf{\widehat\Sigma})^{1/d}}},\
{\widehat B}^{VV}=\sqrt{\frac{\boldsymbol{\widehat\mu}^{\top}\boldsymbol{\widehat\mu}}{\mathrm{tr}\mathbf{\widehat\Sigma}}},\
{\widehat B}^{VN}=\sqrt{\boldsymbol{\widehat\mu}^{\top}\mathbf{\widehat\Sigma}^{-1}\boldsymbol{\widehat\mu}},\
{\widehat B}^{AZ}=\sqrt{\frac{(\boldsymbol{\widehat\mu}^{\top}\boldsymbol{\widehat\mu})^2}{\boldsymbol{\widehat\mu}^{\top}\mathbf{\widehat\Sigma}\boldsymbol{\widehat\mu}}}.
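As a sanity check of the two statements above (the B variants are reciprocals of the C variants, and all variants coincide when d=1), the following base-R sketch evaluates the B formulas directly. The helper std_means_sketch() is hypothetical and not part of GFDmcv; it again uses the 1/n covariance.

```r
# Hypothetical helper (not part of GFDmcv): plug-in standardized means
std_means_sketch <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- ncol(x)
  mu <- colMeans(x)
  Sigma <- cov(x) * (n - 1) / n           # empirical covariance with divisor n
  mm <- sum(mu^2)                         # mu^T mu
  c(RR = sqrt(mm / det(Sigma)^(1 / d)),
    VV = sqrt(mm / sum(diag(Sigma))),
    VN = sqrt(drop(t(mu) %*% solve(Sigma, mu))),
    AZ = sqrt(mm^2 / drop(t(mu) %*% Sigma %*% mu)))
}

# Univariate case: all four values equal |mean| / sd, with sd using divisor n
z <- iris$Sepal.Length[iris$Species == "setosa"]
b <- std_means_sketch(z)
sd_n <- sqrt(mean((z - mean(z))^2))
b
abs(mean(z)) / sd_n
```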
In addition to the estimators, the e_mcv() function calculates the respective confidence intervals [C_lwr, C_upr] for a given confidence level 1-\alpha, together with the intervals [B_lwr, B_upr] for the reciprocals.
These confidence intervals are based on an asymptotic normal approximation; see Ditzhaus and Smaga (2023) for the technical details. The approximation does not rely on any specific (semi-)parametric assumption on the distribution and is valid nonparametrically, even for tied data.
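To illustrate what an asymptotic normal interval of this kind looks like, the sketch below builds a delta-method interval for the univariate CV (d=1) from the sample mean, variance, skewness and kurtosis, without any distributional assumption. This is an assumption made for illustration: it is a standard delta-method construction, not necessarily the exact variance formula of Ditzhaus and Smaga (2023) or of e_mcv(), and the helper cv_ci_sketch() is hypothetical.

```r
# Hypothetical helper (not part of GFDmcv): delta-method CI for the univariate CV.
# Asymptotic variance of sqrt(n) * (cv_hat - cv):
#   cv^4 + cv^2 * (kappa - 1) / 4 - sign(mean) * cv^3 * gamma
cv_ci_sketch <- function(z, conf_level = 0.95) {
  n <- length(z)
  m <- mean(z)
  s2 <- mean((z - m)^2)                   # variance with divisor n
  s <- sqrt(s2)
  cv <- s / abs(m)
  gamma <- mean((z - m)^3) / s^3          # sample skewness
  kappa <- mean((z - m)^4) / s2^2         # sample kurtosis
  tau2 <- cv^4 + cv^2 * (kappa - 1) / 4 - sign(m) * cv^3 * gamma
  q <- qnorm(1 - (1 - conf_level) / 2)    # normal quantile for 1 - alpha/2
  half <- q * sqrt(tau2 / n)
  c(C_lwr = cv - half, C_est = cv, C_upr = cv + half)
}

z <- iris$Sepal.Length[iris$Species == "setosa"]
cv_ci_sketch(z)
```

The moments enter only through sample averages, which is why such an interval needs no parametric model; under normality (gamma = 0, kappa = 3) the variance reduces to the familiar cv^2 * (1/2 + cv^2).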
When d>1 (respectively d=1), the function returns a data frame with four rows (one row) corresponding to the four MCVs (the univariate CV) and six columns: the estimators C_est of the MCV (CV), the estimators B_est of their reciprocals, and the lower and upper bounds of the corresponding confidence intervals [C_lwr, C_upr] and [B_lwr, B_upr].
Albert A., Zhang L. (2010) A novel definition of the multivariate coefficient of variation. Biometrical Journal 52:667-675.
Ditzhaus M., Smaga L. (2023) Inference for all variants of the multivariate coefficient of variation in factorial designs. Preprint https://arxiv.org/abs/2301.12009.
Reyment R.A. (1960) Studies on Nigerian Upper Cretaceous and Lower Tertiary Ostracoda: part 1. Senonian and Maastrichtian Ostracoda, Stockholm Contributions in Geology, vol 7.
Van Valen L. (1974) Multivariate structural statistics in natural history. Journal of Theoretical Biology 45:235-247.
Voinov V., Nikulin M. (1996) Unbiased Estimators and Their Applications, Vol. 2, Multivariate Case. Kluwer, Dordrecht.
library(GFDmcv)

# d > 1 (MCVs): first three iris measurements, one matrix per species
data_set <- lapply(list(iris[iris$Species == "setosa", 1:3],
                        iris[iris$Species == "versicolor", 1:3],
                        iris[iris$Species == "virginica", 1:3]),
                   as.matrix)
lapply(data_set, e_mcv)
# d = 1 (CV): Sepal.Length only, converted to one-column matrices
data_set <- lapply(list(iris[iris$Species == "setosa", 1],
                        iris[iris$Species == "versicolor", 1],
                        iris[iris$Species == "virginica", 1]),
                   as.matrix)
lapply(data_set, e_mcv)