Computes Penrose's distance between m multivariate populations or samples, when information is available on the means and variances.
Penrose.dist(x, group)
Returns an object of class "Penrose.dist", a list containing the following components:
name | A character string describing the function. |
means.vec | A numeric matrix with p rows and m columns giving the mean of each variable per group. |
covs.list | A list containing the m sample covariance matrices. |
Samp.sizes | A table showing the number of observations used in the calculation of the covariance matrix for each group. |
PooledCov | The pooled covariance matrix. This matrix can be accessed and used as an input argument for the calculation of Mahalanobis distance in packages biotools (da Silva, 2017, 2021) and ecodist (Goslee and Urban 2007). |
Penrose.mat | The Penrose distances given as a "matrix" object. |
Penros.dist | The Penrose distances given as a "dist" object. |
group | A character string specifying the name of the classification factor defining groups. |
levels.group | A vector of length m, showing the levels in factor group. |
data.name | A character string giving the name of the data. |
variables | A character string vector containing the variable names. |
data | The data frame analyzed. |
x | A data frame with \(p + 1\) columns (one factor and p response variables). |
group | The classification factor defining m samples or groups. It must be one of the variables in x. |
Jorge Navarro Alberto, ganava4@gmail.com
Let the mean of \(X_k\) in population i be \(\mu_{ki}\), \(k=1,...,p; i=1,...,m\) and assume that the variance of variable \(X_k\) is \(V_k\). The Penrose (1953) distance \(P_{ij}\) between population i and population j is given by
$$P_{ij} = \sum_{k = 1}^{p} \frac{(\mu_{ki} - \mu_{kj})^2}{pV_k}$$
Penrose's distances between multivariate samples are computed using this expression, with \(\mu_{ki}\), \(\mu_{kj}\) and \(V_k\) replaced by their corresponding sample estimates.
A disadvantage of Penrose's measure is that it does not consider the correlations between the p variables.
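The expression above can be illustrated with a minimal base R sketch. It is not the package's implementation: the function name penrose_sketch is hypothetical, and it assumes that \(V_k\) is estimated by the pooled within-group variance of variable k (the k-th diagonal element of a pooled covariance matrix). The built-in iris data are used only for illustration.

```r
# Hypothetical helper (not part of any package): Penrose distances
# computed directly from group means and pooled variances.
penrose_sketch <- function(data, group) {
  X <- as.matrix(data)          # n x p matrix of response variables
  g <- factor(group)            # classification factor with m levels
  p <- ncol(X)
  m <- nlevels(g)
  n <- as.numeric(table(g))     # group sample sizes

  # p x m matrix of group means, one column per group
  means <- matrix(
    unlist(lapply(levels(g), function(l) colMeans(X[g == l, , drop = FALSE]))),
    nrow = p
  )
  # p x m matrix of within-group sample variances
  vars <- matrix(
    unlist(lapply(levels(g), function(l) apply(X[g == l, , drop = FALSE], 2, var))),
    nrow = p
  )
  # Assumption: V_k is the pooled within-group variance of variable k
  V <- as.vector(vars %*% (n - 1)) / (sum(n) - m)

  # P_ij = sum_k (mean_ki - mean_kj)^2 / (p * V_k)
  P <- matrix(0, m, m, dimnames = list(levels(g), levels(g)))
  for (i in seq_len(m)) {
    for (j in seq_len(m)) {
      P[i, j] <- sum((means[, i] - means[, j])^2 / (p * V))
    }
  }
  P
}

# Illustration with a built-in data set (three groups, four variables)
P_iris <- penrose_sketch(iris[, 1:4], iris$Species)
print(round(P_iris, 3))
as.dist(P_iris)                 # the same distances as a "dist" object
```

Because each squared mean difference is scaled only by the variance of its own variable, the covariances between variables never enter the calculation, which is the limitation noted above.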
The function requires package biotools (da Silva, 2017, 2021).
da Silva, A.R. (2021). biotools: Tools for Biometry and Applied Statistics in Agricultural Science. R package version 4.2. https://cran.r-project.org/package=biotools.
da Silva, A.R., Malafaia, G., and Menezes, I.P.P. (2017). biotools: an R function to predict spatial gene diversity via an individual-based approach. Genetics and Molecular Research 16. https://doi.org/10.4238/gmr16029655.
Goslee, S.C. and Urban, D.L. (2007). The ecodist package for dissimilarity-based analysis of ecological data. Journal of Statistical Software 22(7): 1-19. https://doi.org/10.18637/jss.v022.i07.
Manly, B.F.J., Navarro Alberto, J.A. and Gerow, K. (2024). Multivariate Statistical Methods. A Primer. 5th Edn. Chapman and Hall/CRC.
Penrose, L.W. (1953). Distance, size and shape. Annals of Eugenics 18: 337-43.
data(skulls)
res.Penrose <- Penrose.dist(x = skulls, group = Period)
# Brief output
res.Penrose