Learn R Programming

energy (version 1.0-9)

edist: E-distance

Description

Returns the E-distances (energy statistics) between clusters.

Usage

edist(x, sizes, distance = FALSE, ix = 1:sum(sizes), alpha = 1)

Arguments

x
data matrix of pooled sample or Euclidean distances
sizes
vector of sample sizes
distance
logical: if TRUE, x is a distance matrix
ix
a permutation of the row indices of x
alpha
distance exponent

Value

  • A object of class dist containing the lower triangle of the e-distance matrix of cluster distances corresponding to the permutation of indices ix is returned.

concept

energy statistics

Details

A vector containing the pairwise two-sample multivariate $\mathcal{E}$-statistics for comparing clusters or samples is returned. The e-distance between clusters is computed from the original pooled data, stacked in matrix x where each row is a multivariate observation, or from the distance matrix x of the original data, or distance object returned by dist. The first sizes[1] rows of the original data matrix are the first sample, the next sizes[2] rows are the second sample, etc. The permutation vector ix may be used to obtain e-distances corresponding to a clustering solution at a given level in the hierarchy. The e-distance between two clusters $C_i, C_j$ of size $n_i, n_j$ proposed by Szekely and Rizzo (2003) is the e-distance $e(C_i,C_j)$, defined by $$e(C_i,C_j)=\frac{n_i n_j}{n_i+n_j}[2M_{ij}-M_{ii}-M_{jj}],$$ where $$M_{ij}=\frac{1}{n_i n_j}\sum_{p=1}^{n_i} \sum_{q=1}^{n_j} \|X_{ip}-X_{jq}\|^\alpha,$$ $\|\cdot\|$ denotes Euclidean norm, $\alpha=$ alpha, and $X_{ip}$ denotes the p-th observation in the i-th cluster. The exponent alpha should be in the interval (0,2].

References

Szekely, G. J. and Rizzo, M. L. (2005) Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method, Journal of Classification 22(2) 151-183, http://dx.doi.org/10.1007/s00357-005-0012-9. Szekely, G. J. and Rizzo, M. L. (2004) Testing for Equal Distributions in High Dimension, InterStat, November (5). Szekely, G. J. (2000) Technical Report 03-05, $\mathcal{E}$-statistics: Energy of Statistical Samples, Department of Mathematics and Statistics, Bowling Green State University.

See Also

energy.hclust eqdist.etest ksample.e

Examples

Run this code
## compute e-distances for 3 samples of iris data
 data(iris)
 edist(iris[,1:4], c(50,50,50))

## compute e-distances from a distance object
     data(iris)
     edist(dist(iris[,1:4]), c(50, 50, 50), distance=TRUE, alpha = .5)
    
     ## compute e-distances from a distance matrix
     data(iris)
     d <- as.matrix(dist(iris[,1:4]))
     edist(d, c(50, 50, 50), distance=TRUE, alpha = .5)
 ## compute e-distances from vector of group labels
 d <- dist(matrix(rnorm(100), nrow=50))
 g <- cutree(energy.hclust(d), k=4)
 edist(d, sizes=table(g), ix=rank(g, ties.method="first"))

Run the code above in your browser using DataLab