ksample.e: E-statistic (Energy Statistic) for Multivariate k-sample Test of Equal Distributions

Description

Returns the E-statistic (energy statistic) for the multivariate k-sample test of equal distributions.

Usage

ksample.e(x, sizes, distance = FALSE, ix = 1:sum(sizes))

Arguments

x

data matrix of the pooled sample, or a distance matrix if distance = TRUE

sizes

vector of sample sizes

distance

logical: if TRUE, x is a distance matrix

ix

a permutation of the row indices of x

Value

The value of the multisample $\mathcal{E}$-statistic corresponding to the permutation ix is returned.

Details

The k-sample multivariate $\mathcal{E}$-statistic for testing equal distributions is returned. The statistic is computed from the original pooled samples, stacked in matrix x where each row is a multivariate observation, or from the distance matrix x of the original data. The first sizes[1] rows of x are the first sample, the next sizes[2] rows of x are the second sample, etc.

The two-sample $\mathcal{E}$-statistic proposed by Szekely and Rizzo (2004) is the e-distance $e(S_i,S_j)$, defined for two samples $S_i, S_j$ of sizes $n_i$ and $n_j$ by $$e(S_i,S_j)=\frac{n_i n_j}{n_i+n_j}[2M_{ij}-M_{ii}-M_{jj}],$$ where $$M_{ij}=\frac{1}{n_i n_j}\sum_{p=1}^{n_i} \sum_{q=1}^{n_j} \|X_{ip}-X_{jq}\|,$$ $\|\cdot\|$ denotes the Euclidean norm, and $X_{ip}$ denotes the p-th observation in the i-th sample.

The k-sample $\mathcal{E}$-statistic is defined by summing the pairwise e-distances over all $k(k-1)/2$ pairs of samples: $$\mathcal{E}=\sum_{1 \leq i < j \leq k} e(S_i,S_j).$$ Large values of $\mathcal{E}$ are significant, that is, they are evidence against the hypothesis of equal distributions.
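The definition above can be sketched directly in base R. The function name e_stat_sketch is hypothetical and is not part of the energy package. It is a minimal illustration of the formula, assuming x holds raw observations (not a distance matrix) in the sample order described above. It should give the same value as ksample.e with the default arguments if the package follows this definition exactly, but that agreement has not been checked here.

```r
# Hypothetical helper (not part of the energy package): computes the
# k-sample E-statistic directly from the definition in Details.
e_stat_sketch <- function(x, sizes) {
  x <- as.matrix(x)                      # a vector becomes a one-column matrix
  stopifnot(nrow(x) == sum(sizes))
  d <- as.matrix(dist(x))                # pairwise Euclidean distances
  k <- length(sizes)
  # Sample label for each row: the first sizes[1] rows are sample 1, etc.
  g <- rep(seq_len(k), times = sizes)

  # M[i, j] = average distance between observations of samples i and j
  M <- matrix(0, k, k)
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      M[i, j] <- mean(d[g == i, g == j])
    }
  }

  # Sum the e-distances over all pairs i < j
  total <- 0
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      w <- sizes[i] * sizes[j] / (sizes[i] + sizes[j])
      total <- total + w * (2 * M[i, j] - M[i, i] - M[j, j])
    }
  }
  total
}

# Two one-point samples at 0 and 3: M_12 = 3, M_11 = M_22 = 0,
# so e = (1 * 1 / 2) * (2 * 3) = 3
e_stat_sketch(c(0, 3), c(1, 1))
```

The sketch builds the full distance matrix, so memory use grows quadratically with the pooled sample size. It is intended for checking the formula on small inputs, not as a replacement for ksample.e.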

References

Szekely, G. J. and Rizzo, M. L. (2004) Testing for Equal Distributions in High Dimension, InterStat, November (5).

Szekely, G. J. (2000) Technical Report 03-05: $\mathcal{E}$-statistics: Energy of Statistical Samples, Department of Mathematics and Statistics, Bowling Green State University.

Examples


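library(energy)

## compute 3-sample E-statistic for 4-dimensional iris data
## (rows 1-50, 51-100 and 101-150 are the three species)
data(iris)
ksample.e(iris[, 1:4], c(50, 50, 50))

## compute a 3-sample univariate E-statistic
## (the samples are the first 25, next 75 and last 50 values)
ksample.e(rnorm(150), c(25, 75, 50))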
