ksample.e: E-statistic (Energy Statistic) for Multivariate k-sample Test of Equal Distributions
Description
Returns the E-statistic (energy statistic)
for the multivariate k-sample test of equal distributions.
Usage
ksample.e(x, sizes, distance = FALSE, ix = 1:sum(sizes))
Arguments
x
data matrix of the pooled sample, one observation per row, or a distance matrix if distance = TRUE
sizes
vector of sample sizes, in the order in which the samples are stacked in x
distance
logical: if TRUE, x is a distance matrix
ix
a permutation of the row indices of x; the default, 1:sum(sizes), leaves the rows in their original order
Value
The value of the multisample $\mathcal{E}$-statistic corresponding to
the permutation ix is returned.
concept
energy statistics
Details
The k-sample multivariate $\mathcal{E}$-statistic for testing equal distributions
is computed either from the original pooled samples, stacked in
matrix x with one multivariate observation per row, or, if distance = TRUE,
from the distance matrix x of the original data. The
first sizes[1] rows of x are the first sample, the next
sizes[2] rows of x are the second sample, and so on.
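The stacking convention can be illustrated with a small sketch in base R. The data, the sample sizes, and the names s1, s2, s3, rows, and d are made up for illustration. Using a full symmetric matrix from dist() as the distance = TRUE input is an assumption about the expected form.

```r
# Hypothetical data: three bivariate samples of sizes 4, 3 and 5.
set.seed(1)
s1 <- matrix(rnorm(4 * 2), ncol = 2)
s2 <- matrix(rnorm(3 * 2), ncol = 2)
s3 <- matrix(rnorm(5 * 2, mean = 1), ncol = 2)

# Pooled sample: one observation per row, samples stacked in order.
x <- rbind(s1, s2, s3)
sizes <- c(nrow(s1), nrow(s2), nrow(s3))

# Row indices of x that belong to each sample: 1:4, 5:7 and 8:12.
ends <- cumsum(sizes)
starts <- ends - sizes + 1
rows <- Map(seq, starts, ends)

# Alternative input for distance = TRUE (assumed form): the matrix of
# pairwise Euclidean distances between the rows of x.
d <- as.matrix(dist(x))
```

Either x with sizes, or d with sizes and distance = TRUE, describes the same pooled sample.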
The two-sample $\mathcal{E}$-statistic proposed by Szekely and Rizzo (2004)
is the e-distance $e(S_i,S_j)$, defined for two samples $S_i, S_j$
of sizes $n_i$ and $n_j$ by
$$e(S_i,S_j)=\frac{n_i n_j}{n_i+n_j}[2M_{ij}-M_{ii}-M_{jj}],$$
where
$$M_{ij}=\frac{1}{n_i n_j}\sum_{p=1}^{n_i} \sum_{q=1}^{n_j}
\|X_{ip}-X_{jq}\|,$$
$\|\cdot\|$ denotes the Euclidean norm, and $X_{ip}$ denotes the $p$-th observation in the $i$-th sample.
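As a worked illustration of the two formulas above, the following base R sketch computes the e-distance for two tiny univariate samples. The helper names mean_dist and e_dist are hypothetical and are not part of the package. For the samples {0, 1} and {3, 5}, the between-sample mean distance is 3.5 and the within-sample means are 0.5 and 1, so the e-distance is (2 * 2 / 4) * (7 - 0.5 - 1) = 5.5.

```r
# Mean pairwise Euclidean distance between the rows of a and the rows of b
# (the quantity M_ij in the text). Hypothetical helper.
mean_dist <- function(a, b) {
  na <- nrow(a)
  nb <- nrow(b)
  d <- as.matrix(dist(rbind(a, b)))
  mean(d[seq_len(na), na + seq_len(nb), drop = FALSE])
}

# Two-sample e-distance, following the displayed formula term by term.
e_dist <- function(a, b) {
  na <- nrow(a)
  nb <- nrow(b)
  na * nb / (na + nb) *
    (2 * mean_dist(a, b) - mean_dist(a, a) - mean_dist(b, b))
}

# Toy samples, one observation per row.
a <- matrix(c(0, 1), ncol = 1)
b <- matrix(c(3, 5), ncol = 1)
e_dist(a, b)  # 5.5
```

Note that the within-sample means include the zero distances on the diagonal, as the double sum in the definition does.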
The k-sample
$\mathcal{E}$-statistic is defined by summing the pairwise e-distances over
all $k(k-1)/2$ pairs
of samples:
$$\mathcal{E}=\sum_{1 \leq i < j \leq k} e(S_i,S_j).$$
Large values of $\mathcal{E}$ are evidence against the hypothesis of equal distributions.
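The definition can be sketched in base R as follows. This is an illustration of the formulas, not the packaged implementation. The function name ksample_e_sketch is hypothetical. The reading of ix, in which sample i consists of rows ix[starts[i]:ends[i]] of the pooled data, is an assumption.

```r
# Sketch of the k-sample statistic, mirroring the documented arguments.
# ksample_e_sketch is a hypothetical name, not the packaged ksample.e.
ksample_e_sketch <- function(x, sizes, distance = FALSE,
                             ix = seq_len(sum(sizes))) {
  d <- if (distance) as.matrix(x) else as.matrix(dist(x))
  k <- length(sizes)
  ends <- cumsum(sizes)
  starts <- ends - sizes + 1
  # Assumed convention: sample i is rows ix[starts[i]:ends[i]].
  idx <- lapply(seq_len(k), function(i) ix[starts[i]:ends[i]])
  M <- function(i, j) mean(d[idx[[i]], idx[[j]], drop = FALSE])
  total <- 0
  # Sum the e-distances over all pairs i < j.
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      w <- sizes[i] * sizes[j] / (sizes[i] + sizes[j])
      total <- total + w * (2 * M(i, j) - M(i, i) - M(j, j))
    }
  }
  total
}

# Made-up data: two bivariate samples with different means.
set.seed(42)
x <- rbind(matrix(rnorm(20), ncol = 2),
           matrix(rnorm(20, mean = 2), ncol = 2))
sizes <- c(10, 10)

# Compute the distance matrix once and reuse it for every permutation.
d <- as.matrix(dist(x))
observed <- ksample_e_sketch(d, sizes, distance = TRUE)
perm <- replicate(199,
  ksample_e_sketch(d, sizes, distance = TRUE, ix = sample(sum(sizes))))

# Permutation p-value: large values of the statistic count as extreme.
p_value <- (1 + sum(perm >= observed)) / (1 + length(perm))
```

Passing the distance matrix together with a permuted ix avoids recomputing distances for each replicate, which is one plausible reason the function accepts both arguments.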
References
Szekely, G. J. and Rizzo, M. L. (2004) Testing for Equal
Distributions in High Dimension, InterStat, November (5).
Szekely, G. J. (2000) Technical Report 03-05:
$\mathcal{E}$-statistics: Energy of
Statistical Samples, Department of Mathematics and Statistics, Bowling
Green State University.