The k-sample multivariate \(\mathcal{E}\)-test of equal distributions is performed.
The statistic is computed from the original pooled samples, stacked in matrix x
where each row is a multivariate observation, or from the corresponding distance matrix.
The first sizes[1] rows of x are the first sample, the next sizes[2] rows of x
are the second sample, etc.
The test is implemented as a nonparametric bootstrap (an approximate
permutation test) with R replicates.
The function eqdist.e returns the test statistic only; it simply
passes the arguments through to eqdist.etest with R = 0.
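The resampling scheme described above can be sketched in a few lines. This is an illustrative Python sketch, not the package's implementation: the names `mean_dist`, `e_stat`, and `perm_test` are hypothetical, the p-value formula (1 + number of replicates at least as large as the observed statistic) / (R + 1) is an assumed convention, and the statistic used is the default pairwise sum of e-distances. With R = 0 only the statistic is returned, mirroring the behaviour described for eqdist.e.

```python
import math
import random

def mean_dist(a, b):
    # Average Euclidean distance over all pairs (p, q) with p in a, q in b.
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def e_stat(x, sizes):
    # Split the pooled rows into consecutive samples, then sum the
    # pairwise e-distances over all pairs of samples.
    samples, start = [], 0
    for n in sizes:
        samples.append(x[start:start + n])
        start += n
    total = 0.0
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            a, b = samples[i], samples[j]
            weight = len(a) * len(b) / (len(a) + len(b))
            total += weight * (2 * mean_dist(a, b) - mean_dist(a, a) - mean_dist(b, b))
    return total

def perm_test(x, sizes, R, seed=0):
    # Hypothetical helper: approximate permutation test with R replicates.
    rng = random.Random(seed)
    observed = e_stat(x, sizes)
    if R == 0:
        return observed, None  # statistic only, no p-value
    rows = list(x)  # copy so the caller's data is left untouched
    exceed = 0
    for _ in range(R):
        rng.shuffle(rows)  # random reassignment of rows to samples
        if e_stat(rows, sizes) >= observed:
            exceed += 1
    # Assumed p-value convention; the observed statistic counts as one replicate.
    return observed, (1 + exceed) / (R + 1)

x = [[0, 0], [1, 0], [0, 1], [1, 1], [10, 10], [11, 10], [10, 11], [11, 11]]
sizes = [4, 4]
observed, p_value = perm_test(x, sizes, R=199)
```

Shuffling the pooled rows while keeping sizes fixed is what makes each replicate a draw under the hypothesis of equal distributions.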
The k-sample multivariate \(\mathcal{E}\)-statistic for testing equal distributions
is returned. The statistic is computed from the original pooled samples, stacked in
matrix x where each row is a multivariate observation, or from the distance
matrix x of the original data. The first sizes[1] rows of x are the first sample,
the next sizes[2] rows of x are the second sample, etc.
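The row layout implied by sizes can be illustrated with a small Python sketch. The helper name `split_pooled` is hypothetical, and Python indexing is 0-based whereas sizes[1] in the text is the first element of the R vector.

```python
from itertools import accumulate

def split_pooled(x, sizes):
    """Split the pooled rows of x into consecutive samples of the given sizes."""
    if sum(sizes) != len(x):
        raise ValueError("sizes must sum to the number of rows of x")
    # Cumulative row boundaries: sample i occupies rows bounds[i]..bounds[i+1]-1.
    bounds = [0] + list(accumulate(sizes))
    return [x[bounds[i]:bounds[i + 1]] for i in range(len(sizes))]

# Five pooled observations in two dimensions: three in the first sample,
# two in the second.
x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
samples = split_pooled(x, [3, 2])
```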
The two-sample \(\mathcal{E}\)-statistic proposed by Szekely and Rizzo (2004)
is the e-distance \(e(S_i,S_j)\), defined for two samples \(S_i, S_j\)
of sizes \(n_i, n_j\) by
$$e(S_i,S_j)=\frac{n_i n_j}{n_i+n_j}[2M_{ij}-M_{ii}-M_{jj}],
$$
where
$$M_{ij}=\frac{1}{n_i n_j}\sum_{p=1}^{n_i} \sum_{q=1}^{n_j}
\|X_{ip}-X_{jq}\|,$$
\(\|\cdot\|\) denotes the Euclidean norm, and \(X_{ip}\) denotes the p-th observation in the i-th sample.
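As a worked check of the definition, the following Python sketch evaluates the e-distance directly from the formula. The function names are hypothetical and the data are made up for illustration. For \(S_1=\{0,1\}\) and \(S_2=\{3\}\) on the real line, \(M_{12}=2.5\), \(M_{11}=0.5\), \(M_{22}=0\), so \(e(S_1,S_2)=\frac{2}{3}(5-0.5-0)=3\).

```python
import math

def mean_dist(a, b):
    # M_ij: average Euclidean distance over all n_i * n_j pairs.
    # For a == b the zero self-distances are included, as in the double sum.
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def e_distance(a, b):
    # e(S_i, S_j) = n_i n_j / (n_i + n_j) * [2 M_ij - M_ii - M_jj]
    n_i, n_j = len(a), len(b)
    weight = n_i * n_j / (n_i + n_j)
    return weight * (2 * mean_dist(a, b) - mean_dist(a, a) - mean_dist(b, b))

s1 = [[0.0], [1.0]]  # sample 1: two one-dimensional observations
s2 = [[3.0]]         # sample 2: one observation
e12 = e_distance(s1, s2)
```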
The original (default method) k-sample \(\mathcal{E}\)-statistic is defined by
summing the pairwise e-distances over all \(k(k-1)/2\) pairs of samples:
$$\mathcal{E}=\sum_{1 \leq i < j \leq k} e(S_i,S_j).
$$
Large values of \(\mathcal{E}\) are significant, that is, they are evidence against the hypothesis of equal distributions.
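The k-sample sum can be sketched in Python as follows. The function names and the three small one-dimensional samples are hypothetical, chosen so the result can be checked by hand: the pairwise e-distances are 3, 17/3, and 2, giving \(\mathcal{E}=32/3\).

```python
import math
from itertools import combinations

def mean_dist(a, b):
    # Average Euclidean distance over all pairs of observations.
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def e_distance(a, b):
    # Two-sample e-distance with weight n_i n_j / (n_i + n_j).
    weight = len(a) * len(b) / (len(a) + len(b))
    return weight * (2 * mean_dist(a, b) - mean_dist(a, a) - mean_dist(b, b))

def k_sample_e(samples):
    # Sum of e-distances over all k(k-1)/2 unordered pairs of samples.
    return sum(e_distance(a, b) for a, b in combinations(samples, 2))

samples = [[[0.0], [1.0]], [[3.0]], [[5.0]]]
n_pairs = len(list(combinations(samples, 2)))
E = k_sample_e(samples)
```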
The discoB method computes the between-sample disco statistic.
For a one-way analysis, it is related to the original statistic as follows.
In the above equation, the weights \(\frac{n_i n_j}{n_i+n_j}\)
are replaced with
$$\frac{n_i + n_j}{2N}\frac{n_i n_j}{n_i+n_j} =
\frac{n_i n_j}{2N}$$
where N is the total number of observations: \(N=n_1+\cdots+n_k\).
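The weight replacement can be illustrated with a short Python sketch. It follows only the relation stated above (each pair weighted by \(n_i n_j/(2N)\)) and is not claimed to reproduce the package's output; the function names and data are hypothetical. For samples \(\{0,1\}\), \(\{3\}\), \(\{5\}\) with \(N=4\), the weighted terms are 1.125, 2.125, and 0.5, which sum to 3.75.

```python
import math
from itertools import combinations

def mean_dist(a, b):
    # Average Euclidean distance over all pairs of observations.
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def disco_between(samples):
    # Same pairwise terms as the original statistic, but each pair is
    # weighted by n_i n_j / (2N) instead of n_i n_j / (n_i + n_j).
    N = sum(len(s) for s in samples)
    total = 0.0
    for a, b in combinations(samples, 2):
        weight = len(a) * len(b) / (2 * N)
        total += weight * (2 * mean_dist(a, b) - mean_dist(a, a) - mean_dist(b, b))
    return total

samples = [[[0.0], [1.0]], [[3.0]], [[5.0]]]
between = disco_between(samples)
```

Because the new weight equals \((n_i+n_j)/(2N)\) times the original one, pairs involving larger samples keep relatively more of their contribution.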
The discoF method is based on the disco F ratio, while the discoB
method is based on the between-sample component.
See also the disco and disco.between functions.