Computes a type of multivariate nonparametric E-statistic and a test of independence based on the independence coefficient \(\mathcal I_n\). This coefficient pre-dates, and is different from, distance covariance and distance correlation.

```
mvI.test(x, y, R)
mvI(x, y)
```

Arguments: `x` (matrix: first sample, observations in rows), `y` (matrix: second sample, observations in rows), `R` (number of replicates).

`mvI` returns the statistic. `mvI.test` returns a list with class `htest` containing

- `method`: description of test
- `statistic`: observed value of the test statistic \(n\mathcal I_n^2\)
- `estimate`: \(\mathcal I_n\)
- `replicates`: permutation replicates
- `p.value`: p-value of the test
- `data.name`: description of data

Maria L. Rizzo mrizzo@bgsu.edu and Gabor J. Szekely

`mvI` computes the coefficient \(\mathcal I_n\), and `mvI.test` performs a nonparametric test of independence. The test decision is obtained via permutation bootstrap with `R` replicates. The sample sizes (numbers of rows) of the two samples must agree, and the samples must not contain missing values.
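As an illustration, a minimal call might look like the following sketch. The sample sizes, dependence structure, and number of replicates here are arbitrary choices, and the energy package must be installed:

```
library(energy)  # provides mvI() and mvI.test()

set.seed(1)
n <- 30
x <- matrix(rnorm(n * 2), n, 2)        # first sample, observations in rows
y <- x + matrix(rnorm(n * 2), n, 2)    # second sample, dependent on x

mvI(x, y)                              # the coefficient I_n

tst <- mvI.test(x, y, R = 199)         # permutation test with 199 replicates
tst$statistic                          # observed n * I_n^2
tst$estimate                           # I_n
tst$p.value                            # permutation p-value
```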

Historically this is the first energy test of independence. The distance covariance test `dcov.test`, distance correlation `dcor`, and related methods are more recent (2007, 2009).

The distance covariance test `dcov.test` and distance correlation test `dcor.test` are much faster and have different properties than `mvI.test`. All are based on a population independence coefficient that characterizes independence, and all of these tests are statistically consistent. However, dCor is scale invariant while \(\mathcal I_n\) is not. In applications, `dcor.test` or `dcov.test` are the recommended tests.

Computing formula from Bakirov, Rizzo, and Szekely (2006), equation (2):

Suppose the two samples are \(X_1,\dots,X_n \in R^p\) and \(Y_1,\dots,Y_n \in R^q\). Define \(Z_{kl} = (X_k, Y_l) \in R^{p+q}.\)

The independence coefficient \(\mathcal I_n\) is defined by
$$ \mathcal I_n = \sqrt{\frac{2\bar z - z_d - z}{x + y - z}}, $$
where
$$z_d = \frac{1}{n^2} \sum_{k,l=1}^n |Z_{kk}-Z_{ll}|_{p+q},$$
$$z = \frac{1}{n^4} \sum_{k,l=1}^n \sum_{i,j=1}^n |Z_{kl}-Z_{ij}|_{p+q},$$
$$\bar z = \frac{1}{n^3} \sum_{k=1}^n \sum_{i,j=1}^n |Z_{kk}-Z_{ij}|_{p+q},$$
$$x = \frac{1}{n^2} \sum_{k,l=1}^n |X_{k}-X_{l}|_p, \qquad y = \frac{1}{n^2} \sum_{k,l=1}^n |Y_{k}-Y_{l}|_q.$$
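The computing formula can be transcribed directly in base R, which may help clarify the notation. This is only an illustrative sketch (the function name `mvI_formula` is hypothetical; `mvI` in the energy package is the reference implementation): it forms all \(n^2\) rows \(Z_{kl}\) explicitly and a full \(n^2 \times n^2\) distance matrix, so it is practical only for small \(n\).

```
## Direct transcription of the formula above (illustrative only; O(n^4) memory).
mvI_formula <- function(x, y) {
  x <- as.matrix(x); y <- as.matrix(y)
  n <- nrow(x)
  ## Row (k-1)*n + l of Z holds Z_{kl} = (X_k, Y_l) in R^{p+q}
  Z <- cbind(x[rep(1:n, each = n), , drop = FALSE],
             y[rep(1:n, times = n), , drop = FALSE])
  D <- as.matrix(dist(Z))            # all Euclidean distances |Z_{kl} - Z_{ij}|
  dd <- (1:n - 1L) * n + 1:n         # rows of Z holding the diagonal terms Z_{kk}
  z_d  <- sum(D[dd, dd]) / n^2
  z    <- sum(D) / n^4
  zbar <- sum(D[dd, ]) / n^3
  xsum <- sum(as.matrix(dist(x))) / n^2
  ysum <- sum(as.matrix(dist(y))) / n^2
  sqrt((2 * zbar - z_d - z) / (xsum + ysum - z))
}

mvI_formula(c(0, 1), c(0, 1))        # 1 (up to rounding): perfect dependence
```

For identical samples such as `x == y` the coefficient is 1, consistent with Theorem 1's bound \(0 \leq \mathcal I_n \leq 1\).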

Some properties:

- \(0 \leq \mathcal I_n \leq 1\) (Theorem 1).
- Large values of \(n \mathcal I_n^2\) (or \(\mathcal I_n\)) support the alternative hypothesis that the sampled random variables are dependent.
- \(\mathcal I_n\) is invariant to shifts and orthogonal transformations of X and Y.
- \(\sqrt{n} \, \mathcal I_n\) determines a statistically consistent test of independence against all fixed dependent alternatives (Corollary 1).

The population independence coefficient \(\mathcal I\) is a normalized distance between the joint characteristic function and the product of the marginal characteristic functions. \(\mathcal I_n\) converges almost surely to \(\mathcal I\) as \(n \to \infty\). X and Y are independent if and only if \(\mathcal I(X, Y) = 0\). See the reference below for more details.
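The invariance property can be checked numerically. The following sketch (with an arbitrary shift vector and rotation angle, and assuming the energy package is installed) applies `mvI` before and after transforming the samples:

```
library(energy)

set.seed(1)
x <- matrix(rnorm(60), 30, 2)
y <- matrix(rnorm(60), 30, 2)

theta <- pi / 5                                 # arbitrary rotation angle
Q <- matrix(c(cos(theta), sin(theta),
              -sin(theta), cos(theta)), 2, 2)   # orthogonal (rotation) matrix

i1 <- mvI(x, y)
i2 <- mvI(sweep(x, 2, c(3, -1), "+"), y %*% Q)  # shift x, rotate y
all.equal(i1, i2)                               # equal up to floating-point rounding
```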

Bakirov, N.K., Rizzo, M.L., and Szekely, G.J. (2006), A Multivariate
Nonparametric Test of Independence, *Journal of Multivariate Analysis* 93/1, 58-80.

Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007),
Measuring and Testing Dependence by Correlation of Distances,
*Annals of Statistics*, Vol. 35 No. 6, pp. 2769-2794.

Szekely, G.J. and Rizzo, M.L. (2009),
Brownian Distance Covariance,
*Annals of Applied Statistics*,
Vol. 3, No. 4, 1236-1265.

See also `dcov.test`, `dcov`, `dcor.test`, `dcor`, `dcov2d`, `dcor2d`, and `indep.test`.

```
mvI(iris[1:25, 1], iris[1:25, 2])

mvI.test(iris[1:25, 1], iris[1:25, 2], R = 99)
```
