# distance correlation

##### Distance Correlation and Covariance Statistics

Computes distance covariance and distance correlation statistics, which are multivariate measures of dependence.

- Keywords
- multivariate

##### Usage

```
dcov(x, y, index = 1.0)
dcor(x, y, index = 1.0)
DCOR(x, y, index = 1.0)
```

##### Arguments

- `x`: data or distances of first sample
- `y`: data or distances of second sample
- `index`: exponent on Euclidean distance, in (0,2]

##### Details

`dcov` and `dcor` or `DCOR` compute distance covariance and distance correlation statistics. `DCOR` is a self-contained R function returning a list of statistics. `dcor` execution is faster than `DCOR` (see examples).

The sample sizes (number of rows) of the two samples must agree, and samples must not contain missing values. Arguments `x`, `y` can optionally be `dist` objects; otherwise these arguments are treated as data.

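
The input requirements above can be checked before calling the functions. The following is a minimal sketch in base R, not part of the package: `check_samples` is a hypothetical helper name, and it assumes `x` and `y` are either data (vector, matrix, or data frame) or `dist` objects.

```r
# Hypothetical helper (not part of energy): validate two samples against
# the stated requirements before computing dcov/dcor.
check_samples <- function(x, y) {
  # Number of observations: the "Size" attribute for dist objects,
  # the number of rows (or vector length) otherwise.
  n_obs <- function(z) {
    if (inherits(z, "dist")) attr(z, "Size") else NROW(z)
  }
  if (n_obs(x) != n_obs(y)) stop("sample sizes must agree")
  if (anyNA(x) || anyNA(y)) stop("samples must not contain missing values")
  invisible(TRUE)
}

x <- iris[1:50, 1:4]
y <- iris[51:100, 1:4]
check_samples(x, y)              # data frames with 50 rows each
check_samples(dist(x), dist(y))  # dist objects built from the same data
```

The helper stops with an error when the sample sizes differ or when either sample contains `NA`; otherwise it returns `TRUE` invisibly.
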
Distance correlation is a new measure of dependence between random vectors introduced by Szekely, Rizzo, and Bakirov (2007). For all distributions with finite first moments, distance correlation $\mathcal R$ generalizes the idea of correlation in two fundamental ways:

1. $\mathcal R(X,Y)$ is defined for $X$ and $Y$ in arbitrary dimension.
2. $\mathcal R(X,Y)=0$ characterizes independence of $X$ and $Y$.

Distance correlation satisfies $0 \le \mathcal R \le 1$, and $\mathcal R = 0$ only if $X$ and $Y$ are independent. Distance covariance $\mathcal V$ provides a new approach to the problem of testing the joint independence of random vectors. The formal definitions of the population coefficients $\mathcal V$ and $\mathcal R$ are given in (SRB 2007). The definitions of the empirical coefficients are as follows.

The empirical distance covariance $\mathcal{V}_n(\mathbf{X,Y})$
with index 1 is
the nonnegative number defined by
$$\mathcal{V}^2_n (\mathbf{X,Y}) = \frac{1}{n^2} \sum_{k,\,l=1}^n
A_{kl}B_{kl}$$
where $A_{kl}$ and $B_{kl}$ are
$$A_{kl} = a_{kl}-\bar a_{k.}- \bar a_{.l} + \bar a_{..}$$
$$B_{kl} = b_{kl}-\bar b_{k.}- \bar b_{.l} + \bar b_{..}.$$
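Here
$$a_{kl} = \|X_k - X_l\|_p, \quad b_{kl} = \|Y_k - Y_l\|_q, \quad
k,l=1,\dots,n,$$
are the pairwise Euclidean distances within each sample, with $X_k \in \mathbb{R}^p$ and $Y_k \in \mathbb{R}^q$, and the subscript `.` denotes that the mean is computed for the index that it replaces. Similarly,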
$\mathcal{V}_n(\mathbf{X})$ is the nonnegative number defined by
$$\mathcal{V}^2_n (\mathbf{X}) = \mathcal{V}^2_n (\mathbf{X,X}) =
\frac{1}{n^2} \sum_{k,\,l=1}^n
A_{kl}^2.$$
The empirical distance correlation $\mathcal{R}_n(\mathbf{X,Y})$ is
the square root of
$$\mathcal{R}^2_n(\mathbf{X,Y})=
\frac {\mathcal{V}^2_n(\mathbf{X,Y})}
{\sqrt{ \mathcal{V}^2_n (\mathbf{X}) \mathcal{V}^2_n(\mathbf{Y})}}.$$
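
Because each $\mathcal{V}_n$ is the nonnegative square root of the corresponding $\mathcal{V}^2_n$, taking the square root of the ratio above gives an equivalent form in terms of the unsquared statistics. This is a worked restatement of the definition, assuming $\mathcal{V}_n(\mathbf{X})\,\mathcal{V}_n(\mathbf{Y}) > 0$:

$$\mathcal{R}_n(\mathbf{X,Y}) =
\sqrt{\frac{\mathcal{V}^2_n(\mathbf{X,Y})}
{\sqrt{\mathcal{V}^2_n(\mathbf{X})\,\mathcal{V}^2_n(\mathbf{Y})}}} =
\frac{\mathcal{V}_n(\mathbf{X,Y})}
{\sqrt{\mathcal{V}_n(\mathbf{X})\,\mathcal{V}_n(\mathbf{Y})}}.$$
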

See `dcov.test` for a test of multivariate independence based on the distance covariance statistic.
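
The empirical definitions above can be sketched directly in base R. This is an illustration of the formulas, not the package's C or R implementation. The names `as_dist_matrix`, `double_center`, and `dcov_sketch` are hypothetical, and raising the distances to the power `index` is an assumption based on the description of that argument. The element names of the returned list mirror those listed for `DCOR`.

```r
# Hypothetical sketch of the empirical formulas; base R only.

# Pairwise Euclidean distance matrix; dist objects are accepted as-is.
as_dist_matrix <- function(z) {
  as.matrix(if (inherits(z, "dist")) z else dist(z))
}

# A_kl = a_kl - (row mean k) - (column mean l) + (grand mean)
double_center <- function(d) {
  sweep(sweep(d, 1, rowMeans(d)), 2, colMeans(d)) + mean(d)
}

dcov_sketch <- function(x, y, index = 1.0) {
  A <- double_center(as_dist_matrix(x)^index)
  B <- double_center(as_dist_matrix(y)^index)

  # Squared statistics: (1/n^2) * sum over k, l is the mean of the matrix.
  V2xy <- max(mean(A * B), 0)  # guard against tiny negative rounding error
  V2x  <- mean(A * A)
  V2y  <- mean(B * B)

  # R_n^2 = V_n^2(X,Y) / sqrt(V_n^2(X) * V_n^2(Y)); set to 0 if undefined.
  denom <- sqrt(V2x * V2y)
  R2 <- if (denom > 0) V2xy / denom else 0

  list(dCov = sqrt(V2xy), dCor = sqrt(R2),
       dVarX = sqrt(V2x), dVarY = sqrt(V2y))
}

x <- iris[1:50, 1:4]
y <- iris[51:100, 1:4]
res <- dcov_sketch(x, y)
res$dCor                   # sample distance correlation, between 0 and 1
dcov_sketch(x, x)$dCor     # a sample against itself gives 1 (up to rounding)
```

Centering both distance matrices once and reusing them for all four statistics avoids recomputing the pairwise distances for each quantity.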

##### Value

`dcov` returns the sample distance covariance and `dcor` returns the sample distance correlation. `DCOR` returns a list with elements

- `dCov`: sample distance covariance
- `dCor`: sample distance correlation
- `dVarX`: distance variance of x sample
- `dVarY`: distance variance of y sample

##### Note

Two methods of computing the statistics are provided. `DCOR` is a stand-alone R function that returns a list of statistics. `dcov` and `dcor` provide R interfaces to the C implementation, which is usually faster. `dcov` and `dcor` call an internal function `.dcov`.

Note that it is inefficient to compute dCor as `dcov(x,y)/sqrt(dcov(x,x)*dcov(y,y))`, because the individual calls to `dcov` involve unnecessary repetition of calculations. For this reason, both `.dcov` and `DCOR` compute and return all four statistics.

##### concept

- independence
- distance correlation
- distance covariance
- energy statistics

##### References

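Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007), Measuring and Testing Dependence by Correlation of Distances, *Annals of Statistics*, Vol. 35, No. 6, pp. 2769-2794.

Szekely, G.J. and Rizzo, M.L. (2009), Brownian Distance Covariance, *Annals of Applied Statistics*, Vol. 3, No. 4, pp. 1236-1265.

Szekely, G.J. and Rizzo, M.L. (2009), Rejoinder: Brownian Distance Covariance, *Annals of Applied Statistics*, Vol. 3, No. 4, pp. 1303-1308.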

##### Examples

```
library(energy)

x <- iris[1:50, 1:4]
y <- iris[51:100, 1:4]
dcov(x, y)
dcov(dist(x), dist(y))  # same thing, using dist objects

## C implementation
dcov(x, y, 1.5)
dcor(x, y, 1.5)
.dcov(dist(x), dist(y), 1.5)  # internal function called by dcov and dcor

## R implementation
DCOR(x, y, 1.5)

## compare speed of R version and C version
set.seed(111)
## R version
system.time(replicate(1000, DCOR(x, y)))
set.seed(111)
## C version
system.time(replicate(1000, .dcov(x, y)))
```

*Documentation reproduced from package energy, version 1.6.2, License: GPL (>= 2)*