distance correlation



Distance Correlation and Covariance Statistics

Computes distance covariance and distance correlation statistics, which are multivariate measures of dependence.

Usage

dcov(x, y, index = 1.0)
dcor(x, y, index = 1.0)
DCOR(x, y, index = 1.0)

Arguments

  • x: data or distances of first sample
  • y: data or distances of second sample
  • index: exponent on Euclidean distance, in (0,2]

Details

dcov and dcor or DCOR compute distance covariance and distance correlation statistics. DCOR is a self-contained R function returning a list of statistics. dcor execution is faster than DCOR (see examples).

The sample sizes (number of rows) of the two samples must agree, and samples must not contain missing values. Arguments x, y can optionally be dist objects; otherwise these arguments are treated as data.

Distance correlation is a new measure of dependence between random vectors introduced by Szekely, Rizzo, and Bakirov (2007). For all distributions with finite first moments, distance correlation $\mathcal R$ generalizes the idea of correlation in two fundamental ways:

  • (1) $\mathcal R(X,Y)$ is defined for $X$ and $Y$ in arbitrary dimension.
  • (2) $\mathcal R(X,Y)=0$ characterizes independence of $X$ and $Y$.

Distance correlation satisfies $0 \le \mathcal R \le 1$, and $\mathcal R = 0$ only if $X$ and $Y$ are independent. Distance covariance $\mathcal V$ provides a new approach to the problem of testing the joint independence of random vectors. The formal definitions of the population coefficients $\mathcal V$ and $\mathcal R$ are given in (SRB 2007). The definitions of the empirical coefficients are as follows.

The empirical distance covariance $\mathcal{V}_n(\mathbf{X,Y})$ with index 1 is the nonnegative number defined by

$$\mathcal{V}^2_n (\mathbf{X,Y}) = \frac{1}{n^2} \sum_{k,\,l=1}^n A_{kl}B_{kl}$$

where $A_{kl}$ and $B_{kl}$ are

$$A_{kl} = a_{kl}-\bar a_{k.}- \bar a_{.l} + \bar a_{..}$$

$$B_{kl} = b_{kl}-\bar b_{k.}- \bar b_{.l} + \bar b_{..}.$$

Here

$$a_{kl} = \|X_k - X_l\|_p, \quad b_{kl} = \|Y_k - Y_l\|_q, \quad k,l=1,\dots,n,$$

and the subscript . denotes that the mean is computed for the index that it replaces.

Similarly, $\mathcal{V}_n(\mathbf{X})$ is the nonnegative number defined by

$$\mathcal{V}^2_n (\mathbf{X}) = \mathcal{V}^2_n (\mathbf{X,X}) = \frac{1}{n^2} \sum_{k,\,l=1}^n A_{kl}^2.$$

The empirical distance correlation $\mathcal{R}_n(\mathbf{X,Y})$ is the square root of

$$\mathcal{R}^2_n(\mathbf{X,Y})= \frac {\mathcal{V}^2_n(\mathbf{X,Y})} {\sqrt{ \mathcal{V}^2_n (\mathbf{X}) \mathcal{V}^2_n(\mathbf{Y})}}.$$

See dcov.test for a test of multivariate independence based on the distance covariance statistic.
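The empirical definitions can be traced in a few lines of base R. The following is only a sketch for index 1, not the package's implementation: the names double_center, dcov2_sketch, and dcor_sketch are hypothetical, and the Euclidean distances come from base R's dist. It applies the formulas to y = x^2 on a grid symmetric about zero, where the Pearson correlation is essentially zero although y is a function of x.

```r
# Double centering (hypothetical helper):
# A_kl = a_kl - (row mean k) - (column mean l) + (grand mean)
double_center <- function(d) {
  d <- as.matrix(d)
  centered <- sweep(d, 1, rowMeans(d))          # subtract row means
  centered <- sweep(centered, 2, colMeans(d))   # subtract column means
  centered + mean(d)                            # add back the grand mean
}

# Squared sample distance covariance: (1 / n^2) * sum_{k,l} A_kl * B_kl
dcov2_sketch <- function(x, y) {
  A <- double_center(dist(x))
  B <- double_center(dist(y))
  mean(A * B)
}

# Sample distance correlation: square root of V2(x, y) / sqrt(V2(x) * V2(y))
dcor_sketch <- function(x, y) {
  v2 <- dcov2_sketch(x, y)
  denom <- sqrt(dcov2_sketch(x, x) * dcov2_sketch(y, y))
  # max(v2, 0) only guards against tiny negative rounding error
  if (denom > 0) sqrt(max(v2, 0) / denom) else 0
}

x <- seq(-1, 1, length.out = 101)   # grid symmetric about zero
y <- x^2                            # deterministic but nonlinear dependence

pearson <- cor(x, y)                # essentially zero by symmetry
dcor_xy <- dcor_sketch(x, y)        # clearly positive

print(c(pearson = pearson, dcor = dcor_xy))
```

This sketch recomputes the centered matrices on every call, so it is meant for reading alongside the formulas rather than for speed.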


Value

dcov returns the sample distance covariance and dcor returns the sample distance correlation. DCOR returns a list with elements

  • dCov: sample distance covariance
  • dCor: sample distance correlation
  • dVarX: distance variance of x sample
  • dVarY: distance variance of y sample


Note

Two methods of computing the statistics are provided. DCOR is a stand-alone R function that returns a list of statistics. dcov and dcor provide R interfaces to the C implementation, which is usually faster. dcov and dcor call an internal function .dcov.

Note that it is inefficient to compute dCor by: square root of dcov(x,y)/sqrt(dcov(x,x)*dcov(y,y)) because the individual calls to dcov involve unnecessary repetition of calculations. For this reason, both .dcov and DCOR compute and return all four statistics.
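The repetition mentioned in this note can be made visible with a small base R sketch. It does not use the package's code: center, dcov2_only, dcor_all, and the counter n_centerings are hypothetical names introduced for illustration, and the helper works on squared statistics so that the ratio matches the formula for the squared empirical distance correlation. The sketch counts how many distance matrices are double centered on each route.

```r
n_centerings <- 0   # hypothetical counter, used only for this illustration

# Double-center a distance matrix and record that it happened
center <- function(d) {
  n_centerings <<- n_centerings + 1
  d <- as.matrix(d)
  d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
}

# Statistic-at-a-time route: every call centers both of its inputs again
dcov2_only <- function(x, y) mean(center(dist(x)) * center(dist(y)))

# One-pass route: center each sample once, then derive all four statistics
dcor_all <- function(x, y) {
  A <- center(dist(x))
  B <- center(dist(y))
  V2xy <- mean(A * B)
  V2x  <- mean(A * A)
  V2y  <- mean(B * B)
  denom <- sqrt(V2x * V2y)
  list(dCov  = sqrt(V2xy),
       dCor  = if (denom > 0) sqrt(V2xy / denom) else 0,
       dVarX = sqrt(V2x),
       dVarY = sqrt(V2y))
}

x <- iris[1:50, 1:4]
y <- iris[51:100, 1:4]

n_centerings <- 0
naive <- sqrt(dcov2_only(x, y) / sqrt(dcov2_only(x, x) * dcov2_only(y, y)))
naive_count <- n_centerings      # three calls, two centerings each

n_centerings <- 0
res <- dcor_all(x, y)
onepass_count <- n_centerings    # each sample centered once

print(c(naive = naive_count, one_pass = onepass_count))
```

Both routes give the same distance correlation, but the first centers six matrices where the second centers two, which is the saving obtained by returning all four statistics from a single computation.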


  • independence
  • distance correlation
  • distance covariance
  • energy statistics


References

Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007), Measuring and Testing Dependence by Correlation of Distances, Annals of Statistics, Vol. 35, No. 6, pp. 2769-2794. http://dx.doi.org/10.1214/009053607000000505

Szekely, G.J. and Rizzo, M.L. (2009), Brownian Distance Covariance, Annals of Applied Statistics, Vol. 3, No. 4, pp. 1236-1265. http://dx.doi.org/10.1214/09-AOAS312

Szekely, G.J. and Rizzo, M.L. (2009), Rejoinder: Brownian Distance Covariance, Annals of Applied Statistics, Vol. 3, No. 4, pp. 1303-1308.

See Also

dcov.test, dcor.ttest

  • dcor
  • dcov
  • DCOR

Examples

x <- iris[1:50, 1:4]
y <- iris[51:100, 1:4]
dcov(x, y)
dcov(dist(x), dist(y))  # same thing
## C implementation
dcov(x, y, 1.5)
dcor(x, y, 1.5)
.dcov(dist(x), dist(y), 1.5)
## R implementation
DCOR(x, y, 1.5)
## compare speed of R version and C version
## R version
system.time(replicate(1000, DCOR(x, y)))
## C version
system.time(replicate(1000, .dcov(x, y)))

Documentation reproduced from package energy, version 1.6.2, License: GPL (>= 2)
