GiniDistance (version 0.1.1)

dCor: Distance Covariance and Correlation Statistics

Description

Computes distance covariance and correlation statistics for quantitative X and categorical Y, and returns these measures of dependence.

Usage

dCor(x, y, alpha)

Value

dCor returns the sample distance variance of x, the distance variance of y, the distance covariance of x and y, and the distance correlation of x and y.

Arguments

x

data

y

labels of the data, or a univariate response variable

alpha

exponent on Euclidean distance, in (0,2]

Details

The sample size (number of rows) of the data must agree with the length of the label vector, and samples must not contain missing values. Arguments x and y are treated as data and labels, respectively.
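The two input requirements above can be checked before calling dCor. The following is a minimal sketch; `check_inputs` is a hypothetical helper written for illustration and is not part of GiniDistance.

```r
# Hypothetical pre-check (not a GiniDistance function): verifies the two
# requirements stated above before the data are passed to dCor.
check_inputs <- function(x, y) {
  x <- as.matrix(x)
  stopifnot(
    nrow(x) == length(y),  # one label per row of the data
    !anyNA(x),             # no missing values in the data
    !anyNA(y)              # no missing values in the labels
  )
  invisible(TRUE)
}

check_inputs(iris[, 1:4], iris[, 5])
```

If either condition fails, `stopifnot` raises an error that names the failing condition.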

dCor calls the dcor function from the energy package, which computes the distance correlation between X and Y where both are numerical variables. If Y is categorical, the set difference metric on the support of \(Y\) is used. That is, \(d(y, y^\prime) =|y-y^\prime|:= I(y\neq y^\prime),\) where \(I (\cdot)\) is the indicator function. The sample distance correlation between data and labels is then computed as follows.

Let \(A=(A_{ij})\) be the symmetric, \(n \times n\), centered distance matrix of the sample \(\mathbf x_1,\cdots, \mathbf x_n\). The \((i,j)\)-th entry of \(A\) is \(A_{ij} = a_{ij}-\frac{1}{n-2}a_{i\cdot}-\frac{1}{n-2}a_{\cdot j} + \frac{1}{(n-1)(n-2)}a_{\cdot \cdot}\) if \(i \neq j\) and \(A_{ij} = 0\) if \(i=j\), where \(a_{ij} = \|\mathbf x_i-\mathbf x_j\|^{\alpha}\), \(a_{i\cdot} = \sum_{j=1}^n a_{ij}\), \(a_{\cdot j} = \sum_{i=1}^n a_{ij}\), and \(a_{\cdot \cdot}=\sum_{i,j=1}^n a_{ij}\). Similarly, using the set difference metric, a symmetric, \(n \times n\), centered distance matrix is calculated for the samples \(y_1,\cdots, y_n\) and denoted by \(B = (B_{ij})\). Unbiased estimators of \(\mbox{dCov}(\mathbf X,Y;\alpha)\), \(\mbox{dCov}(\mathbf X, \mathbf X;\alpha)\) and \(\mbox{dCov}(Y, Y;\alpha)\) are given respectively by \(\frac{1}{n(n-3)}\sum_{i\ne j}A_{ij}B_{ij}\), \(\frac{1}{n(n-3)}\sum_{i\ne j}A_{ij}^2\) and \(\frac{1}{n(n-3)}\sum_{i\ne j}B_{ij}^2\). Then the distance correlation is

$$\mbox{dCor}(\mathbf{X}, Y; \alpha) = \frac{\mbox{dCov}(\mathbf{X}, Y; \alpha)}{ \sqrt{\mbox{dCov}(\mathbf{X},\mathbf{X};\alpha)} \sqrt{\mbox{dCov}(Y,Y;\alpha)}}.$$
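The formulas above can be traced step by step in base R. This is an illustrative sketch only: `u_center` and `dcor_sketch` are hypothetical names, the code follows the formulas as written in this section, and it is not the package's implementation, so its output is not guaranteed to match dCor.

```r
# Hypothetical helper: centered matrix from raw pairwise distances d,
# following the (i,j)-th entry formula given above.
u_center <- function(d) {
  n <- nrow(d)
  rs <- rowSums(d)    # a_{i.}
  cs <- colSums(d)    # a_{.j}
  tot <- sum(d)       # a_{..}
  A <- d -
    outer(rs, rep(1, n)) / (n - 2) -
    outer(rep(1, n), cs) / (n - 2) +
    tot / ((n - 1) * (n - 2))
  diag(A) <- 0        # entries with i = j are set to 0
  A
}

# Hypothetical helper: distance correlation for numeric x and categorical y.
dcor_sketch <- function(x, y, alpha = 1) {
  y <- as.vector(y)                   # drop factor/levels attributes
  n <- length(y)
  a <- as.matrix(dist(x))^alpha       # a_ij = ||x_i - x_j||^alpha
  b <- outer(y, y, "!=") * 1          # b_ij = I(y_i != y_j)
  A <- u_center(a)
  B <- u_center(b)
  # Diagonals are zero, so summing all entries equals summing over i != j.
  dcov_xy <- sum(A * B) / (n * (n - 3))
  dcov_xx <- sum(A * A) / (n * (n - 3))
  dcov_yy <- sum(B * B) / (n * (n - 3))
  dcov_xy / (sqrt(dcov_xx) * sqrt(dcov_yy))
}

x <- iris[, 1:4]
y <- unclass(iris[, 5])
r <- dcor_sketch(x, y, alpha = 1)
```

Because only equality of labels enters through \(I(y_i \neq y_j)\), recoding the labels (for example, integer codes versus species names) leaves the result of this sketch unchanged.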

References

Lyons, R. (2013). Distance covariance in metric spaces. The Annals of Probability, 41 (5), 3284-3305.

Szekely, G. J., Rizzo, M. L. and Bakirov, N. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35 (6), 2769-2794.

Rizzo, M. L. and Szekely, G. J. (2017). energy: E-Statistics: Multivariate Inference via the Energy of Data (R Package), Version 1.7-0.

See Also

dCov, KdCov, KdCor

Examples

  library(GiniDistance)
  x <- iris[, 1:4]          # four numeric measurements
  y <- unclass(iris[, 5])   # species labels as integer codes
  dCor(x, y, alpha = 1)