
GiniDistance (version 0.1.1)

dCov: Distance Covariance Statistic

Description

Computes the distance covariance statistic between quantitative variables X and a categorical variable Y, returning a measure of their dependence.

Usage

dCov(x, y, alpha)

Value

dCov returns the sample distance covariance between data x and label y.

Arguments

x

a numeric matrix or data frame of quantitative variables, one row per observation

y

class labels of the data (the categorical response variable), one per row of x

alpha

exponent on Euclidean distance, in (0,2]
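For reference, the sketch below builds the matrix of powered Euclidean distances \(a_{ij} = \|\mathbf{x}_i-\mathbf{x}_j\|^{\alpha}\) that alpha controls, and rejects values outside (0,2]. The helper name pow_dist is hypothetical and is not part of GiniDistance; it uses only base R (dist, as.matrix).

```r
# Hypothetical helper (not part of GiniDistance): builds the n x n matrix
# a_ij = ||x_i - x_j||^alpha between the rows of x.
pow_dist <- function(x, alpha = 1) {
  # alpha must be a single number in (0, 2], as required for dCov
  if (!is.numeric(alpha) || length(alpha) != 1 || alpha <= 0 || alpha > 2) {
    stop("alpha must be a single number in (0, 2]")
  }
  # dist() returns pairwise Euclidean distances between rows;
  # as.matrix() expands them into a full symmetric matrix
  as.matrix(dist(as.matrix(x)))^alpha
}

D <- pow_dist(iris[, 1:4], alpha = 1)
dim(D)  # 150 150
```

Because \(t \mapsto t^{\alpha}\) is increasing, every admissible alpha preserves the ordering of the pairwise distances; only their relative spacing changes.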

Details

dCov calls the dcov function from the energy package to compute the distance covariance statistic. The sample size (number of rows) of x must equal the length of the label vector y, and neither may contain missing values. Arguments x and y are treated as the data and the labels, respectively.
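The two input requirements stated above can be checked before calling dCov. The following is a minimal sketch; check_dcov_inputs is a hypothetical name, and whether dCov performs equivalent checks internally is not stated here.

```r
# Hypothetical pre-check (not part of GiniDistance) mirroring the stated
# requirements: nrow(x) must equal length(y), and neither may contain NA.
check_dcov_inputs <- function(x, y) {
  x <- as.matrix(x)
  if (nrow(x) != length(y)) {
    stop("nrow(x) must equal length(y)")
  }
  if (anyNA(x) || anyNA(y)) {
    stop("x and y must not contain missing values")
  }
  invisible(TRUE)
}

check_dcov_inputs(iris[, 1:4], unclass(iris[, 5]))  # returns TRUE invisibly
```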

The distance covariance (Szekely et al., 2007) was extended from Euclidean spaces to general metric spaces by Lyons (2013). Based on that idea, we define the discrete metric $$d(y, y^\prime) =|y-y^\prime|:= I(y\neq y^\prime),$$ where \(I(\cdot)\) is the indicator function. Equipped with this set difference metric on the support of \(Y\) and the Euclidean distance on the support of \(\mathbf{X}\), the distance covariance and distance correlation for numerical \(\mathbf{X}\) and categorical \(Y\) are defined as follows.

Let \(A=(A_{ij})\) be the symmetric \(n \times n\) centered distance matrix of the sample \(\mathbf{x}_1,\cdots,\mathbf{x}_n\). Its \((i,j)\)-th entry is \(A_{ij} = a_{ij}-\frac{1}{n-2}a_{i\cdot}-\frac{1}{n-2}a_{\cdot j} + \frac{1}{(n-1)(n-2)}a_{\cdot \cdot}\) if \(i \neq j\) and \(A_{ii} = 0\), where \(a_{ij} = \|\mathbf{x}_i-\mathbf{x}_j\|^{\alpha}\), \(a_{i\cdot} = \sum_{j=1}^n a_{ij}\), \(a_{\cdot j} = \sum_{i=1}^n a_{ij}\), and \(a_{\cdot \cdot}=\sum_{i,j=1}^n a_{ij}\). Similarly, applying the same centering to the set difference distances \(b_{ij} = I(y_i \neq y_j)\) of the sample \(y_1,\cdots, y_n\) gives the symmetric \(n \times n\) centered distance matrix \(B = (B_{ij})\). For \(n > 3\), an unbiased estimator of \(\mbox{dCov}(\mathbf{X},Y;\alpha)\) is

\(\frac{1}{n(n-3)}\sum_{i\ne j}A_{ij}B_{ij}.\)
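To make the centering and the estimator concrete, here is a minimal base-R sketch that centers the distance matrices \(a_{ij}\) and \(b_{ij}=I(y_i\neq y_j)\) and evaluates \(\frac{1}{n(n-3)}\sum_{i\ne j}A_{ij}B_{ij}\). The names u_center and dcov_u_sketch are hypothetical. This is a direct transcription of the formulas, not the code that dCov runs (which calls energy::dcov), so its output is not guaranteed to match dCov exactly.

```r
# Hypothetical helper: centers a symmetric, zero-diagonal distance matrix D
# using the (n - 2) and (n - 1)(n - 2) denominators from the formula.
u_center <- function(D) {
  n <- nrow(D)
  row_s <- rowSums(D)  # a_{i.}
  col_s <- colSums(D)  # a_{.j}
  tot   <- sum(D)      # a_{..}
  A <- D - outer(row_s, rep(1, n)) / (n - 2) -
    outer(rep(1, n), col_s) / (n - 2) +
    tot / ((n - 1) * (n - 2))
  diag(A) <- 0         # A_ii = 0 by definition
  A
}

# Hypothetical sketch of the unbiased estimator for numeric x, labels y.
dcov_u_sketch <- function(x, y, alpha = 1) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(n == length(y), n > 3)
  a <- as.matrix(dist(x))^alpha     # a_ij = ||x_i - x_j||^alpha
  b <- 1 * outer(y, y, FUN = "!=")  # b_ij = I(y_i != y_j), discrete metric
  A <- u_center(a)
  B <- u_center(b)
  # diagonals are zero, so the full sum equals the sum over i != j
  sum(A * B) / (n * (n - 3))
}

dcov_u_sketch(iris[, 1:4], unclass(iris[, 5]), alpha = 1)
```

Setting \(A_{ii}=0\) lets sum(A * B) stand in for the sum over \(i \neq j\), and a constant label vector gives \(B = 0\) and hence an estimate of exactly 0.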

References

Lyons, R. (2013). Distance covariance in metric spaces. The Annals of Probability, 41 (5), 3284-3305.

Rizzo, M. L. and Szekely, G. J. (2017). energy: E-Statistics: Multivariate Inference via the Energy of Data. R package version 1.7-0.

Szekely, G. J., Rizzo, M. L. and Bakirov, N. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35 (6), 2769-2794.

See Also

dCor, KdCov, KdCor

Examples

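  library(GiniDistance)
  x <- iris[, 1:4]            # four numeric measurements per flower
  y <- unclass(iris[, 5])     # species factor converted to integer labels 1, 2, 3
  dCov(x, y, alpha = 1)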
