Computes Gini distance covariance and correlation statistics, where X is quantitative, Y is categorical, and alpha is the exponent on the Euclidean distance, and returns these measures of dependence.
gCor(x, y, alpha)
gCor returns the sample Gini distance covariance and correlation between x and y.
x: data
y: label of data or univariate response variable
alpha: exponent on Euclidean distance, in (0,2)
gCor computes the Gini distance correlation statistic. It is a self-contained R function that returns a measure of dependence.
The sample size (number of rows) of the data must equal the length of the label vector, and the samples must not contain missing values. The arguments x and y are treated as data and labels, respectively. alpha is the exponent on the Euclidean distance; if missing, it defaults to 1.
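Because the data and labels must have matching sizes and contain no missing values, it can help to filter incomplete observations before calling the function. The following is a minimal base R sketch of one way to do that; the names keep, x_clean, and y_clean are illustrative and not part of the package, and the missing value is injected only for demonstration.

```r
# Illustrative pre-check: drop observations with missing values so that
# the data and the label vector stay aligned.
x <- iris[, 1:4]
y <- iris[, 5]
x[3, 2] <- NA                      # inject a missing value for demonstration

# Keep only rows where every column of x and the label are observed.
keep <- complete.cases(x) & !is.na(y)
x_clean <- x[keep, , drop = FALSE]
y_clean <- y[keep]

# The two requirements stated above now hold.
stopifnot(nrow(x_clean) == length(y_clean), !anyNA(x_clean), !anyNA(y_clean))
```

The cleaned x_clean and y_clean can then be passed as the x and y arguments.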
Suppose a sample \( {\mathcal{D}} =\{(\mathbf{x}_i,y_i)\} \), \(i = 1,...,n\), is available. The sample counterparts can then be computed directly. Let \({\mathcal{I}}_k \) be the index set of sample points with \(y_i =L_k\), where \(L_k\) is the \(k\)-th of \(K\) categories. Then \(p_k\) is estimated by the sample proportion of that category, that is, \(\hat{p}_k= \frac{n_k}{n}\), where \(n_k\) is the number of elements in \({\mathcal{I}}_k\). For a given \(\alpha \in (0,2)\), a point estimator of \(\rho_g(\alpha)\) is given as follows.
$$\hat{\Delta}_k(\alpha)= {n_k \choose 2}^{-1} \sum_{i<j \in {\mathcal{I}}_k} \|\mathbf{x}_i -\mathbf{x}_j\| ^{\alpha},$$
$$\hat{\Delta}(\alpha)={n \choose 2}^{-1} \sum_{1 \le i<j \le n} \|\mathbf{x}_i -\mathbf{x}_j\| ^{\alpha},$$
$$gCor=\hat{\rho}_g (\alpha)= 1-\frac{\sum_{k=1}^K \hat p_k \hat{\Delta}_k(\alpha)}{\hat{\Delta}(\alpha)}.$$
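To make the estimator above concrete, here is a minimal base R sketch that transcribes the three formulas directly. The function name gini_cor_sketch and its helper are hypothetical and are not the package's implementation. The sketch assumes every category has at least two observations, so that each within-category average is defined.

```r
# Hypothetical helper, not part of the package: a direct transcription of
# the point estimator of rho_g(alpha) using only base R.
gini_cor_sketch <- function(x, y, alpha = 1) {
  x <- as.matrix(x)
  y <- droplevels(as.factor(y))
  stopifnot(nrow(x) == length(y), !anyNA(x), !anyNA(y),
            alpha > 0, alpha < 2,
            all(table(y) >= 2))    # assumption: n_k >= 2 for every category

  # Average of ||x_i - x_j||^alpha over all pairs i < j of the given rows.
  # dist() returns exactly the choose(n, 2) pairwise Euclidean distances.
  mean_pair_dist <- function(rows) mean(as.vector(dist(rows))^alpha)

  delta_all <- mean_pair_dist(x)                   # Delta-hat(alpha)
  p_hat <- as.numeric(table(y)) / length(y)        # p-hat_k = n_k / n
  delta_k <- sapply(levels(y), function(k) {
    mean_pair_dist(x[y == k, , drop = FALSE])      # Delta-hat_k(alpha)
  })

  1 - sum(p_hat * delta_k) / delta_all
}

# Usage on the iris data
gini_cor_sketch(iris[, 1:4], iris[, 5], alpha = 1)
```

Two boundary cases follow from the formula: if all points within each category coincide, every within-category average is zero and the statistic equals 1; if there is only one category, the weighted within-category average equals the overall average and the statistic equals 0.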
Dang, X., Nguyen, D., Chen, Y. and Zhang, J. (2019). A new Gini correlation between quantitative and qualitative variables. Submitted to the Journal of the American Statistical Association.
gmd
gCov
KgCov
KgCor
# Four numeric measurements as the quantitative data
x <- iris[,1:4]
# Species labels converted to integer codes
y <- unclass(iris[,5])
gCor(x, y, alpha = 1)