bcor: Ball Correlation and Covariance Statistics

Description

Computes ball covariance and ball correlation statistics, which are multivariate measures of dependence in Banach space.

Usage

bcor(x, y, distance = FALSE, weight = FALSE)
bcov(x, y, distance = FALSE, weight = FALSE)

Arguments

x

a numeric vector, matrix, data.frame, or dist object, or a list containing numeric vectors, matrices, data.frames, or dist objects.

y

a numeric vector, matrix, data.frame, or dist object.

distance

If distance = TRUE, x and y are treated as distance matrices. Default: distance = FALSE

weight

a logical or character value used to choose the form of weight. If weight = FALSE, the ball covariance / correlation with constant weight is used. Alternatively, weight = TRUE or weight = "prob" selects the probability weight, while weight = "chisq" selects the Chi-square weight. Note that this argument currently only influences the result printed in the R console and is only available for the bcov.test function. Default: weight = FALSE

Value

bcor

sample version of ball correlation.

bcov

sample version of ball covariance.

Details

bcov and bcor compute ball covariance and ball correlation statistics.

The sample sizes (number of rows or length of the vector) of the two variables must agree, and the samples must not contain missing values. If distance = TRUE, the arguments x and y can each be a dist object or a symmetric numeric matrix recording the distances between samples; otherwise, these arguments are treated as data.
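The two input modes described above can be sketched as follows. This is a minimal illustration, assuming the Ball package is installed and that Euclidean distance (the default of base R's dist) is the intended metric; the simulated data and variable names are made up for the example.

```r
# Simulated data: 20 samples, x with 2 columns and y with 1 column.
set.seed(1)
x <- matrix(rnorm(40), nrow = 20)
y <- rnorm(20)

# The documented preconditions: equal sample sizes, no missing values.
stopifnot(NROW(x) == NROW(y), !anyNA(x), !anyNA(y))

# Precomputed pairwise distances (Euclidean, by assumption).
dx <- dist(x)
dy <- dist(y)

if (requireNamespace("Ball", quietly = TRUE)) {
  # Inputs treated as data.
  print(Ball::bcor(x, y))
  # Inputs treated as distances between samples.
  print(Ball::bcor(dx, dy, distance = TRUE))
}
```

Passing dist objects is mainly useful when the samples live in a space where a custom metric is more natural than the Euclidean one.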

Ball covariance is a generic nonparametric dependence measure in Banach space, introduced by Pan et al. (2017). Ball covariance enjoys the following properties:

(i) It is nonnegative and satisfies a Cauchy-Schwarz type inequality;

(ii) It is nonparametric and imposes few restrictions on the data; in particular, it does not require finite moment conditions;

(iii) Its empirical version is feasible to compute and can be used as a test statistic of independence with desirable properties;

(iv) The HHG dependence measure is a special case of ball covariance.

Ball correlation, based on the normalized ball covariance, generalizes the idea of Pearson correlation in two fundamental ways:

(i) Ball correlation, $ \mathbf{BCor}_{\omega}^{2}(X, Y) $, is defined for $X$ and $Y$ of arbitrary dimension in Banach space.

(ii) Ball correlation satisfies $0 \le \mathbf{BCor}_{\omega}^{2}(X, Y) \le 1$, and $ \mathbf{BCor}_{\omega}^{2}(X, Y) = 0 $ only if $X$ and $Y$ are independent.

The sample versions of ball covariance and ball correlation are defined as follows. Suppose we are given pairs of independent observations $\{(x_1, y_1),...,(x_n,y_n)\}$, where $x_i$ and $y_i$ can be of any dimension and the dimensions of $x_i$ and $y_i$ need not be the same. Then, we define the sample version of ball covariance as:

$$\mathbf{BCov}_{\omega, n}^{2}(X, Y)=\frac{1}{n^{2}}\sum_{i,j=1}^{n}{(\Delta_{ij,n}^{X,Y}-\Delta_{ij,n}^{X}\Delta_{ij,n}^{Y})^{2}} $$

where:

$$ \Delta_{ij,n}^{X,Y}=\frac{1}{n}\sum_{k=1}^{n}{\delta_{ij,k}^{X} \delta_{ij,k}^{Y}}, \Delta_{ij,n}^{X}=\frac{1}{n}\sum_{k=1}^{n}{\delta_{ij,k}^{X}}, \Delta_{ij,n}^{Y}=\frac{1}{n}\sum_{k=1}^{n}{\delta_{ij,k}^{Y}} $$

$$\delta_{ij,k}^{X} = I(x_{k} \in \bar{B}(x_{i}, \rho(x_{i}, x_{j}))), \delta_{ij,k}^{Y} = I(y_{k} \in \bar{B}(y_{i}, \rho(y_{i}, y_{j})))$$

Here, $\bar{B}(x_{i}, \rho(x_{i}, x_{j}))$ is the closed ball with center $x_{i}$ and radius $\rho(x_{i}, x_{j})$. Similarly, we can define $ \mathbf{BCov}_{\omega,n}^2(\mathbf{X},\mathbf{X}) $ and $ \mathbf{BCov}_{\omega,n}^2(\mathbf{Y},\mathbf{Y}) $, which are the sample versions of $ \mathbf{BCov}_{\omega}^2(\mathbf{X},\mathbf{X}) $ and $ \mathbf{BCov}_{\omega}^2(\mathbf{Y},\mathbf{Y}) $. We thus define the sample version of ball correlation as follows.

$$\mathbf{BCor}_{\omega,n}^2(\mathbf{X},\mathbf{Y})= \mathbf{BCov}_{\omega,n}^2(\mathbf{X},\mathbf{Y})/\sqrt{\mathbf{BCov}_{\omega,n}^2(\mathbf{X},\mathbf{X})\mathbf{BCov}_{\omega,n}^2(\mathbf{Y},\mathbf{Y})} $$
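To make the definitions above concrete, the following base R sketch evaluates the sample ball covariance and ball correlation directly from the displayed sums, with constant weight. It is an illustration only, not the package's implementation: the function names ball_cov_sq and ball_cor_sq are hypothetical, the metric $\rho$ is assumed to be Euclidean (via dist), and the double loop costs on the order of $n^3$ operations.

```r
# Sample ball covariance with constant weight, computed from the sums
# in the definition. Illustrative helper, not part of the Ball package.
ball_cov_sq <- function(x, y) {
  dx <- as.matrix(dist(x))  # rho(x_i, x_j); Euclidean by assumption
  dy <- as.matrix(dist(y))  # rho(y_i, y_j)
  n <- nrow(dx)
  stopifnot(nrow(dy) == n)
  total <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # delta_{ij,k}: is sample k inside the closed ball centred at
      # sample i with radius rho(i, j)?
      in_x <- dx[i, ] <= dx[i, j]
      in_y <- dy[i, ] <= dy[i, j]
      # (Delta^{X,Y} - Delta^X * Delta^Y)^2
      total <- total + (mean(in_x & in_y) - mean(in_x) * mean(in_y))^2
    }
  }
  total / n^2
}

# Sample ball correlation: normalised ball covariance.
ball_cor_sq <- function(x, y) {
  ball_cov_sq(x, y) / sqrt(ball_cov_sq(x, x) * ball_cov_sq(y, y))
}

x <- 1:10
y <- 2 * x + 3
ball_cor_sq(x, y)
```

When y is an increasing linear function of x, the closed balls select the same samples in both spaces, so the ratio evaluates to 1 up to floating-point error.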

Moreover, it is natural to extend $\mathbf{BCov}_{\omega,n}$ to measure the mutual independence between $K$ random variables:

$$\frac{1}{n^{2}}\sum_{i,j=1}^{n}{\left[ (\Delta_{ij,n}^{R_{1}, ..., R_{K}}-\prod_{k=1}^{K}\Delta_{ij,n}^{R_{k}})^{2}\prod_{k=1}^{K}{\hat{\omega}_{k}(R_{ki},R_{kj})} \right]}$$

where $R_{k}, k=1,...,K$ denote the random variables and $R_{ki}, i=1,...,n$ denotes the $i$-th sample of $R_{k}$.
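The $K$-variable statistic can be sketched in base R in the same spirit. The text does not spell out the weight estimates $\hat{\omega}_{k}$, so this sketch sets every weight to 1 (the constant-weight case), which is an assumption; the function name ball_cov_mutual is hypothetical and Euclidean distance is assumed for every variable.

```r
# K-variable ball covariance with all weights fixed at 1 (assumption).
# `vars` is a list of K vectors or matrices with the same sample size.
# Illustrative helper, not part of the Ball package.
ball_cov_mutual <- function(vars) {
  dists <- lapply(vars, function(v) as.matrix(dist(v)))
  n <- nrow(dists[[1]])
  stopifnot(all(vapply(dists, nrow, integer(1)) == n))
  total <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # n x K logical matrix: column k flags the samples that fall in
      # the closed ball of R_k centred at sample i with radius rho(i, j).
      inside <- vapply(dists, function(d) d[i, ] <= d[i, j], logical(n))
      # Joint proportion: samples inside all K balls at once.
      joint <- mean(rowSums(inside) == ncol(inside))
      # (Delta^{R_1,...,R_K} - prod_k Delta^{R_k})^2
      total <- total + (joint - prod(colMeans(inside)))^2
    }
  }
  total / n^2
}

set.seed(1)
ball_cov_mutual(list(rnorm(20), rnorm(20), rnorm(20)))
```

With K = 2 this reduces to the two-variable sum given in the definition of the sample ball covariance.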

See bcov.test for a test of multivariate independence based on the ball covariance and ball correlation statistics.

Examples

# NOT RUN {
library(Ball)
############# Ball Correlation #############
num <- 50
x <- 1:num
y <- 1:num
bcor(x, y)
bcor(x, y, weight = TRUE)
bcor(x, y, weight = "prob")
bcor(x, y, weight = "chisq")
############# Ball Covariance #############
n <- 50
x <- rnorm(n)
y <- rnorm(n)
bcov(x, y)
bcov(x, y, weight = TRUE)
bcov(x, y, weight = "prob")
bcov(x, y, weight = "chisq")
# }