Ball (version 1.3.7)

bd: Ball Divergence

Description

Compute the ball divergence statistic for the two-sample or K-sample problem.

Usage

bd(x, y = NULL, distance = FALSE, size = NULL, num.threads = 1,
  kbd.type = "sum")

Arguments

x

a numeric vector, matrix, data.frame, dist object or list containing vector, matrix, or data.frame.

y

a numeric vector, matrix or data.frame.

distance

if distance = TRUE, x is treated as a distance matrix. Default: distance = FALSE.

size

a vector recording the sample size of each group.

num.threads

Number of threads. Default num.threads = 1.

kbd.type

a character value controlling the output information. Setting kbd.type = "sum", kbd.type = "summax", or kbd.type = "max" determines which statistic value and \(p\)-value of the \(K\)-sample test procedure are displayed. Note that this argument only influences the printed result in the R console. Default: kbd.type = "sum".

Value

bd

the sample version of ball divergence.

Details

Given samples containing no missing values, bd returns the sample version of ball divergence. If distance = TRUE, the arguments x and y can be a dist object or a symmetric numeric matrix recording the distances between samples; otherwise, these arguments are treated as data.
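For instance, when only pairwise distances are available, the two samples can be pooled into a single dist object and the group sizes passed via size. A minimal sketch, assuming the Ball package is installed and attached:

```r
library(Ball)
set.seed(1)
x <- matrix(rnorm(100), nrow = 50)       # first sample, 50 observations
y <- matrix(rnorm(100), nrow = 50)       # second sample, 50 observations
dxy <- dist(rbind(x, y))                 # pooled pairwise distances
# size tells bd where one group ends and the next begins
bd(dxy, distance = TRUE, size = c(50, 50))
```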

Ball divergence, introduced by Pan et al. (2017), is a novel concept that measures the difference between two probability distributions in a separable Banach space. The ball divergence of two probability measures is proven to be zero if and only if they are identical.

The sample version of ball divergence is defined as follows. Suppose we observe two independent samples \( \{x_{1}, ..., x_{n}\} \) drawn from the probability measure \(\mu\) and \( \{y_{1}, ..., y_{m}\} \) drawn from \(\nu\), where the observations within each sample are i.i.d.

Also, let \(\delta(x,y,z)=I(z\in \bar{B}(x, \rho(x,y)))\), where \(\delta(x,y,z)\) indicates whether \(z\) is located in the closed ball \(\bar{B}(x, \rho(x,y))\) with center \(x\) and radius \(\rho(x, y)\). We denote: $$ A_{ij}^{X}=\frac{1}{n}\sum_{u=1}^{n}{\delta(X_i,X_j,X_u)}, \quad A_{ij}^{Y}=\frac{1}{m}\sum_{v=1}^{m}{\delta(X_i,X_j,Y_v)} $$ $$ C_{kl}^{X}=\frac{1}{n}\sum_{u=1}^{n}{\delta(Y_k,Y_l,X_u)}, \quad C_{kl}^{Y}=\frac{1}{m}\sum_{v=1}^{m}{\delta(Y_k,Y_l,Y_v)} $$

\(A_{ij}^X\) represents the proportion of samples \( \{x_{1}, ..., x_{n}\} \) located in the ball \(\bar{B}(X_i,\rho(X_i,X_j))\) and \(A_{ij}^Y\) represents the proportion of samples \( \{y_{1}, ..., y_{m}\} \) located in the ball \(\bar{B}(X_i,\rho(X_i,X_j))\). Meanwhile, \(C_{kl}^X\) and \(C_{kl}^Y\) represent the corresponding proportions located in the ball \(\bar{B}(Y_k,\rho(Y_k,Y_l))\).

Based on these proportions, we define $$A_{n,m}=\frac{1}{n^{2}}\sum_{i,j=1}^{n}{(A_{ij}^{X}-A_{ij}^{Y})^{2}}, \quad C_{n,m}=\frac{1}{m^{2}}\sum_{k,l=1}^{m}{(C_{kl}^{X}-C_{kl}^{Y})^{2}},$$ and the sample version of ball divergence as: $$D_{n,m}=A_{n,m}+C_{n,m}$$
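The definitions above can be translated almost literally into base R. The following is an illustrative sketch for univariate samples, not the package's optimized implementation (the function name sample_bd is hypothetical):

```r
# Illustrative computation of D_{n,m} following the definitions above.
sample_bd <- function(x, y) {
  n <- length(x); m <- length(y)
  # A_{ij}^X, A_{ij}^Y: proportions of each sample falling in the closed
  # ball centered at x[i] with radius |x[i] - x[j]|
  AX <- matrix(0, n, n); AY <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n) {
    r <- abs(x[i] - x[j])
    AX[i, j] <- mean(abs(x - x[i]) <= r)
    AY[i, j] <- mean(abs(y - x[i]) <= r)
  }
  # C_{kl}^X, C_{kl}^Y: the same proportions for balls centered at y[k]
  CX <- matrix(0, m, m); CY <- matrix(0, m, m)
  for (k in 1:m) for (l in 1:m) {
    r <- abs(y[k] - y[l])
    CX[k, l] <- mean(abs(x - y[k]) <= r)
    CY[k, l] <- mean(abs(y - y[k]) <= r)
  }
  mean((AX - AY)^2) + mean((CX - CY)^2)   # A_{n,m} + C_{n,m}
}

set.seed(1)
sample_bd(rnorm(30), rnorm(30, mean = 1))  # larger when distributions differ
```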

Ball divergence can be generalized to the K-sample problem: if we have \(K\) groups of samples, with the \(k\)-th group containing \(n^{(k)}, k=1,...,K\) observations, then the sample version of the generalized ball divergence for the K-sample problem is $$\sum_{1 \leq k < l \leq K}{D_{n^{(k)},n^{(l)}}}$$
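In practice, the K-sample statistic can be computed by passing a list to x, as documented in the Arguments section. A short sketch, assuming the Ball package is installed and attached:

```r
library(Ball)
set.seed(1)
# three groups with differing sample sizes and shifted means
samples <- list(rnorm(30), rnorm(40, mean = 0.5), rnorm(50, mean = 1))
bd(samples, kbd.type = "sum")  # sum of pairwise ball divergences
```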

See bd.test for a test of the equality of distributions based on the ball divergence.

References

Wenliang Pan, Yuan Tian, Xueqin Wang, and Heping Zhang (2017). Ball divergence: nonparametric two sample test. The Annals of Statistics, to appear.

See Also

bd.test

Examples

############# Ball Divergence #############
set.seed(1)
x <- rnorm(50)
y <- rnorm(50)
bd(x, y)