
The function implements the Pan et al. (2018) multivariate two- or k-sample Ball Divergence test using the bd.test implementation from the Ball package.
BallDivergence(X1, X2, ..., n.perm = 0, seed = 42, num.threads = 0,
               kbd.type = "sum", weight = c("constant", "variance"),
               args.bd.test = NULL)
An object of class htest
with the following components:
Observed value of the test statistic
Permutation p value (only if n.perm > 0 and for two datasets)
Number of permutations for permutation test
Number of observations for each dataset
Description of the test
The dataset names
The alternative hypothesis
X1: First dataset as matrix or data.frame
X2: Second dataset as matrix or data.frame
...: Optionally, further datasets as matrices or data.frames (illustrated in the sketch after this list)
n.perm: Number of permutations for the permutation test (default: 0, no permutation test performed). Note that for more than two samples, no test is performed.
seed: Random seed (default: 42)
num.threads: Number of threads (default: 0, all available cores are used)
kbd.type: Character specifying which k-sample test statistic will be used. Must be one of "sum" (default), "maxsum", or "max".
weight: Character specifying the weight form of the Ball Divergence test statistic. Must be one of "constant" (default) or "variance".
args.bd.test: Further arguments passed to bd.test as a named list.
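A brief illustration of the k-sample interface described above (a minimal sketch, assuming the Ball package is installed; it uses only the arguments shown in the Usage section):

# Three samples from different normal distributions
X1 <- matrix(rnorm(500), ncol = 5)
X2 <- matrix(rnorm(500, mean = 0.5), ncol = 5)
X3 <- matrix(rnorm(500, sd = 2), ncol = 5)
if (requireNamespace("Ball", quietly = TRUE)) {
  # Only the k-sample statistic is returned; no test is performed for more than two samples
  BallDivergence(X1, X2, X3, kbd.type = "maxsum")
}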
Target variable? | Numeric? | Categorical? | K-sample? |
No | Yes | No | Yes |
For n.perm = 0, the asymptotic test is performed. For n.perm > 0, a permutation test is performed.
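For example (a minimal sketch, assuming the Ball package is installed):

if (requireNamespace("Ball", quietly = TRUE)) {
  Y1 <- matrix(rnorm(500), ncol = 5)
  Y2 <- matrix(rnorm(500), ncol = 5)
  BallDivergence(Y1, Y2, n.perm = 0)    # asymptotic test
  BallDivergence(Y1, Y2, n.perm = 100)  # permutation test with 100 permutations
}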
The Ball Divergence is defined as the square of the measure difference over a given closed ball collection. The empirical test performed here is based on the difference between averages of metric ranks. It is robust to outliers and heavy-tailed data and suitable for imbalanced sample sizes.
The Ball Divergence of two distributions is zero if and only if the distributions coincide. Therefore, low values of the test statistic indicate similarity and the test rejects for large values of the test statistic.
For the k-sample case, there are three options for aggregating the pairwise Ball Divergences. First, one can sum up all pairwise Ball Divergences (kbd.type = "sum"). Next, one can find the sample with the largest difference to the others, i.e. take the maximum of the sums of all Ball Divergences of each sample with all other samples (kbd.type = "maxsum"). Last, one can sum up the largest K - 1 pairwise Ball Divergences (kbd.type = "max").
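The following base R sketch only illustrates the three aggregation schemes on a hypothetical matrix D of pairwise Ball Divergences between K samples; it does not reproduce the package's internal computations.

# Hypothetical symmetric matrix of pairwise Ball Divergences between K = 3 samples
D <- matrix(c(0.0, 0.2, 0.5,
              0.2, 0.0, 0.3,
              0.5, 0.3, 0.0), nrow = 3, byrow = TRUE)
K <- nrow(D)
sum(D[upper.tri(D)])                           # "sum": sum of all pairwise divergences
max(rowSums(D))                                # "maxsum": largest summed divergence of one sample to all others
sum(sort(D[upper.tri(D)], decreasing = TRUE)[seq_len(K - 1)])  # "max": sum of the K - 1 largest pairwise divergences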
This implementation is a wrapper function around the function bd.test that modifies the in- and output of that function to match the other functions provided in this package. For more details see bd.test and bd.
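For comparison, a minimal sketch of calling the underlying function directly (assuming the Ball package is installed; the output follows Ball's own naming conventions rather than those of this wrapper):

if (requireNamespace("Ball", quietly = TRUE)) {
  Z1 <- matrix(rnorm(500), ncol = 5)
  Z2 <- matrix(rnorm(500, mean = 0.5), ncol = 5)
  # Direct call with default settings; see ?Ball::bd.test for its argument names
  Ball::bd.test(Z1, Z2)
}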
Pan, W., Tian, T. Y., Wang, X., Zhang, H. (2018). Ball Divergence: Nonparametric two sample test. Annals of Statistics, 46(3), 1109-1137. doi:10.1214/17-AOS1579
Zhu, J., Pan, W., Zheng, W., Wang, X. (2021). Ball: An R Package for Detecting Distribution Difference and Association in Metric Spaces. Journal of Statistical Software, 97(6). doi:10.18637/jss.v097.i06
Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statistical Surveys, 18, 163-298. doi:10.1214/24-SS149
# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Calculate Ball Divergence and perform test
if (requireNamespace("Ball", quietly = TRUE)) {
  BallDivergence(X1, X2, n.perm = 100)
}