Performs the nearest-neighbor-based multivariate two-sample test of Barakat et al. (1996).
BQS(X1, X2, dist.fun = stats::dist, n.perm = 0, dist.args = NULL, seed = 42)
An object of class htest with the following components:
Observed value of the test statistic
Permutation p value (if n.perm > 0)
The alternative hypothesis
Description of the test
The dataset names
First dataset as matrix or data.frame
Second dataset as matrix or data.frame
Function for calculating a distance matrix on the pooled dataset (default: stats::dist, Euclidean distance).
Number of permutations for permutation test (default: 0, no test is performed).
Named list of further arguments passed to dist.fun (default: NULL).
Random seed (default: 42)
Target variable? | Numeric? | Categorical? | K-sample? |
No | Yes | No | No |
The test is an extension of the Schilling (1986) and Henze (1988)
neighbor test that bypasses choosing the number of nearest neighbors to consider.
The Schilling-Henze test statistic is the proportion of edges connecting points from the same dataset in a K-nearest neighbor graph calculated on the pooled sample (standardized with expectation and SD under the null).
Barakat et al. (1996) take the weighted sum of the Schilling-Henze test
statistics for \(K = 1,\dots,N-1\), where \(N\) denotes the pooled sample size.
As for the Schilling-Henze test, low values of the test statistic indicate similarity of the datasets. Thus, the null hypothesis of equal distributions is rejected for high values.
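The proportion of same-sample edges described above can be sketched in base R for a single K. This is a minimal, unstandardized illustration and not the package's implementation: the helper name same_sample_proportion is hypothetical, the centering and scaling under the null are omitted, and the weights that Barakat et al. apply when summing over K are not reproduced here.

```r
# Proportion of K-nearest-neighbor edges that join points from the same sample.
# Hypothetical helper for illustration; standardization under the null is omitted.
same_sample_proportion <- function(X1, X2, K, dist.fun = stats::dist) {
  pooled <- rbind(X1, X2)
  labels <- rep(c(1L, 2L), times = c(nrow(X1), nrow(X2)))
  D <- as.matrix(dist.fun(pooled))
  diag(D) <- Inf  # a point is never its own neighbor
  same <- vapply(seq_len(nrow(D)), function(i) {
    nn <- order(D[i, ])[seq_len(K)]  # indices of the K nearest neighbors of point i
    sum(labels[nn] == labels[i])     # how many of them share the sample of point i
  }, numeric(1))
  sum(same) / (nrow(D) * K)
}

set.seed(1)
X1 <- matrix(rnorm(200), ncol = 4)
X2 <- matrix(rnorm(200, mean = 3), ncol = 4)
same_sample_proportion(X1, X2, K = 3)
```

Values close to 1 mean that neighbors almost always come from the same dataset, which speaks against equal distributions. Values near the share expected under random labeling are consistent with the null.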
A permutation test is performed if n.perm is set to a positive number.
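The idea behind such a permutation test can be sketched as follows, for a statistic where high values speak against the null. The names perm_p_value and nn_same_label are hypothetical, the toy statistic is a simple 1-nearest-neighbor share rather than the Barakat et al. statistic, and the add-one correction is one common convention that BQS may or may not use.

```r
# Permutation p value for a statistic where large values speak against H0.
# `stat.fun` is a hypothetical helper taking the pooled data and a label vector.
perm_p_value <- function(pooled, labels, stat.fun, n.perm = 1000, seed = 42) {
  set.seed(seed)
  observed <- stat.fun(pooled, labels)
  # Recompute the statistic after randomly reassigning the sample labels
  permuted <- replicate(n.perm, stat.fun(pooled, sample(labels)))
  # Add-one correction: an assumed convention, not necessarily the one BQS() uses
  (sum(permuted >= observed) + 1) / (n.perm + 1)
}

# Toy statistic: share of points whose nearest neighbor carries the same label
nn_same_label <- function(pooled, labels) {
  D <- as.matrix(stats::dist(pooled))
  diag(D) <- Inf  # exclude each point from its own neighbor search
  nn <- apply(D, 1, which.min)
  mean(labels[nn] == labels)
}

set.seed(1)
X1 <- matrix(rnorm(100), ncol = 5)
X2 <- matrix(rnorm(100, mean = 2), ncol = 5)
pooled <- rbind(X1, X2)
labels <- rep(1:2, times = c(nrow(X1), nrow(X2)))
perm_p_value(pooled, labels, nn_same_label, n.perm = 200)
```

Fixing the seed inside the function makes the p value reproducible across calls, which mirrors the role of the seed argument.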
Barakat, A.S., Quade, D. and Salama, I.A. (1996), Multivariate Homogeneity Testing Using an Extended Concept of Nearest Neighbors. Biom. J., 38: 605-612. doi:10.1002/bimj.4710380509
Schilling, M. F. (1986). Multivariate Two-Sample Tests Based on Nearest Neighbors. Journal of the American Statistical Association, 81(395), 799-806. doi:10.2307/2289012
Henze, N. (1988). A Multivariate Two-Sample Test Based on the Number of Nearest Neighbor Type Coincidences. The Annals of Statistics, 16(2), 772-783.
Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298. doi:10.1214/24-SS149
SH, FR, CF, CCS, ZC for other graph-based tests, FR_cat, CF_cat, CCS_cat, and ZC_cat for versions of the test for categorical data
# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform Barakat et al. test
BQS(X1, X2)
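A further usage sketch, combining the arguments documented above: it requests a permutation test and passes an extra argument through to stats::dist. It assumes that BQS is provided by the DataSimilarity package and that the returned components follow the usual htest names (statistic, p.value). Both are assumptions to check against the installed version.

```r
# Assumes the DataSimilarity package is installed; the block is skipped otherwise
if (requireNamespace("DataSimilarity", quietly = TRUE)) {
  set.seed(1)
  X1 <- matrix(rnorm(100), ncol = 5)
  X2 <- matrix(rnorm(100, mean = 0.5), ncol = 5)
  # Manhattan distance via dist.args, 100 permutations for the p value
  res <- DataSimilarity::BQS(X1, X2, n.perm = 100,
                             dist.args = list(method = "manhattan"))
  print(res$statistic)  # observed value of the test statistic
  print(res$p.value)    # permutation p value (available because n.perm > 0)
}
```

Smaller samples are used here than in the example above only to keep the permutation test quick.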