BQS: Barakat et al. (1996) Two-Sample Test

Description

Performs the nearest-neighbor-based multivariate two-sample test of Barakat et al. (1996).

Usage

BQS(X1, X2, dist.fun = stats::dist, n.perm = 0, dist.args = NULL, seed = 42)

Value

An object of class htest with the following components:

statistic: Observed value of the test statistic
p.value: Permutation p value (if n.perm > 0)
alternative: The alternative hypothesis
method: Description of the test
data.name: The dataset names

Arguments

X1: First dataset as matrix or data.frame
X2: Second dataset as matrix or data.frame
dist.fun: Function for calculating a distance matrix on the pooled dataset (default: stats::dist, Euclidean distance).
n.perm: Number of permutations for permutation test (default: 0, no test is performed).
dist.args: Named list of further arguments passed to dist.fun (default: NULL).
seed: Random seed (default: 42)

Applicability

Target variable?	Numeric?	Categorical?	K-sample?
No	Yes	No	No

Details

The test is an extension of the Schilling (1986) and Henze (1988) neighbor test that bypasses choosing the number of nearest neighbors to consider. The Schilling-Henze test statistic is the proportion of edges connecting points from the same dataset in a K-nearest neighbor graph calculated on the pooled sample (standardized with expectation and SD under the null). Barakat et al. (1996) take the weighted sum of the Schilling-Henze test statistics for \(K = 1,\dots,N-1\), where \(N\) denotes the pooled sample size.

As for the Schilling-Henze test, low values of the test statistic indicate similarity of the datasets. Thus, the null hypothesis of equal distributions is rejected for high values. A permutation test is performed if n.perm is set to a positive number.

References

Barakat, A.S., Quade, D. and Salama, I.A. (1996), Multivariate Homogeneity Testing Using an Extended Concept of Nearest Neighbors. Biom. J., 38: 605-612. tools:::Rd_expr_doi("10.1002/bimj.4710380509")

Schilling, M. F. (1986). Multivariate Two-Sample Tests Based on Nearest Neighbors. Journal of the American Statistical Association, 81(395), 799-806. tools:::Rd_expr_doi("10.2307/2289012")

Henze, N. (1988). A Multivariate Two-Sample Test Based on the Number of Nearest Neighbor Type Coincidences. The Annals of Statistics, 16(2), 772-783.

Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163 - 298. tools:::Rd_expr_doi("10.1214/24-SS149")

Examples

Run this code

# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform Barakat et al. test
BQS(X1, X2)

Run the code above in your browser using DataLab