Performs the Schilling-Henze two-sample test for multivariate data (Schilling, 1986; Henze, 1988).
SH(X1, X2, K = 1, graph.fun = knn.bf, dist.fun = stats::dist, n.perm = 0,
dist.args = NULL, seed = 42)
An object of class htest with the following components:
statistic: Observed value of the test statistic
p.value: Asymptotic or permutation p value
estimate: The number of within-sample edges
alternative: The alternative hypothesis
method: Description of the test
data.name: The dataset names
X1: First dataset as matrix or data.frame
X2: Second dataset as matrix or data.frame
K: Number of nearest neighbors to consider (default: 1)
graph.fun: Function for calculating a similarity graph using the distance matrix on the pooled sample (default: knn.bf, which searches for the K nearest neighbors by ranking all pairwise distances). Alternatives are knn, a wrapper for extracting the edge matrix from the result of kNN in dbscan; knn.fast, a wrapper for the approximate KNN implementation get.knn in FNN; or any other function that calculates the KNN edge matrix from a distance matrix and the number of nearest neighbors K.
dist.fun: Function for calculating a distance matrix on the pooled dataset (default: stats::dist, Euclidean distance)
n.perm: Number of permutations for the permutation test (default: 0, in which case the asymptotic test is performed)
dist.args: Named list of further arguments passed to dist.fun
seed: Random seed (default: 42)
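The default graph.fun is described as finding the K nearest neighbors by ranking all pairwise distances. The base-R function below is a hypothetical sketch of that idea (the name knn_edges_bf is ours, not the package's): it returns a two-column edge matrix with one row per directed edge from a point to one of its K nearest neighbors. Whether SH accepts it unchanged as graph.fun depends on the argument names and edge-matrix format the package expects, which is an assumption here.

```r
# Hypothetical brute-force K-nearest-neighbour graph (illustration only,
# not the DataSimilarity implementation of knn.bf).
knn_edges_bf <- function(dists, K = 1) {
  D <- as.matrix(dists)   # full symmetric distance matrix from a "dist" object
  diag(D) <- Inf          # a point is never its own neighbour
  n <- nrow(D)
  # For each point, rank all other points by distance and keep the K closest
  nbrs <- lapply(seq_len(n), function(i) order(D[i, ])[seq_len(K)])
  # One row per directed edge: point -> one of its K nearest neighbours
  cbind(from = rep(seq_len(n), each = K), to = unlist(nbrs))
}

# Usage: three points in the plane, K = 1
X <- rbind(c(0, 0), c(1, 0), c(5, 5))
E <- knn_edges_bf(stats::dist(X), K = 1)
E
```

Here points 1 and 2 are each other's nearest neighbors, and point 3 is closer to point 2 than to point 1, so every point contributes exactly K outgoing edges.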
| Target variable? | Numeric? | Categorical? | K-sample? |
|---|---|---|---|
| No | Yes | No | No |
The test statistic is the proportion of edges connecting points from the same dataset in a K-nearest neighbor graph calculated on the pooled sample, standardized with its expectation and standard deviation under the null hypothesis.
Low values of the test statistic indicate similarity of the datasets. Thus, the null hypothesis of equal distributions is rejected for high values.
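For K = 1 the unstandardized statistic can be computed in a few lines of base R. The sketch below (the helper names sh_raw_k1 and null_mean are hypothetical) pools the samples, finds each point's nearest neighbor under Euclidean distance, and returns the share of nearest-neighbor edges that stay within a sample. It also gives the exact expectation of that share when the sample labels are randomly permuted with n1 and n2 fixed: any edge between two distinct points joins points from the same sample with probability [n1(n1 - 1) + n2(n2 - 1)] / [n(n - 1)]. The standard deviation needed for full standardization depends on the graph structure and is not reproduced here.

```r
# Sketch for K = 1 only: unstandardized proportion of within-sample edges
sh_raw_k1 <- function(X1, X2) {
  X <- rbind(as.matrix(X1), as.matrix(X2))        # pooled sample
  lab <- rep(1:2, c(nrow(X1), nrow(X2)))          # sample membership labels
  D <- as.matrix(dist(X))                         # Euclidean distances
  diag(D) <- Inf                                  # exclude self-matches
  nn <- apply(D, 1, which.min)                    # nearest neighbour of each point
  mean(lab == lab[nn])                            # share of within-sample edges
}

# Exact mean of that share under random relabelling with n1, n2 fixed
null_mean <- function(n1, n2) {
  n <- n1 + n2
  (n1 * (n1 - 1) + n2 * (n2 - 1)) / (n * (n - 1))
}

set.seed(1)
X1 <- matrix(rnorm(200), ncol = 2)
X2 <- matrix(rnorm(200, mean = 0.5), ncol = 2)
sh_raw_k1(X1, X2)    # observed share
null_mean(100, 100)  # share expected under the null
```

An observed share well above null_mean corresponds to the high values of the standardized statistic for which the null hypothesis is rejected.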
For n.perm = 0, an asymptotic test based on the normal approximation of the conditional null distribution is performed. For n.perm > 0, a permutation test is performed.
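The permutation variant can be sketched generically: pool the data, repeatedly split it at random into groups of the original sizes, recompute the statistic, and count how often the permuted value is at least as large as the observed one, since high values speak against the null. The function perm_pvalue below is a hypothetical illustration with a simple K = 1 within-sample-edge share (nn_share, also hypothetical) plugged in as the statistic. It uses the common (1 + count) / (1 + n.perm) estimate and is not claimed to match how SH computes its permutation p value.

```r
# Hypothetical permutation p value for any two-sample statistic where
# large values indicate a difference between the samples
perm_pvalue <- function(stat_fun, X1, X2, n.perm = 200, seed = 42) {
  set.seed(seed)                                  # reproducible permutations
  X <- rbind(as.matrix(X1), as.matrix(X2))        # pooled sample
  n1 <- nrow(X1)
  n_tot <- nrow(X)
  t_obs <- stat_fun(X1, X2)
  t_perm <- vapply(seq_len(n.perm), function(b) {
    idx <- sample.int(n_tot, n1)                  # random split, sizes kept
    stat_fun(X[idx, , drop = FALSE], X[-idx, , drop = FALSE])
  }, numeric(1))
  # Count permuted statistics at least as extreme (large) as the observed one
  (1 + sum(t_perm >= t_obs)) / (1 + n.perm)
}

# Statistic used for illustration: share of within-sample nearest-neighbour edges
nn_share <- function(A, B) {
  X <- rbind(A, B)
  lab <- rep(1:2, c(nrow(A), nrow(B)))
  D <- as.matrix(dist(X))
  diag(D) <- Inf
  mean(lab == lab[apply(D, 1, which.min)])
}

set.seed(1)
A <- matrix(rnorm(40), ncol = 2)
B <- matrix(rnorm(40, mean = 5), ncol = 2)
p <- perm_pvalue(nn_share, A, B, n.perm = 200)
p
```

Adding 1 to numerator and denominator keeps the estimate strictly positive, so with 200 permutations the smallest attainable p value is 1/201.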
Schilling, M. F. (1986). Multivariate Two-Sample Tests Based on Nearest Neighbors. Journal of the American Statistical Association, 81(395), 799-806. doi:10.2307/2289012
Henze, N. (1988). A Multivariate Two-Sample Test Based on the Number of Nearest Neighbor Type Coincidences. The Annals of Statistics, 16(2), 772-783.
Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statistics Surveys, 18, 163-298. doi:10.1214/24-SS149
See also knn, BQS, FR, CF, CCS, and ZC for other graph-based tests, and FR_cat, CF_cat, CCS_cat, and ZC_cat for versions of these graph-based tests for categorical data.
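# Load the package providing SH()
library(DataSimilarity)
# Draw some data (seed fixed for reproducibility)
set.seed(1)
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform the asymptotic Schilling-Henze test (n.perm = 0)
SH(X1, X2)
# Permutation test using the 3 nearest neighbors
SH(X1, X2, K = 3, n.perm = 100)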