The function implements the Bahr (1996) multivariate two-sample test. This test is a special case of the rigid-motion invariant multivariate two-sample test of Baringhaus and Franz (2010). The implementation here uses the cramer.test implementation from the cramer package.
Bahr(X1, X2, n.perm = 0, just.statistic = n.perm
An object of class htest with the following components:
Description of the test
Number of variables in each dataset
Sample size of first dataset
Sample size of second dataset
Observed value of the test statistic
Bootstrap/permutation p value (only if n.perm > 0)
Type of Bootstrap or eigenvalue method (only if n.perm > 0)
Number of permutations for permutation or Bootstrap test
Distribution function under the null hypothesis reconstructed via fast Fourier transform. $x contains the x-values, $Fx contains the corresponding distribution function values (only if n.perm > 0)
Eigenvalues and eigenfunctions when using the eigenvalue method (only if n.perm > 0)
The dataset names
The alternative hypothesis
First dataset as matrix or data.frame
Second dataset as matrix or data.frame
Number of permutations for permutation or Bootstrap test, respectively (default: 0, no permutation test performed)
Should only the test statistic be calculated without performing any test (default: TRUE if the number of permutations is set to 0 and FALSE if the number of permutations is set to any positive number)
Type of Bootstrap or eigenvalue method for testing. Possible options are "ordinary" (default) for ordinary Bootstrap, "permutation" for permutation testing, or "eigenvalue" for bootstrapping the limit distribution (especially suitable for datasets too large for Bootstrapping). For more details see cramer.test.
Maximum number of points used for the fast Fourier transform involved in the eigenvalue method for approximating the null distribution (default: 2^14). Ignored if sim is either "ordinary" or "permutation". For more details see cramer.test.
Upper value up to which the integral for calculating the distribution function from the characteristic function is evaluated (default: 160). Note: when K is increased, it is necessary to also increase maxM. Ignored if sim is either "ordinary" or "permutation". For more details see cramer.test.
Random seed (default: 42)
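To illustrate what the "permutation" option means conceptually, the following is a minimal base R sketch of a permutation p value for a two-sample statistic. It is not the code used by Bahr or cramer.test; the names perm_pvalue and bahr_stat are hypothetical helpers introduced only for this illustration, and the "+ 1" correction in the p value is an assumption of this sketch.

```r
# Hypothetical helper: Bahr-type statistic with kernel phi(x) = 1 - exp(-x / 2),
# evaluated on squared Euclidean distances (illustration only).
bahr_stat <- function(X1, X2) {
  n1 <- nrow(X1)
  n2 <- nrow(X2)
  P <- 1 - exp(-as.matrix(dist(rbind(X1, X2)))^2 / 2)
  i1 <- seq_len(n1)
  i2 <- n1 + seq_len(n2)
  n1 * n2 / (n1 + n2) *
    (2 * sum(P[i1, i2]) / (n1 * n2) - sum(P[i1, i1]) / n1^2 - sum(P[i2, i2]) / n2^2)
}

# Hypothetical helper: permutation p value obtained by reshuffling the
# group labels of the pooled sample. Note that set.seed() changes the
# global random number generator state.
perm_pvalue <- function(X1, X2, stat_fun = bahr_stat, n.perm = 100, seed = 42) {
  set.seed(seed)
  X1 <- as.matrix(X1)
  X2 <- as.matrix(X2)
  pooled <- rbind(X1, X2)
  n1 <- nrow(X1)
  observed <- stat_fun(X1, X2)
  perm_stats <- replicate(n.perm, {
    idx <- sample.int(nrow(pooled))
    first <- idx[seq_len(n1)]
    stat_fun(pooled[first, , drop = FALSE], pooled[-first, , drop = FALSE])
  })
  # Share of statistics at least as large as the observed one,
  # counting the observed statistic itself (assumption of this sketch)
  (1 + sum(perm_stats >= observed)) / (n.perm + 1)
}

A <- matrix(rnorm(40), ncol = 2)
B <- matrix(rnorm(40, mean = 5), ncol = 2)
p_shifted <- perm_pvalue(A, B, n.perm = 50)
```

For clearly separated samples such as A and B, the resulting p value should be small, whereas for two samples from the same distribution it varies over the whole unit interval.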
Target variable? | Numeric? | Categorical? | K-sample? |
No | Yes | No | No |
The Bahr (1996) test is a special case of the test of Baringhaus and Franz (2010) $$T_{n_1, n_2} = \frac{n_1 n_2}{n_1+n_2}\left(\frac{2}{n_1 n_2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2} \phi(||X_{1i} - X_{2j}||^2) - \frac{1}{n_1^2}\sum_{i,j=1}^{n_1} \phi(||X_{1i} - X_{1j}||^2) - \frac{1}{n_2^2}\sum_{i,j=1}^{n_2} \phi(||X_{2i} - X_{2j}||^2)\right)$$ where the kernel function \(\phi\) is set to $$\phi_{\text{Bahr}}(x) = 1 - \exp(-x/2).$$ The theoretical quantity underlying this test statistic is zero if and only if the two distributions coincide. Therefore, low values of the test statistic indicate similarity of the datasets, while high values indicate differences between the datasets.
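The statistic above can be evaluated directly from pairwise squared Euclidean distances. The following base R sketch does this term by term; bahr_statistic is a hypothetical helper written for illustration and is not part of this package or of cramer. It is assumed, but not verified here, that its result agrees with the value reported by Bahr up to numerical details.

```r
# Hypothetical helper (illustration only): direct evaluation of T_{n1, n2}
# with the kernel phi(x) = 1 - exp(-x / 2).
bahr_statistic <- function(X1, X2) {
  X1 <- as.matrix(X1)
  X2 <- as.matrix(X2)
  n1 <- nrow(X1)
  n2 <- nrow(X2)
  phi <- function(x) 1 - exp(-x / 2)
  # Squared Euclidean distances between all rows of the pooled sample
  D2 <- as.matrix(dist(rbind(X1, X2)))^2
  i1 <- seq_len(n1)
  i2 <- n1 + seq_len(n2)
  between <- sum(phi(D2[i1, i2]))  # double sum over pairs from different samples
  within1 <- sum(phi(D2[i1, i1]))  # double sum within the first sample
  within2 <- sum(phi(D2[i2, i2]))  # double sum within the second sample
  n1 * n2 / (n1 + n2) *
    (2 * between / (n1 * n2) - within1 / n1^2 - within2 / n2^2)
}

# Two single observations at distance 2: the within terms vanish and
# the statistic reduces to phi(4) = 1 - exp(-2).
t_small <- bahr_statistic(matrix(0), matrix(2))
```

Applying the helper to two identical datasets gives zero, since the between-sample and within-sample sums then coincide.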
This implementation is a wrapper function around the function cramer.test that modifies the in- and output of that function to match the other functions provided in this package. For more details see cramer.test.
Baringhaus, L. and Franz, C. (2010). Rigid motion invariant two-sample tests. Statistica Sinica 20, 1333-1361.
Bahr, R. (1996). Ein neuer Test fuer das mehrdimensionale Zwei-Stichproben-Problem bei allgemeiner Alternative (in German). Ph.D. thesis, University of Hanover.
Franz, C. (2024). cramer: Multivariate Nonparametric Cramer-Test for the Two-Sample-Problem. R package version 0.9-4, https://CRAN.R-project.org/package=cramer.
Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298. doi:10.1214/24-SS149
BF, Cramer, Energy
# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform Bahr test
if (requireNamespace("cramer", quietly = TRUE)) {
  Bahr(X1, X2, n.perm = 100)
}