DataSimilarity (version 0.1.1)

MMCM: Multisample Mahalanobis Crossmatch (MMCM) Test

Description

Performs the multisample Mahalanobis crossmatch (MMCM) test (Mukherjee et al., 2022).

Usage

MMCM(X1, X2, ..., dist.fun = stats::dist, dist.args = NULL, seed = 42)

Value

An object of class htest with the following components:

statistic

Observed value of the test statistic

p.value

Asymptotic p-value

alternative

The alternative hypothesis

method

Description of the test

data.name

The dataset names
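You can read the components listed above from the returned htest object with `$`. The sketch below is a minimal illustration and assumes that DataSimilarity and nbpMatching are installed. The simulated data and the 5% level are arbitrary choices for this example, not package defaults.

```r
# Minimal sketch: inspect the htest components returned by MMCM()
if (requireNamespace("DataSimilarity", quietly = TRUE) &&
    requireNamespace("nbpMatching", quietly = TRUE)) {
  set.seed(1)
  X1 <- matrix(rnorm(300), ncol = 3)
  X2 <- matrix(rnorm(300, mean = 0.5), ncol = 3)
  X3 <- matrix(rnorm(300, mean = 0.5), ncol = 3)
  res <- DataSimilarity::MMCM(X1, X2, X3)
  print(res$statistic)      # observed value of the test statistic
  print(res$p.value)        # asymptotic p-value
  print(res$method)         # description of the test
  # Reject H0 of equal distributions at the (arbitrary) 5% level?
  print(res$p.value < 0.05)
}
```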

Arguments

X1

First dataset as matrix or data.frame

X2

Second dataset as matrix or data.frame

...

Optionally more datasets as matrices or data.frames

dist.fun

Function for calculating a distance matrix on the pooled dataset (default: stats::dist, Euclidean distance).

dist.args

Named list of further arguments passed to dist.fun (default: NULL).

seed

Random seed (default: 42)
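Because dist.args is passed on to dist.fun, you can select a different metric of stats::dist without writing a new function. This is a minimal sketch under two assumptions: both packages are installed, and MMCM calls dist.fun on the pooled data with the entries of dist.args as extra arguments, as the argument description suggests. Manhattan distance is used only for illustration.

```r
# Minimal sketch: change the metric through dist.args
if (requireNamespace("DataSimilarity", quietly = TRUE) &&
    requireNamespace("nbpMatching", quietly = TRUE)) {
  set.seed(2)
  X1 <- matrix(rnorm(300), ncol = 3)
  X2 <- matrix(rnorm(300), ncol = 3)
  X3 <- matrix(rnorm(300, mean = 0.5), ncol = 3)
  # method = "manhattan" is forwarded to stats::dist via dist.args
  res_manhattan <- DataSimilarity::MMCM(X1, X2, X3, dist.fun = stats::dist,
                                        dist.args = list(method = "manhattan"))
  print(res_manhattan)
}
```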

Applicability

Target variable?   Numeric?   Categorical?   K-sample?
No                 Yes        Yes            Yes

Details

The test extends the crossmatch test of Rosenbaum (2005) to more than two samples. Its test statistic is the Mahalanobis distance between the vector of observed cross-counts for all pairs of datasets and its expected value under the null hypothesis.
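The block below writes out the general form of the statistic. It is a sketch that makes two assumptions: the pooled sample size N = n_1 + ... + n_K is even, and the cross-count vector is centred at its null mean and scaled by its null covariance Σ. Here a_{ij} denotes the number of matched pairs with one observation from dataset i and one from dataset j. The null mean follows because, under H0, each of the n_i n_j possible cross pairs is matched with probability 1/(N - 1). T is therefore the squared Mahalanobis distance. Mukherjee et al. (2022) give the exact expression for Σ, which is not reproduced here.

```latex
\mathbf{a} = (a_{12}, a_{13}, \ldots, a_{K-1,K})^\top, \qquad
\mathbb{E}_0[a_{ij}] = \frac{n_i n_j}{N - 1}, \qquad
\Sigma = \operatorname{Cov}_0(\mathbf{a})

T = \bigl(\mathbf{a} - \mathbb{E}_0[\mathbf{a}]\bigr)^\top \, \Sigma^{-1} \,
    \bigl(\mathbf{a} - \mathbb{E}_0[\mathbf{a}]\bigr)
```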

It is designed to have higher power than another multisample extension, the multisample crossmatch (MCM) test (Petrie, 2016), when the dimension or the number of groups is large.

The observed cross-counts are calculated using the functions distancematrix and nonbimatch from the nbpMatching package.
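The base-R sketch below makes the notion of cross-counts concrete. It pairs up the pooled observations and tabulates how many pairs join each combination of datasets. It is only an illustration: greedy_matching() and cross_counts() are hypothetical helpers. The greedy pairing is a simple stand-in and does not in general find the minimum-distance non-bipartite matching that nbpMatching::nonbimatch computes.

```r
# Hypothetical helper: greedy pairing of observations by smallest distance.
# A simple stand-in for nbpMatching::nonbimatch; NOT an optimal matching.
# With an odd number of observations, one observation stays unmatched.
greedy_matching <- function(d) {
  D <- as.matrix(d)
  diag(D) <- Inf                          # never match a point with itself
  unmatched <- seq_len(nrow(D))
  pairs <- matrix(integer(0), ncol = 2)
  while (length(unmatched) >= 2) {
    sub <- D[unmatched, unmatched, drop = FALSE]
    idx <- which(sub == min(sub), arr.ind = TRUE)[1, ]
    pair <- unmatched[idx]                # closest remaining pair
    pairs <- rbind(pairs, pair, deparse.level = 0)
    unmatched <- setdiff(unmatched, pair)
  }
  pairs
}

# Hypothetical helper: symmetric K x K table of matched pairs by dataset;
# off-diagonal entries are the cross-counts, the diagonal counts within pairs
cross_counts <- function(pairs, labels) {
  g <- sort(unique(labels))
  A <- matrix(0L, length(g), length(g),
              dimnames = list(as.character(g), as.character(g)))
  for (r in seq_len(nrow(pairs))) {
    i <- match(labels[pairs[r, 1]], g)
    j <- match(labels[pairs[r, 2]], g)
    A[i, j] <- A[i, j] + 1L
    if (i != j) A[j, i] <- A[j, i] + 1L
  }
  A
}

set.seed(1)
X1 <- matrix(rnorm(20), ncol = 2)
X2 <- matrix(rnorm(20, mean = 0.5), ncol = 2)
X3 <- matrix(rnorm(20, mean = 0.5), ncol = 2)
pooled <- rbind(X1, X2, X3)               # 30 pooled observations
labels <- rep(1:3, each = 10)             # dataset of origin
pairs <- greedy_matching(stats::dist(pooled))
A <- cross_counts(pairs, labels)
print(A)
print(A[upper.tri(A)])                    # cross-counts (a_12, a_13, a_23)
```

The upper triangle of A holds the cross-counts. The diagonal counts pairs that lie within a single dataset.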

Small values of the test statistic indicate similarity of the datasets; therefore, the test rejects the null hypothesis of equal distributions for large values of the test statistic.
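A permutation version of the statistic illustrates the rejection rule. MMCM itself reports an asymptotic p-value. This sketch instead approximates the null distribution by permuting the dataset labels over a fixed matching. It estimates the null covariance from the permutations and uses stats::mahalanobis() for the quadratic form. The matching, which pairs consecutive observations, and the helper cross_vec() are hypothetical, and B = 2000 is an arbitrary choice. Every matched pair lies within one dataset, so all cross-counts are 0, far below their null mean. The statistic is therefore large and the p-value small.

```r
# Hypothetical helper: cross-count vector (a_12, a_13, ..., a_{K-1,K})
# for a fixed matching `pairs` (two-column index matrix) and labels in 1..K
cross_vec <- function(pairs, labels, K) {
  A <- matrix(0, K, K)
  l1 <- labels[pairs[, 1]]
  l2 <- labels[pairs[, 2]]
  for (r in seq_along(l1)) {
    i <- min(l1[r], l2[r])
    j <- max(l1[r], l2[r])
    A[i, j] <- A[i, j] + 1    # within-dataset pairs land on the diagonal
  }
  A[upper.tri(A)]             # keep only the cross-counts, i < j
}

set.seed(42)
K <- 3
labels <- rep(seq_len(K), each = 10)      # 30 observations, 10 per dataset
N <- length(labels)
# Hypothetical matching: pairs (1,2), (3,4), ..., (29,30), all within a dataset
pairs <- matrix(seq_len(N), ncol = 2, byrow = TRUE)
a_obs <- cross_vec(pairs, labels, K)

# Null mean n_i * n_j / (N - 1), in the same (i < j) order
n <- tabulate(labels, K)
M <- outer(n, n) / (N - 1)
mu <- M[upper.tri(M)]

# Null covariance estimated from label permutations over the fixed matching
B <- 2000
perm <- t(replicate(B, cross_vec(pairs, sample(labels), K)))
S <- stats::cov(perm)

T_obs <- stats::mahalanobis(a_obs, mu, S)
T_perm <- stats::mahalanobis(perm, mu, S)
# Large values speak against H0, so count permuted statistics >= T_obs
p_value <- (1 + sum(T_perm >= T_obs)) / (B + 1)
print(c(statistic = T_obs, p.value = p_value))
```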

References

Mukherjee, S., Agarwal, D., Zhang, N. R. and Bhattacharya, B. B. (2022). Distribution-Free Multisample Tests Based on Optimal Matchings With Applications to Single Cell Genomics. Journal of the American Statistical Association, 117(538), 627-638. doi:10.1080/01621459.2020.1791131

Rosenbaum, P. R. (2005). An Exact Distribution-Free Test Comparing Two Multivariate Distributions Based on Adjacency. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 67(4), 515-530.

Petrie, A. (2016). Graph-theoretic multisample tests of equality in distribution for high dimensional data. Computational Statistics & Data Analysis, 96, 145-158. doi:10.1016/j.csda.2015.11.003

Stolte, M., Kappenberg, F., Rahnenführer, J. and Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statistics Surveys, 18, 163-298. doi:10.1214/24-SS149

See Also

Petrie, Rosenbaum

Examples

# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
X3 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform MMCM test 
if(requireNamespace("nbpMatching", quietly = TRUE)) {
   MMCM(X1, X2, X3)
}

Run the code above in your browser using DataLab