DataSimilarity (version 0.1.1)

Quantifying Similarity of Datasets and Multivariate Two- And k-Sample Testing

Description

A collection of methods for quantifying the similarity of two or more datasets, many of which can be used for two- or k-sample testing. It provides newly implemented methods as well as wrapper functions for existing methods, so that many different methods can be called through a unified framework. The methods were selected from the review and comparison of Stolte et al. (2024).
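
To make the idea of a two-sample test on multivariate data concrete, here is a minimal base-R sketch of a permutation test built on the energy statistic. It does not call DataSimilarity and is not the package's implementation; the helper names `energy_stat` and `perm_test`, the number of permutations, and the simulated data are all assumptions made for illustration.

```r
# Illustrative sketch only: base R, no DataSimilarity calls.
# energy_stat() and perm_test() are hypothetical helper names.

# Energy statistic (V-statistic form) between two numeric matrices
# whose rows are observations.
energy_stat <- function(x, y) {
  n <- nrow(x)
  m <- nrow(y)
  d <- as.matrix(dist(rbind(x, y)))   # pooled Euclidean distance matrix
  ix <- seq_len(n)
  iy <- n + seq_len(m)
  2 * mean(d[ix, iy]) - mean(d[ix, ix]) - mean(d[iy, iy])
}

# Permutation p-value: shuffle the pooled rows, split them into groups of
# the original sizes, and recompute the statistic each time.
perm_test <- function(x, y, n_perm = 199, seed = 1) {
  set.seed(seed)
  pooled <- rbind(x, y)
  n <- nrow(x)
  observed <- energy_stat(x, y)
  perm_stats <- replicate(n_perm, {
    idx <- sample(nrow(pooled))
    energy_stat(pooled[idx[seq_len(n)], , drop = FALSE],
                pooled[idx[-seq_len(n)], , drop = FALSE])
  })
  list(statistic = observed,
       p.value = (1 + sum(perm_stats >= observed)) / (n_perm + 1))
}

set.seed(42)
x <- matrix(rnorm(100), ncol = 2)            # 50 bivariate standard normal draws
y <- matrix(rnorm(100, mean = 1), ncol = 2)  # 50 draws with both means shifted to 1
res <- perm_test(x, y)
print(res)
```

A small p-value suggests that the two samples come from different distributions. The functions listed on this page cover many other statistics of this kind; consult each function's help page for its actual arguments and return value.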

Install

install.packages('DataSimilarity')

Version

0.1.1

License

GPL (>= 3)

Maintainer

Marieke Stolte

Last Published

March 18th, 2025

Functions in DataSimilarity (0.1.1)

DataSimilarity-package

Quantifying Similarity of Datasets and Multivariate Two- And k-Sample Testing
DISCOB

Distance Components (DISCO) Tests
Cramer

Cramér Two-Sample Test
Energy

Energy Statistic and Test
DiProPerm

Direction-Projection-Permutation (DiProPerm) Test
DS

Rank-Based Energy Test (Deb and Sen, 2021)
DISCOF

Distance Components (DISCO) Tests
CF

Generalized Edge-Count Test
CMDistance

Constrained Minimum Distance
CF_cat

Generalized Edge-Count Test for Discrete Data
GGRL

Decision-Tree Based Measure of Dataset Distance and Two-Sample Test
HamiltonPath

Shortest Hamilton path
Jeffreys

Jeffreys divergence
KMD

Kernel Measure of Multi-Sample Dissimilarity (KMD)
GPK

Generalized Permutation-Based Kernel (GPK) Two-Sample Test
FStest

Multisample FS Test
FR

Friedman-Rafsky Test
FR_cat

Friedman-Rafsky Test for Discrete Data
LHZ

Li et al. (2022) empirical characteristic distance
HMN

Random Forest Based Two-Sample Test
LHZStatistic

Calculation of the Li et al. (2022) empirical characteristic distance
RItest

Multisample RI Test
MMD

Maximum Mean Discrepancy (MMD) Test
OTDD

Optimal Transport Dataset Distance
Rosenbaum

Rosenbaum Crossmatch Test
Petrie

Multisample Crossmatch (MCM) Test
MMCM

Multisample Mahalanobis Crossmatch (MMCM) Test
MST

Minimum Spanning Tree (MST)
MW

Nonparametric Graph-Based LP (GLP) Test
NKT

Decision-Tree Based Measure of Dataset Similarity (Ntoutsi et al., 2008)
dipro.fun

Direction-Projection Functions for DiProPerm Test
ZC

Maxtype Edge-Count Test
Wasserstein

Wasserstein Distance based Test
SC

Graph-Based Multi-Sample Test
rectPartition

Calculate a rectangular partition
knn

K-Nearest Neighbor Graph
ZC_cat

Maxtype Edge-Count Test for Discrete Data
engineerMetric

Engineer Metric
SH

Schilling-Henze Nearest Neighbor Test
YMRZL

Yu et al. (2007) Two-Sample Test
stat.fun

Univariate Two-Sample Statistics for DiProPerm Test
gTests_cat

Graph-Based Tests for Discrete Data
kerTests

Generalized Permutation-Based Kernel (GPK) Two-Sample Test
gTests

Graph-Based Tests
gTestsMulti

Graph-Based Multi-Sample Test
BG2

Biswas and Ghosh (2014) Two-Sample Test
CCS

Weighted Edge-Count Two-Sample Test
C2ST

Classifier Two-Sample Test
CCS_cat

Weighted Edge-Count Two-Sample Test for Discrete Data
BMG

Biswas et al. (2014) two-sample run test
BQS

Barakat et al. (1996) Two-Sample Test
BallDivergence

Ball Divergence based two- or k-sample test
BG

Biau and Györfi (2005) two-sample homogeneity test
BF

Baringhaus and Franz (2010) rigid motion invariant multivariate two-sample test
Bahr

Bahr (1996) multivariate two-sample test