LHZ: Li et al. (2022) empirical characteristic distance

Description

The function implements the Li et al. (2022) empirical characteristic distance between two datasets.

Usage

LHZ(X1, X2, n.perm = 0, seed = 42)

Value

An object of class htest with the following components:

method: Description of the test
statistic: Observed value of the test statistic
p.value: Permutation p value (only if n.perm > 0)
data.name: The dataset names
alternative: The alternative hypothesis

Arguments

X1: First dataset as matrix or data.frame
X2: Second dataset as matrix or data.frame
n.perm: Number of permutations for permutation test (default: 0, no permutation test performed)
seed: Random seed (default: 42)

Applicability

Target variable?	Numeric?	Categorical?	K-sample?
No	Yes	No	No

Details

The test statistic $$T_{n, m} = \frac{1}{n^2} \sum_{j, q = 1}^n \left( \left\Vert \frac{1}{n} \sum_{k=1}^n e^{i\langle X_k, X_j-X_q \rangle} - \frac{1}{m} \sum_{l=1}^m e^{i\langle Y_l, X_j-X_q\rangle} \right\Vert^2 \right) + \frac{1}{m^2} \sum_{j, q = 1}^m \left( \left\Vert \frac{1}{n} \sum_{k=1}^n e^{i\langle X_k, Y_j-Y_q \rangle} - \frac{1}{m} \sum_{l=1}^m e^{i\langle Y_l, Y_j-Y_q\rangle} \right\Vert^2 \right) $$ is calculated according to Li et al. (2022). The datasets are denoted by $X$ and $Y$ with respective sample sizes $n$ and $m$. $X_j$ denotes the $j$-th row of dataset $X$ (analogously, $Y_l$ denotes the $l$-th row of $Y$), and $i$ in the exponent denotes the imaginary unit. Furthermore, $\Vert \cdot \Vert$ denotes the Euclidean norm (for a complex number, its modulus) and $\langle X_j, X_q \rangle$ denotes the inner product between $X_j$ and $X_q$.
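
To make the formula concrete, the following R sketch evaluates $T_{n,m}$ by brute force. The function name `lhz_stat_sketch` is hypothetical and not part of the package; it is an assumed direct transcription of the formula above, not the implementation used by `LHZ()`. Because it forms all $n^2$ and $m^2$ pairwise row differences and evaluates both empirical characteristic functions at each of them, time and memory grow roughly cubically in the sample size, so it is only meant for small datasets.

```r
# Brute-force transcription of T_{n,m} (hypothetical helper, not the package code).
lhz_stat_sketch <- function(X, Y) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  # Empirical characteristic function of the rows of Z, evaluated at every row t of Tm:
  # (1 / nrow(Z)) * sum_k exp(i * <Z_k, t>)
  ecf <- function(Z, Tm) colMeans(exp(1i * (Z %*% t(Tm))))
  # All k^2 pairwise row differences Z_j - Z_q (including j == q)
  pair_diffs <- function(Z) {
    k <- nrow(Z)
    Z[rep(seq_len(k), times = k), , drop = FALSE] -
      Z[rep(seq_len(k), each = k), , drop = FALSE]
  }
  DX <- pair_diffs(X)
  DY <- pair_diffs(Y)
  # mean() over the n^2 (resp. m^2) pairs supplies the 1/n^2 (resp. 1/m^2) factor;
  # Mod() is the modulus of the complex difference
  term_X <- mean(Mod(ecf(X, DX) - ecf(Y, DX))^2)
  term_Y <- mean(Mod(ecf(X, DY) - ecf(Y, DY))^2)
  term_X + term_Y
}
```

The matrix product `Z %*% t(Tm)` computes all inner products $\langle Z_k, t \rangle$ at once, which avoids explicit loops over $j$, $q$, $k$ and $l$. Identical inputs give exactly zero, and swapping the two datasets only swaps the two sums.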

Low values of the test statistic indicate similarity. Therefore, the permutation test rejects for large values of the test statistic.
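
A permutation test of this kind can be sketched as follows, assuming the usual scheme: pool both datasets, randomly reassign the pooled rows to groups of sizes $n$ and $m$, recompute the statistic, and compare it with the observed value. The function `perm_test_sketch`, its `stat_fun` argument, and the p value convention $(1 + \#\{b : T^{(b)} \geq T_{obs}\}) / (B + 1)$ with $B$ = `n.perm` are assumptions for illustration; `LHZ()` may count exceedances differently.

```r
# Generic permutation test for a two-sample statistic (hypothetical sketch).
# ASSUMPTIONS: large values of stat_fun indicate dissimilarity, and the p value
# uses (1 + #exceedances) / (n.perm + 1); LHZ() itself may differ.
perm_test_sketch <- function(X1, X2, stat_fun, n.perm = 100, seed = 42) {
  dname <- paste(deparse(substitute(X1)), "and", deparse(substitute(X2)))
  X1 <- as.matrix(X1)
  X2 <- as.matrix(X2)
  n <- nrow(X1)
  pooled <- rbind(X1, X2)
  N <- nrow(pooled)
  t_obs <- stat_fun(X1, X2)
  set.seed(seed)  # note: this resets the global random number stream
  t_perm <- vapply(seq_len(n.perm), function(b) {
    idx <- sample.int(N)  # random relabelling of the pooled rows
    stat_fun(pooled[idx[seq_len(n)], , drop = FALSE],
             pooled[idx[-seq_len(n)], , drop = FALSE])
  }, numeric(1))
  # Reject for large values: count permuted statistics at least as large as observed
  p_value <- (1 + sum(t_perm >= t_obs)) / (n.perm + 1)
  structure(list(method = "Permutation test (sketch)",
                 statistic = c(T = t_obs), p.value = p_value,
                 alternative = "the two distributions differ",
                 data.name = dname),
            class = "htest")
}
```

Passing a function that computes $T_{n,m}$ as `stat_fun` gives a test in the spirit of `LHZ(X1, X2, n.perm = ...)`. Adding 1 to numerator and denominator keeps the p value strictly positive, so it never reports $0$ from a finite number of permutations.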

References

Li, X., Hu, W. and Zhang, B. (2022). Measuring and testing homogeneity of distributions by characteristic distance. Statistical Papers 64 (2), 529-556. doi:10.1007/s00362-022-01327-7

Stolte, M., Kappenberg, F., Rahnenführer, J. and Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298. doi:10.1214/24-SS149

Examples

# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Calculate LHZ statistic
LHZ(X1, X2)
