The function implements the Li et al. (2022) empirical characteristic distance between two datasets.
LHZ(X1, X2, n.perm = 0, seed = 42)
An object of class htest with the following components:
Description of the test
Observed value of the test statistic
Permutation p value (only if n.perm > 0)
The dataset names
The alternative hypothesis
First dataset as matrix or data.frame
Second dataset as matrix or data.frame
Number of permutations for permutation test (default: 0, no permutation test performed)
Random seed (default: 42)
Target variable? | Numeric? | Categorical? | K-sample? |
No | Yes | No | No |
The test statistic $$T_{n, m} = \frac{1}{n^2} \sum_{j, q = 1}^n \left( \left\Vert \frac{1}{n} \sum_{k=1}^n e^{i\langle X_k, X_j-X_q \rangle} - \frac{1}{m} \sum_{l=1}^m e^{i\langle Y_l, X_j-X_q\rangle} \right\Vert^2 \right) + \frac{1}{m^2} \sum_{j, q = 1}^m \left( \left\Vert \frac{1}{n} \sum_{k=1}^n e^{i\langle X_k, Y_j-Y_q \rangle} - \frac{1}{m} \sum_{l=1}^m e^{i\langle Y_l, Y_j-Y_q\rangle} \right\Vert^2 \right) $$ is calculated according to Li et al. (2022). The datasets are denoted by \(X\) and \(Y\) with respective sample sizes \(n\) and \(m\), and \(X_j\) denotes the \(j\)-th row of dataset \(X\). In the exponents, \(i\) is the imaginary unit. Furthermore, \(\Vert \cdot \Vert\) indicates the Euclidean norm (for a complex number, its modulus) and \(\langle X_k, X_j \rangle\) indicates the inner product between \(X_k\) and \(X_j\).
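To make the formula concrete, the following is a naive sketch that evaluates it term by term. The helper name `lhz_stat_sketch` is hypothetical and not part of the package; the package's own implementation may differ in scaling and efficiency. The sketch needs memory of order \(n^3\), so it is only suitable for small samples.

```r
# Hypothetical helper (an illustration, not the package's code):
# naive evaluation of T_{n,m} exactly as written in the formula.
lhz_stat_sketch <- function(X, Y) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)

  # For every pairwise row difference t = Z_j - Z_q, compute the squared
  # modulus of the difference between the empirical characteristic
  # functions of X and Y evaluated at t.
  ecf_diff_sq <- function(Z) {
    k <- nrow(Z)
    idx <- expand.grid(j = seq_len(k), q = seq_len(k))
    D <- Z[idx$j, , drop = FALSE] - Z[idx$q, , drop = FALSE]  # k^2 x p
    phi_X <- colMeans(exp(1i * (X %*% t(D))))  # mean over rows of X
    phi_Y <- colMeans(exp(1i * (Y %*% t(D))))  # mean over rows of Y
    Mod(phi_X - phi_Y)^2
  }

  # mean() over all k^2 pairs supplies the 1/n^2 and 1/m^2 factors
  mean(ecf_diff_sq(X)) + mean(ecf_diff_sq(Y))
}

# Small illustrative data (sizes chosen only to keep the sketch fast)
set.seed(1)
A <- matrix(rnorm(30), ncol = 3)
B <- matrix(rnorm(36, mean = 1), ncol = 3)
lhz_stat_sketch(A, A)  # identical datasets: both terms vanish
lhz_stat_sketch(A, B)  # shifted dataset: strictly positive
```

Because the squared modulus is symmetric in the two empirical characteristic functions, swapping the arguments of this sketch leaves the value unchanged.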
Low values of the test statistic indicate similarity. Therefore, the permutation test rejects for large values of the test statistic.
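A one-sided permutation test of this kind can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the function `perm_pvalue_sketch`, the toy statistic `mean_shift`, and the p value convention (1 + number of permuted statistics at least as large as the observed one) / (n.perm + 1) are all chosen here for demonstration, and the package may use a different convention.

```r
# Hypothetical helper: permutation p value that rejects for LARGE values
# of a two-sample statistic `stat(X, Y)`.
perm_pvalue_sketch <- function(X, Y, stat, n.perm = 199, seed = 42) {
  set.seed(seed)
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  n <- nrow(X)
  pooled <- rbind(X, Y)
  N <- nrow(pooled)

  observed <- stat(X, Y)

  # Reassign rows to the two groups at random and recompute the statistic
  permuted <- replicate(n.perm, {
    idx <- sample.int(N)
    stat(pooled[idx[seq_len(n)], , drop = FALSE],
         pooled[idx[-seq_len(n)], , drop = FALSE])
  })

  # Assumed convention: count the observed value as one of the permutations
  p <- (1 + sum(permuted >= observed)) / (n.perm + 1)
  list(statistic = observed, p.value = p)
}

# Toy statistic standing in for T_{n,m}: squared distance of column means
mean_shift <- function(A, B) sum((colMeans(A) - colMeans(B))^2)

set.seed(1)
A <- matrix(rnorm(200), ncol = 2)
B <- matrix(rnorm(200, mean = 1), ncol = 2)
res <- perm_pvalue_sketch(A, B, mean_shift)
res$p.value  # small, since the two samples differ clearly in location
```

Any statistic for which small values indicate similarity can be passed as `stat`; only the upper tail of the permutation distribution is used.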
Li, X., Hu, W. and Zhang, B. (2022). Measuring and testing homogeneity of distributions by characteristic distance, Statistical Papers 64 (2), 529-556, doi:10.1007/s00362-022-01327-7
Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298. doi:10.1214/24-SS149
LHZStatistic
# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Calculate LHZ statistic
LHZ(X1, X2)