engineerMetric: Engineer Metric

Description

The function implements the $L_q$-engineer metric for comparing two multivariate distributions.

Usage

engineerMetric(X1, X2, type = "F", seed = 42)

Value

An object of class htest with the following components:

method: Description of the test
statistic: Observed value of the test statistic
data.name: The dataset names
alternative: The alternative hypothesis

Arguments

X1: First dataset as matrix or data.frame
X2: Second dataset as matrix or data.frame
type: Character specifying the type of $L_q$-norm to use. Reasonable options are "O", "o", and "1" for the $L_1$-norm; "I" and "i" for the $L_\infty$-norm; and "F", "f", "E", and "e" for the $L_2$-norm (Euclidean norm). The default is "F".
seed: Random seed (default: 42). The method is deterministic; the seed is only set for consistency with other methods.
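
The type codes listed above coincide with codes accepted by base R's norm(). As a rough illustration only, and assuming (this page does not state it) that the vector of mean differences is treated as a one-column matrix, the codes select the following norms. The vector d below is made up for the example.

```r
# Illustrative vector of mean differences, stored as a one-column matrix.
# Assumption: this mirrors how the type codes act; it is not taken from the package source.
d <- matrix(c(-1, 2, -3))

norm(d, type = "O")  # L_1 norm: sum of absolute values
norm(d, type = "I")  # L_infinity norm: largest absolute value
norm(d, type = "F")  # L_2 (Euclidean) norm: square root of the sum of squares
```

For a one-column matrix, the "one" norm (maximum absolute column sum) reduces to the sum of absolute entries, and the "infinity" norm (maximum absolute row sum) reduces to the largest absolute entry.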

Applicability

Target variable?	Numeric?	Categorical?	K-sample?
No	Yes	No	No

Details

The engineer metric is a primary probability metric that is defined as $$\text{EN}(X_1, X_2; q) = \left[ \sum_{i = 1}^{p} \left| \text{E}\left(X_{1i}\right) - \text{E}\left(X_{2i}\right)\right|^q\right]^{\min(q, 1/q)} \text{ with } q> 0,$$ where $X_{1i}, X_{2i}$ denote the $i$th component of the $p$-dimensional random vectors $X_1\sim F_1$ and $X_2\sim F_2$.

In the implementation, expectations are estimated by column means of the respective datasets.
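
A minimal sketch of this plug-in estimate, written directly from the formula above, might look as follows. The helper engineer_sketch and its argument q are hypothetical names introduced for illustration; they are not part of the package, and the packaged function may be implemented differently.

```r
# Hypothetical helper (not part of the package): plug-in estimate of EN(X1, X2; q),
# replacing each expectation by the corresponding column mean.
engineer_sketch <- function(X1, X2, q = 2) {
  X1 <- as.matrix(X1)
  X2 <- as.matrix(X2)
  stopifnot(ncol(X1) == ncol(X2), q > 0)
  # Absolute differences of the componentwise sample means
  d <- abs(colMeans(X1) - colMeans(X2))
  # Limiting case q = Inf: the largest absolute difference
  if (is.infinite(q)) return(max(d))
  sum(d^q)^min(q, 1 / q)
}

# Small deterministic example: the column means differ by -1 and -2
X1 <- matrix(c(1, 2, 3, 4), ncol = 2)
X2 <- matrix(c(2, 3, 5, 6), ncol = 2)
engineer_sketch(X1, X2, q = 1)    # |-1| + |-2| = 3
engineer_sketch(X1, X2, q = 2)    # sqrt(1 + 4) = sqrt(5)
engineer_sketch(X1, X2, q = Inf)  # max(1, 2) = 2
```

For $q \ge 1$ the exponent $\min(q, 1/q)$ equals $1/q$, so the expression is the usual $L_q$-norm of the difference of mean vectors; for $0 < q < 1$ the exponent is $q$.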

References

Rachev, S. T. (1991). Probability metrics and the stability of stochastic models. John Wiley & Sons, Chichester.

Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298. doi:10.1214/24-SS149

Examples


# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Calculate engineer metric
engineerMetric(X1, X2)
