The function implements the \(L_q\)-engineer metric for comparing two multivariate distributions.
Usage
engineerMetric(X1, X2, type = "F", seed = 42)
Value
An object of class htest with the following components:
method
Description of the test
statistic
Observed value of the test statistic
data.name
The dataset names
alternative
The alternative hypothesis
Arguments
X1
First dataset as matrix or data.frame
X2
Second dataset as matrix or data.frame
type
Character specifying the type of \(L_q\)-norm to use. Reasonable options are "O", "o", or "1" for the \(L_1\)-norm, "I" or "i" for the \(L_\infty\)-norm, and "F", "f", "E", or "e" for the \(L_2\)-norm (Euclidean norm). The default is "F".
seed
Random seed (default: 42). The method is deterministic; the seed is only set for consistency with other methods.
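The codes accepted by type are the same as those of base R's norm(). The following is a minimal sketch, assuming (not taken from the package source) that the statistic is obtained by applying norm() to the vector of column-mean differences stored as a one-column matrix. The helper name mean_diff_norm is hypothetical. The one-column shape matters: with a p x 1 matrix, "O" (maximum absolute column sum) reduces to the \(L_1\)-norm and "I" (maximum absolute row sum) to the \(L_\infty\)-norm.

```r
# Hypothetical helper (illustration only, not the DataSimilarity code):
# norm of the difference of column means, stored as a p x 1 matrix so that
#   "O" -> L1 norm, "I" -> L-infinity norm, "F" -> Euclidean (L2) norm.
mean_diff_norm <- function(X1, X2, type = "F") {
  d <- colMeans(as.matrix(X1)) - colMeans(as.matrix(X2))
  norm(as.matrix(d), type = type)
}

X1 <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)  # column means 2 and 5
X2 <- matrix(c(0, 1, 2, 6, 7, 8), ncol = 2)  # column means 1 and 7
# The mean difference is (1, -2).
mean_diff_norm(X1, X2, "O")  # 3
mean_diff_norm(X1, X2, "I")  # 2
mean_diff_norm(X1, X2, "F")  # sqrt(5), about 2.236068
```

Storing the difference as a 1 x p row matrix instead would swap the meaning of "O" and "I".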
Applicability
Target variable? | Numeric? | Categorical? | K-sample?
No | Yes | No | No
Details
The engineer metric is a primary probability metric defined as
$$\text{EN}(X_1, X_2; q) = \left[ \sum_{i = 1}^{p} \left| \text{E}\left(X_{1i}\right) - \text{E}\left(X_{2i}\right)\right|^q\right]^{\min(q, 1/q)} \text{ with } q> 0,$$
where \(X_{1i}, X_{2i}\) denote the \(i\)th component of the \(p\)-dimensional random vectors \(X_1\sim F_1\) and \(X_2\sim F_2\).
In the implementation, expectations are estimated by column means of the respective datasets.
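The formula above can be evaluated directly for any q > 0. The sketch below is an illustration under the stated assumptions, not the package implementation: the function name engineer_q is hypothetical, expectations are replaced by column means as described above, and q = Inf is handled as the limiting case \(\max_i |\bar{X}_{1i} - \bar{X}_{2i}|\), since the exponent \(\min(q, 1/q)\) tends to 0 there.

```r
# Hypothetical function engineer_q: evaluates
#   [ sum_i |mean(X1[, i]) - mean(X2[, i])|^q ]^min(q, 1/q)
# with expectations replaced by column means (illustration only).
engineer_q <- function(X1, X2, q = 2) {
  stopifnot(is.numeric(q), length(q) == 1, q > 0)
  d <- abs(colMeans(as.matrix(X1)) - colMeans(as.matrix(X2)))
  if (is.infinite(q)) return(max(d))  # limit q -> Inf gives the maximum
  sum(d^q)^min(q, 1 / q)
}

X1 <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)  # column means 2 and 5
X2 <- matrix(c(0, 1, 2, 6, 7, 8), ncol = 2)  # column means 1 and 7
engineer_q(X1, X2, q = 1)    # 3
engineer_q(X1, X2, q = 2)    # sqrt(5)
engineer_q(X1, X2, q = Inf)  # 2
```

For q = 1, 2 and Inf the sketch gives the same values as the \(L_1\)-, \(L_2\)- and \(L_\infty\)-norms listed under type.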
References
Rachev, S. T. (1991). Probability metrics and the stability of stochastic models. John Wiley & Sons, Chichester.
Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163–298. doi:10.1214/24-SS149