DataSimilarity (version 0.1.1)

engineerMetric: Engineer Metric

Description

The function implements the \(L_q\)-engineer metric for comparing two multivariate distributions.

Usage

engineerMetric(X1, X2, type = "F", seed = 42)

Value

An object of class htest with the following components:

method

Description of the test

statistic

Observed value of the test statistic

data.name

The dataset names

alternative

The alternative hypothesis

Arguments

X1

First dataset as matrix or data.frame

X2

Second dataset as matrix or data.frame

type

Character specifying the type of \(L_q\)-norm to use. Reasonable options are "O", "o", or "1" for the \(L_1\)-norm; "I" or "i" for the \(L_\infty\)-norm; and "F", "f", "E", or "e" for the \(L_2\)-norm (Euclidean norm). The default is "F".

seed

Random seed (default: 42). The method is deterministic; the seed is only set for consistency with other methods.

Applicability

Target variable? No
Numeric? Yes
Categorical? No
K-sample? No

Details

The engineer metric is a primary probability metric defined as $$\text{EN}(X_1, X_2; q) = \left[ \sum_{i = 1}^{p} \left| \text{E}\left(X_{1i}\right) - \text{E}\left(X_{2i}\right)\right|^q\right]^{\min(q, 1/q)} \text{ with } q> 0,$$ where \(X_{1i}, X_{2i}\) denote the \(i\)th component of the \(p\)-dimensional random vectors \(X_1\sim F_1\) and \(X_2\sim F_2\).
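
As a worked illustration of the definition, note that for \(q \ge 1\) the exponent \(\min(q, 1/q)\) equals \(1/q\). The \(L_1\) and \(L_2\) cases then reduce to the forms below. The \(L_\infty\) case is read here as the limit \(q \to \infty\); this is an interpretation, since the formula itself is stated for finite \(q\).

$$\text{EN}(X_1, X_2; 1) = \sum_{i = 1}^{p} \left| \text{E}\left(X_{1i}\right) - \text{E}\left(X_{2i}\right)\right|$$

$$\text{EN}(X_1, X_2; 2) = \left[ \sum_{i = 1}^{p} \left| \text{E}\left(X_{1i}\right) - \text{E}\left(X_{2i}\right)\right|^2\right]^{1/2}$$

$$\lim_{q \to \infty} \left[ \sum_{i = 1}^{p} \left| \text{E}\left(X_{1i}\right) - \text{E}\left(X_{2i}\right)\right|^q\right]^{1/q} = \max_{1 \le i \le p} \left| \text{E}\left(X_{1i}\right) - \text{E}\left(X_{2i}\right)\right|$$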

In the implementation, expectations are estimated by column means of the respective datasets.
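
The plug-in estimate described above can be sketched by hand in a few lines of base R. The helper name engineer_sketch and its argument q are hypothetical and used only for illustration; this is not the package's code, and the \(L_\infty\) case is assumed to be the maximum absolute difference of the column means.

```r
# Minimal sketch of the plug-in engineer metric.
# engineer_sketch is a hypothetical helper, not part of DataSimilarity.
engineer_sketch <- function(X1, X2, q = 2) {
  # Estimate the expectations by the column means of each dataset
  d <- abs(colMeans(X1) - colMeans(X2))
  # Assumption: the L_infinity case is the largest absolute difference
  if (is.infinite(q)) {
    return(max(d))
  }
  # Exponent min(q, 1/q) as in the definition in Details
  sum(d^q)^min(q, 1 / q)
}

set.seed(42)
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)

en1   <- engineer_sketch(X1, X2, q = 1)    # L_1 version
en2   <- engineer_sketch(X1, X2, q = 2)    # L_2 (Euclidean) version
enInf <- engineer_sketch(X1, X2, q = Inf)  # L_infinity version
```

For any pair of datasets the three values are ordered as enInf <= en2 <= en1, which follows from the usual ordering of the corresponding vector norms.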

References

Rachev, S. T. (1991). Probability metrics and the stability of stochastic models. John Wiley & Sons, Chichester.

Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298. doi:10.1214/24-SS149

See Also

Jeffreys

Examples

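# Load the package
library(DataSimilarity)
# Draw some data (seed fixed so the draws are reproducible)
set.seed(42)
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Calculate the engineer metric (default type "F", i.e. the L_2-norm)
engineerMetric(X1, X2)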