ot_barycenter_test: Test equality of probability vectors

Description

Perform optimal transport (OT) barycenter based tests for equality of probability vectors in a one-way layout.

Usage

ot_barycenter_test(
  samples,
  costm,
  null.mu = NULL,
  w = NULL,
  num.sim = 1000,
  solver = ot_test_lp_solver(),
  is.metric = is_metric_cost_mat(costm, tol.ti = Inf),
  verbose = FALSE
)

Value

An object of class "ot_barycenter_test" containing:

`mu`	empirical version of \(\mu\) that is based on `samples`
`n`	the sample sizes
`p.value`	the \(p\)-value
`statistic`	the value of the test statistic
`null.samples`	samples drawn from the null distribution

Arguments

samples: matrix (row-wise) or nested list containing \(K\) count vectors. A count vector is a vector of length \(N\) that contains the number of times a sample was observed at the respective points.
costm: semi-metric cost matrix \(c \in \mathbb{R}^{N \times N}\).
null.mu: probability measures \(\mu\) underlying the null distribution. Must be of the same structure as samples.
w: weight vector \(w \in \mathbb{R}^K_+\).
num.sim: number of samples to draw from the limiting null distribution.
solver: the LP solver to use, see ot_test_lp_solver.
is.metric: value indicating whether \(c\) is a metric cost matrix, see is_metric_cost_mat.
verbose: logical value indicating whether additional information should be printed.
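
As a sketch of the expected input shapes, the following base R snippet builds a row-wise count matrix, a cost matrix, and a weight vector by hand. The counts, the choice of absolute differences as cost, the equal weights, and all variable names are illustrative assumptions; the package helpers used in the examples (cost_matrix_lp, tab_sample) may produce differently scaled or structured objects.

```r
# Illustrative inputs only (assumed values, base R, no package needed).
K <- 3  # number of samples (groups)
N <- 2  # number of support points

# Row-wise count matrix: row k holds the counts of sample k at the N points.
samples <- rbind(
  c(150, 150),
  c(170, 190),
  c(90, 110)
)

# Sample sizes are the row sums of the count matrix.
n <- rowSums(samples)

# Empirical probability vectors: each row divided by its sample size.
# (Recycling a length-K vector over a K x N matrix divides row-wise.)
mu_hat <- samples / n

# An assumed cost matrix: absolute differences between support points 1..N.
x <- seq_len(N)
costm <- abs(outer(x, x, "-"))

# Equal weights, chosen here purely for illustration.
w <- rep(1 / K, K)
```

Each row of mu_hat sums to one, and costm is symmetric with a zero diagonal, as expected of a semi-metric cost matrix.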

Details

Denote by \(\mu^1, \ldots, \mu^K\) the probability measures underlying the \(K\) samples in samples. To test the one-way null hypothesis \(H_0 : \mu^1 = \ldots = \mu^K\), this test employs the OT barycenter statistic, defined as \( T^B(\mu) := \sqrt{\rho_n} B_c^w(\mu^1, \ldots, \mu^K)\,, \) where \(\rho_n\) is a scaling factor and \(B_c^w\) is the OT barycenter functional, see ot_barycenter.
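
For orientation, a commonly used definition of the barycenter functional is the minimal weighted transport cost from the \(K\) measures to a common measure. The display below is a sketch under that assumption, with \(\mathrm{OT}_c\) denoting the optimal transport cost for the cost matrix \(c\) and \(\nu\) ranging over probability vectors of length \(N\); the exact definition and normalization used by the package are those documented in ot_barycenter.

```latex
\[
  B_c^w(\mu^1, \ldots, \mu^K)
  = \min_{\nu} \sum_{k=1}^{K} w_k \, \mathrm{OT}_c(\mu^k, \nu)
\]
```

Under this definition, if \(\mu^1 = \ldots = \mu^K = \mu\) and \(c\) has a zero diagonal, the choice \(\nu = \mu\) attains the value 0, so large values of the statistic speak against \(H_0\).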

The test is based on the asymptotic distribution of \(T^B\) under the null; for more details, see the reference.
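
One common way to turn draws from a limiting null distribution into a \(p\)-value is the empirical upper-tail proportion. The sketch below assumes this convention and uses made-up numbers; empirical_p_value is a hypothetical helper, not a function of the package, and the package may compute p.value differently.

```r
# Hypothetical helper (not part of the package): proportion of simulated
# null draws that are at least as large as the observed statistic.
empirical_p_value <- function(statistic, null_samples) {
  mean(null_samples >= statistic)
}

# Made-up numbers for illustration.
null_samples <- c(0.1, 0.4, 0.2, 0.9, 0.3, 0.05, 0.6, 0.15, 0.25, 0.35)
observed <- 0.5

# Two of the ten draws (0.9 and 0.6) are >= 0.5, so p is 2 / 10.
p <- empirical_p_value(observed, null_samples)
print(p)
```

With only 10 draws, a \(p\)-value computed this way can only take multiples of 0.1, which is why a larger num.sim gives a finer approximation.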

The simulations from the limiting null distribution can be run in parallel via future::plan, and their progress can be displayed with progressr::with_progress.

Especially for large \(N\) and \(K\), simulating a sufficient number of samples from the limiting null distribution might take a while. Consider using FDOTT instead.

References

TODO

Examples

# enable text progress bar
progressr::handlers("txtprogressbar")
# enable parallel computation
if (requireNamespace("future")) {
    future::plan(future::multisession)
}

K <- 3
N <- 2
costm <- cost_matrix_lp(1:N)

# use higher number to better approximate null distribution and get more accurate p-value
num.sim <- 10

n <- c(300, 360, 200)

# underlying probability vectors
mu <- matrix(1 / N, K, N, TRUE)

# to run this, an LP solver must be available for ROI (ROI.plugin.glpk by default)
if (requireNamespace("ROI.plugin.glpk")) {
    solver <- ot_test_lp_solver("glpk")
    set.seed(123)
    samples <- tab_sample(n, mu)
    progressr::with_progress({
        res <- ot_barycenter_test(samples, costm, num.sim = num.sim, solver = solver)
    })
    print(res)
}

# measures are not equal anymore
mu[2, ] <- 1:N / sum(1:N)

if (requireNamespace("ROI.plugin.glpk")) {
    solver <- ot_test_lp_solver("glpk")
    set.seed(123)
    samples <- tab_sample(n, mu)
    progressr::with_progress({
        res2 <- ot_barycenter_test(samples, costm, num.sim = num.sim, solver = solver)
    })
    print(res2)
}

# reset the future plan so that no parallel workers or connections stay open
if (requireNamespace("future") && !inherits(future::plan(), "sequential")) future::plan(future::sequential)
