unif_test: Circular and (hyper)spherical uniformity tests

Description

Implementation of several uniformity tests on the (hyper)sphere $S^{p-1}:=\{{\bf x}\in R^p:||{\bf x}||=1\}$, $p\ge 2$, with calibration either in terms of their asymptotic/exact distributions, if available, or Monte Carlo.

unif_test receives a sample of directions ${\bf X}_1,\ldots,{\bf X}_n\in S^{p-1}$ in Cartesian coordinates, except for the circular case ($p=2$) in which the sample can be represented in terms of angles $\Theta_1,\ldots,\Theta_n\in [0, 2\pi)$.

unif_test allows to perform several tests within a single call, facilitating thus the exploration of a dataset by applying several tests.

Usage

unif_test(data, type = "all", p_value = "asymp", alpha = c(0.1, 0.05,
  0.01), M = 10000, stats_MC = NULL, crit_val = NULL,
  data_sorted = FALSE, K_max = 10000, method = "I", CCF09_dirs = NULL,
  CJ12_beta = 0, CJ12_reg = 3, cov_a = 2 * pi, Cressie_t = 1/3,
  K_CCF09 = 25, Poisson_rho = 0.5, Pycke_q = 0.5, Rayleigh_m = 1,
  Riesz_s = 1, Rothman_t = 1/3, Sobolev_vk2 = c(0, 0, 1),
  Softmax_kappa = 1, Stein_K = 10, Stein_cf = FALSE, Stereo_a = 0, ...)

Value

If only a single test is performed, a list with class htest containing the following components:

statistic: the value of the test statistic.
p.value: the p-value of the test. If p_value = "crit_val", an NA.
alternative: a character string describing the alternative hypothesis.
method: a character string indicating what type of test was performed.
data.name: a character string giving the name of the data.
reject: the rejection decision for the levels of significance alpha.
crit_val: a vector with the critical values for the significance levels alpha used with p_value = "MC" or p_value = "asymp".
param: parameter(s) used in the test (if any).

If several tests are performed, a type-named list with entries for each test given by the above list.

Arguments

data

sample to perform the test. A matrix of size c(n, p) containing a sample of size n of directions (in Cartesian coordinates) on $S^{p-1}$. Alternatively if p = 2, a matrix of size c(n, 1) containing the n angles on $[0, 2\pi)$ of the circular sample on $S^{1}$. Other objects accepted are an array of size c(n, p, 1) with directions (in Cartesian coordinates), or a vector of size n or an array of size c(n, 1, 1) with angular data. Must not contain NA's.

type

type of test to be applied. A character vector containing any of the following types of tests, depending on the dimension $p$:

Circular data: any of the names available at object avail_cir_tests.
(Hyper)spherical data: any of the names available at object avail_sph_tests.

If type = "all" (default), then type is set as avail_cir_tests or avail_sph_tests, depending on the value of $p$.

p_value

type of $p$-value computation. Either "MC" for employing the approximation by Monte Carlo of the exact null distribution, "asymp" (default) for the use of the asymptotic/exact null distribution (if available), or "crit_val" for approximation by means of the table of critical values crit_val.

alpha

vector with significance levels. Defaults to c(0.10, 0.05, 0.01).

M

number of Monte Carlo replications for approximating the null distribution when approx = "MC". Also, number of Monte Carlo samples for approximating the asymptotic distributions based on weighted sums of chi squared random variables. Defaults to 1e4.

stats_MC

a data frame of size c(M, length(type)), with column names containing the character vector type, that results from extracting $stats_MC from a call to unif_stat_MC. If provided, the computation of Monte Carlo statistics when approx = "MC" is skipped. stats_MC is checked internally to see if it is sorted. Internally computed if NULL (default).

crit_val

table with critical values for the tests, to be used if p_value = "crit_val". A data frame, with column names containing the character vector type and rows corresponding to the significance levels alpha, that results from extracting $crit_val_MC from a call to unif_stat_MC. Internally computed if NULL (default).

data_sorted

is the circular data sorted? If TRUE, certain statistics are faster to compute. Defaults to FALSE.

K_max

integer giving the truncation of the series that compute the asymptotic p-value of a Sobolev test. Defaults to 1e4.

method

method for approximating the density, distribution, or quantile function of the weighted sum of chi squared random variables. Must be "I" (Imhof), "SW" (Satterthwaite--Welch), "HBE" (Hall--Buckley--Eagleson), or "MC" (Monte Carlo; only for distribution or quantile functions). Defaults to "I".

CCF09_dirs

a matrix of size c(n_proj, p) containing n_proj random directions (in Cartesian coordinates) on $S^{p-1}$ to perform the CCF09 test. If NULL (default), a sample of size n_proj = 50 directions is computed internally.

CJ12_beta

$\beta$ parameter in the exponential regime of CJ12 test, a positive real.

CJ12_reg

type of asymptotic regime for CJ12 test, either 1 (sub-exponential regime), 2 (exponential), or 3 (super-exponential; default).

cov_a

$a_n = a / n$ parameter used in the length of the arcs of the coverage-based tests. Must be positive. Defaults to 2 * pi.

Cressie_t

$t$ parameter for the Cressie test, a real in $(0, 1)$. Defaults to 1 / 3.

K_CCF09

integer giving the truncation of the series present in the asymptotic distribution of the Kolmogorov-Smirnov statistic. Defaults to 25.

Poisson_rho

$\rho$ parameter for the Poisson test, a real in $[0, 1)$. Defaults to 0.5.

Pycke_q

$q$ parameter for the Pycke "$q$-test", a real in $(0, 1)$. Defaults to 1 / 2.

Rayleigh_m

integer $m$ for the $m$-modal Rayleigh test. Defaults to m = 1 (the standard Rayleigh test).

Riesz_s

$s$ parameter for the $s$-Riesz test, a real in $(0, 2)$. Defaults to 1.

Rothman_t

$t$ parameter for the Rothman test, a real in $(0, 1)$. Defaults to 1 / 3.

Sobolev_vk2

weights for the finite Sobolev test. A non-negative vector or matrix. Defaults to c(0, 0, 1).

Softmax_kappa

$\kappa$ parameter for the Softmax test, a non-negative real. Defaults to 1.

Stein_K

truncation $K$ parameter for the Stein test, a positive integer. Defaults to 10.

Stein_cf

logical indicating whether to use the characteristic function in the Stein test. Defaults to FALSE (moment generating function).

Stereo_a

$a$ parameter for the Stereo test, a real in $[-1, 1]$. Defaults to 0.

...

If p_value = "MC" or p_value = "crit_val", optional performance parameters to be passed to unif_stat_MC: chunks, cores, and seed. If p_value = "asymp", additional parameters to unif_stat_distr.

Details

All the tests reject for large values of the test statistic, so the critical values for the significance levels alpha correspond to the alpha-upper quantiles of the null distribution of the test statistic.

When p_value = "asymp", tests that do not have an implemented or known asymptotic are omitted, and a warning is generated.

When p_value = "MC", it is possible to have a progress bar indicating the Monte Carlo simulation progress if unif_test is wrapped with progressr::with_progress or if progressr::handlers(global = TRUE) is invoked (once) by the user. See the examples below. The progress bar is updated with the number of finished chunks.

All the statistics are continuous random variables except the Hodges--Ajne statistic ("Hodges_Ajne"), the Cressie statistic ("Cressie"), and the number of (different) uncovered spacings ("Num_uncover"). These three statistics are discrete random variables.

The Monte Carlo calibration for the CCF09 test is made conditionally on the choice of
CCF09_dirs. That is, all the Monte Carlo statistics share the same random directions.

Except for CCF09_dirs, K_CCF09, and CJ12_reg, all the test-specific parameters are vectorized.

Descriptions and references for most of the tests are available in García-Portugués and Verdebout (2018).

References

García-Portugués, E. and Verdebout, T. (2018) An overview of uniformity tests on the hypersphere. arXiv:1804.00286. tools:::Rd_expr_doi("10.48550/arXiv.1804.00286")

Examples

Run this code

## Asymptotic distribution

# Circular data
n <- 10
samp_cir <- r_unif_cir(n = n)

# Matrix
unif_test(data = samp_cir, type = "Ajne", p_value = "asymp")

# Vector
unif_test(data = samp_cir[, 1], type = "Ajne", p_value = "asymp")

# Array
unif_test(data = array(samp_cir, dim = c(n, 1, 1)), type = "Ajne",
          p_value = "asymp")
# \donttest{
# Several tests
unif_test(data = samp_cir, type = avail_cir_tests, p_value = "asymp")
# }
# Spherical data
n <- 10
samp_sph <- r_unif_sph(n = n, p = 3)

# Array
unif_test(data = samp_sph, type = "Bingham", p_value = "asymp")

# Matrix
unif_test(data = samp_sph[, , 1], type = "Bingham", p_value = "asymp")
# \donttest{
# Several tests
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "asymp")

## Monte Carlo

# Circular data
unif_test(data = samp_cir, type = "Ajne", p_value = "MC")
unif_test(data = samp_cir, type = avail_cir_tests, p_value = "MC")

# Spherical data
unif_test(data = samp_sph, type = "Bingham", p_value = "MC")
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "MC")

# Caching stats_MC
stats_MC_cir <- unif_stat_MC(n = nrow(samp_cir), p = 2)$stats_MC
stats_MC_sph <- unif_stat_MC(n = nrow(samp_sph), p = 3)$stats_MC
unif_test(data = samp_cir, type = avail_cir_tests,
          p_value = "MC", stats_MC = stats_MC_cir)
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "MC",
          stats_MC = stats_MC_sph)

## Critical values

# Circular data
unif_test(data = samp_cir, type = avail_cir_tests, p_value = "crit_val")

# Spherical data
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "crit_val")

# Caching crit_val
crit_val_cir <- unif_stat_MC(n = n, p = 2)$crit_val_MC
crit_val_sph <- unif_stat_MC(n = n, p = 3)$crit_val_MC
unif_test(data = samp_cir, type = avail_cir_tests,
          p_value = "crit_val", crit_val = crit_val_cir)
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "crit_val",
          crit_val = crit_val_sph)

## Specific arguments

# Rothman
unif_test(data = samp_cir, type = "Rothman", Rothman_t = 0.5)

# CCF09
unif_test(data = samp_sph, type = "CCF09", p_value = "MC",
          CCF09_dirs = samp_sph[1:2, , 1])
unif_test(data = samp_sph, type = "CCF09", p_value = "MC",
          CCF09_dirs = samp_sph[3:4, , 1])

## Using a progress bar when p_value = "MC"

# Define a progress bar
require(progress)
require(progressr)
handlers(handler_progress(
  format = paste("(:spin) [:bar] :percent Iter: :current/:total Rate:",
                 ":tick_rate iter/sec ETA: :eta Elapsed: :elapsedfull"),
  clear = FALSE))

# Call unif_test() within with_progress()
with_progress(
  unif_test(data = samp_sph, type = avail_sph_tests, p_value = "MC",
            chunks = 10, M = 1e3)
)

# With several cores
with_progress(
  unif_test(data = samp_sph, type = avail_sph_tests, p_value = "MC",
            cores = 2, chunks = 10, M = 1e3)
)

# Instead of using with_progress() each time, it is more practical to run
# handlers(global = TRUE)
# once to activate progress bars in your R session
# }

Run the code above in your browser using DataLab