ARItest: k-Sample ARI Test of Equal Distributions

Description

Performs the distribution-free exact k-sample test for equality of multivariate distributions in the high dimension, low sample size (HDLSS) regime. This is an aggregate of the two-sample RI tests over the $\frac{k (k - 1)}{2}$ pairwise two-sample comparisons; the test statistic is the minimum of the two-sample RI test statistics. Holm's step-down procedure (1979) and the Benjamini-Hochberg procedure (1995) are available for multiple testing.
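To make the pairwise structure concrete, the following base R sketch enumerates the $\frac{k (k - 1)}{2}$ comparisons for k = 4 and applies the two multiple testing corrections with `p.adjust`. The p-values are made up for illustration, and the column names `holm_reject` and `bh_reject` are hypothetical; this is not the package's internal code.

```r
# Illustrative sketch only: enumerate the pairwise comparisons among k samples
# and adjust a vector of hypothetical two-sample p-values.
k <- 4
pairs <- t(combn(k, 2))            # one row per two-sample comparison
n_pairs <- nrow(pairs)             # equals k * (k - 1) / 2

# Hypothetical p-values, one per pair (invented for this illustration).
pvals <- c(0.001, 0.008, 0.012, 0.030, 0.040, 0.600)

alpha <- 0.05
holm_adj <- p.adjust(pvals, method = "holm")  # Holm step-down adjustment
bh_adj   <- p.adjust(pvals, method = "BH")    # Benjamini-Hochberg adjustment

summary_tab <- data.frame(
  Population.1 = pairs[, 1],
  Population.2 = pairs[, 2],
  holm_reject  = holm_adj <= alpha,
  bh_reject    = bh_adj <= alpha
)
print(summary_tab)
```

Holm's procedure controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate and typically rejects at least as many pairs.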

Usage

ARItest(M, sizes, randomization = TRUE, clust_alg = "knwClustNo", kmax = 4, 
multTest = "Holm", s_psi = 1, s_h = 1, lb = 1, n_sts = 1000, alpha = 0.05)

Arguments

M

$n \times d$ observation matrix of the pooled sample; the rows (observations) must be grouped by their respective classes

sizes

vector of sample sizes

randomization

logical; if TRUE (default), the randomized test is performed; if FALSE, the non-randomized test

clust_alg

"knwClustNo" (default) or "estclustNo"; modified K-means algorithm used for clustering

kmax

maximum number of clusters considered when estimating the total number of clusters for a two-sample comparison, default: 4

multTest

"Holm" (default) or "BenHoch"; multiple testing procedure to apply

s_psi

function required for clustering: 1 for $t^{2}$, 2 for $1 - \exp(-t)$, 3 for $1 - \exp(-t^{2})$, 4 for $\log(1 + t)$, 5 for $t$

s_h

function required for clustering: 1 for $\sqrt{t}$, 2 for $t$

lb

each observation is partitioned into a number of smaller vectors of the same length $lb$, default: $1$

n_sts

number of simulations of the test statistic, default: $1000$

alpha

numeric, significance level $\alpha$, default: $0.05$
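The sketch below shows one way to prepare inputs in the layout described above: a pooled matrix whose rows are grouped by class, a matching `sizes` vector, the function choices indexed by `s_psi` and `s_h`, and the splitting of one observation into blocks of length `lb`. The helper names (`psi_choices`, `h_choices`, `dissim`) are hypothetical, and the way the two functions are combined into a dissimilarity is an assumption for illustration that may not match the package internals.

```r
# Sketch only: build inputs in the layout the Arguments section describes.
set.seed(1)
d <- 6
sizes <- c(3, 2, 4)                      # sample sizes, in class order

# Build each class separately, then stack so rows are grouped by class.
samples <- lapply(seq_along(sizes), function(j) {
  matrix(rnorm(sizes[j] * d, mean = j), nrow = sizes[j], ncol = d)
})
M <- do.call(rbind, samples)
stopifnot(nrow(M) == sum(sizes))         # sizes must account for every row

# Class label of each row, as implied by `sizes`.
labels <- rep(seq_along(sizes), times = sizes)

# Function choices listed for s_psi (1 to 5) and s_h (1 to 2).
psi_choices <- list(
  function(t) t^2,
  function(t) 1 - exp(-t),
  function(t) 1 - exp(-t^2),
  function(t) log(1 + t),
  function(t) t
)
h_choices <- list(sqrt, identity)

# Split one observation into consecutive blocks of length lb
# (this sketch assumes d is a multiple of lb).
lb <- 2
x <- M[1, ]
blocks <- split(x, ceiling(seq_along(x) / lb))

# ASSUMPTION (not taken from the package): for lb = 1, combine the two
# functions as h(mean(psi(|x - y|))) over the coordinates.
dissim <- function(x, y, s_psi = 1, s_h = 1) {
  h_choices[[s_h]](mean(psi_choices[[s_psi]](abs(x - y))))
}
dissim(M[1, ], M[2, ])
```

Under that assumed form, the defaults `s_psi = 1` and `s_h = 1` reduce to the Euclidean distance scaled by $1 / \sqrt{d}$.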

Value

ARItest returns a list containing the following items:

ARIStat

value of the observed test statistic

ARICutoff

cut-off of the test

randomGamma

randomized coefficient of the test

decisionARI

$1$ if the null hypothesis is rejected, $0$ if the test fails to reject the null hypothesis

multipleTest

indicates which pairs of populations differ according to the chosen multiple testing procedure
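The returned statistic, cut-off, and randomization coefficient can be read together as a randomized decision rule. The sketch below is an assumption about how they combine, not the package's source: it supposes that small values of the statistic count against the null hypothesis, which agrees with the example on this page (statistic 0, cut-off 0.3368421, decision 1). The function `decide` and its argument `u` are hypothetical.

```r
# Sketch only. ASSUMES small values of the statistic are evidence against H0;
# this direction is inferred from the example output, not from package source.
decide <- function(stat, cutoff, gamma, u = runif(1)) {
  if (stat < cutoff) return(1L)                  # clearly in the rejection region
  if (stat == cutoff && u < gamma) return(1L)    # on the boundary: reject w.p. gamma
  0L                                             # otherwise fail to reject
}

# Values taken from the example output on this page.
decide(stat = 0, cutoff = 0.3368421, gamma = 0)
```

Passing `u` explicitly makes the boundary case reproducible; with `gamma = 0` the rule never rejects on the boundary, which corresponds to a non-randomized decision.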

References

Biplab Paul, Shyamal K De and Anil K Ghosh (2021). Some clustering based exact distribution-free k-sample tests applicable to high dimension, low sample size data, Journal of Multivariate Analysis, doi:10.1016/j.jmva.2021.104897.

William M Rand (1971). Objective criteria for the evaluation of clustering methods, Journal of the American Statistical Association, 66(336):846-850, doi:10.1080/01621459.1971.10482356.

Sture Holm (1979). A simple sequentially rejective multiple test procedure, Scandinavian Journal of Statistics, 65-70, doi:10.2307/4615733.

Yoav Benjamini and Yosef Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing, Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289-300, doi:10.2307/2346101.

Examples

# NOT RUN {
  # multivariate normal distribution:
  # generate four samples of size 10 with dimension d = 500
  set.seed(151)
  n1 <- n2 <- n3 <- n4 <- 10
  d <- 500
  I1 <- matrix(rnorm(n1 * d, mean = 0, sd = 1), n1, d)
  I2 <- matrix(rnorm(n2 * d, mean = 0.5, sd = 1), n2, d)
  I3 <- matrix(rnorm(n3 * d, mean = 1, sd = 1), n3, d)
  I4 <- matrix(rnorm(n4 * d, mean = 1.5, sd = 1), n4, d)
  # pool the samples, keeping rows grouped by class
  X <- rbind(I1, I2, I3, I4)
  # ARI test:
  results <- ARItest(M = X, sizes = c(n1, n2, n3, n4))
  
   ## outputs:
   results$ARIStat
   #[1] 0

   results$ARICutoff
   #[1] 0.3368421

   results$randomGamma
   #[1] 0

   results$decisionARI
   #[1] 1

   results$multipleTest
   #  Population.1 Population.2 rejected pvalues
   #1            1            2     TRUE       0
   #2            1            3     TRUE       0
   #3            1            4     TRUE       0
   #4            2            3     TRUE       0
   #5            2            4     TRUE       0
   #6            3            4     TRUE       0

# }