
Performs the distribution-free exact k-sample test for equality of multivariate distributions in the HDLSS (high dimension, low sample size) regime. This is an aggregate test of the two-sample versions of the RI test over the pairwise comparisons of the k samples.
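As a rough illustration of the two ingredients named above, the sketch below computes the classical Rand (1971) index between two labelings and enumerates the two-sample comparisons for k = 4 populations. The helper `rand_index` and the toy labels are hypothetical and are not part of the package; how ARItest turns the index into its test statistic is not described on this page, so this should be read as background only.

```r
# Rand (1971) index: share of observation pairs on which two labelings agree.
# rand_index() is a hypothetical helper written for this illustration.
rand_index <- function(a, b) {
  stopifnot(length(a) == length(b), length(a) >= 2)
  same_a <- outer(a, a, "==")    # TRUE where a pair shares a label in a
  same_b <- outer(b, b, "==")    # TRUE where a pair shares a label in b
  upper  <- upper.tri(same_a)    # count each unordered pair once
  mean(same_a[upper] == same_b[upper])
}

# Toy labels: true populations versus a clustering that misplaces one point
truth    <- rep(1:2, each = 5)
clusters <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
ri <- rand_index(truth, clusters)

# All two-sample comparisons among k = 4 populations, one pair per row
pairs <- t(combn(4, 2))
```

With four populations, `pairs` has six rows, which matches the six rows of the `multipleTest` table in the example output.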
ARItest(M, sizes, randomization = TRUE, clust_alg = "knwClustNo", kmax = 4,
multTest = "Holm", s_psi = 1, s_h = 1, lb = 1, n_sts = 1000, alpha = 0.05)
M: data matrix of the pooled sample, one observation per row, with the rows grouped by population (as built with rbind in the example)
sizes: vector of sample sizes
randomization: logical; if TRUE (default), the randomization test is performed; if FALSE, the non-randomization test
clust_alg: "knwClustNo" (default) or "estclustNo"; modified K-means algorithm used for clustering
kmax: maximum value of the total number of clusters considered when estimating the number of clusters for a two-sample comparison, default: 4
multTest: "Holm" (default) or "BenHoch"; multiple testing procedure
s_psi: function required for clustering, 1 for
s_h: function required for clustering, 1 for
lb: each observation is partitioned into a number of smaller vectors of the same length, default: 1
n_sts: number of simulations of the test statistic, default: 1000
alpha: numeric, significance level, default: 0.05
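To make the difference between the two multiple testing options concrete, the sketch below applies the Holm (1979) and Benjamini-Hochberg (1995) adjustments to a made-up vector of six pairwise p-values using base R's `p.adjust`. This only illustrates the two procedures; whether ARItest calls `p.adjust` internally is not stated here, and the p-values are invented.

```r
# Hypothetical raw p-values for six pairwise comparisons
p_raw <- c(0.001, 0.012, 0.020, 0.030, 0.045, 0.200)
alpha <- 0.05

# Holm controls the family-wise error rate;
# Benjamini-Hochberg controls the false discovery rate
p_holm <- p.adjust(p_raw, method = "holm")
p_bh   <- p.adjust(p_raw, method = "BH")

rejected_holm <- p_holm <= alpha
rejected_bh   <- p_bh <= alpha

data.frame(p_raw, p_holm, rejected_holm, p_bh, rejected_bh)
```

On these values the Benjamini-Hochberg adjustment rejects more comparisons than Holm, which is the usual trade-off between the weaker and the stricter error criterion.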
ARItest returns a list containing the following items:
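ARIStat: value of the observed test statistic
ARICutoff: cut-off of the test
randomGamma: randomized coefficient of the test
decisionARI: decision of the test (1 in the example below, where the null hypothesis is rejected)
multipleTest: indicates which pairs of populations differ according to the multiple tests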
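The sketch below shows how a statistic, a cut-off and a randomization coefficient are typically combined in a randomized test. It assumes that the test rejects for small values of the statistic, which the example output suggests (a statistic of 0 against a cut-off of about 0.337) but which this page does not state. The function `randomized_decision` and its argument `u` are hypothetical and are not taken from the package.

```r
# Hypothetical decision rule for a randomized test that rejects for SMALL
# values of the statistic (an assumption, not confirmed by this page).
randomized_decision <- function(stat, cutoff, gamma, u = runif(1)) {
  if (stat < cutoff) return(1L)                     # clear rejection
  if (stat == cutoff && u < gamma) return(1L)       # boundary: reject with probability gamma
  0L                                                # otherwise do not reject
}

# Values taken from the example output further down
decision <- randomized_decision(stat = 0, cutoff = 0.3368421, gamma = 0)
```

Passing `u` explicitly makes the boundary case reproducible; with the default, a fresh uniform draw is used on each call.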
Biplab Paul, Shyamal K De and Anil K Ghosh (2021). Some clustering based exact distribution-free k-sample tests applicable to high dimension, low sample size data, Journal of Multivariate Analysis, doi:10.1016/j.jmva.2021.104897.
William M Rand (1971). Objective criteria for the evaluation of clustering methods, Journal of the American Statistical Association, 66(336):846-850, doi:10.1080/01621459.1971.10482356.
Sture Holm (1979). A simple sequentially rejective multiple test procedure, Scandinavian Journal of Statistics, 65-70, doi:10.2307/4615733.
Yoav Benjamini and Yosef Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing, Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289-300, doi:10.2307/2346101.
# NOT RUN {
library(HDLSSkST)  # package that provides ARItest
# multivariate normal distribution:
# generate data with dimension d = 500
set.seed(151)
n1=n2=n3=n4=10
d = 500
I1 <- matrix(rnorm(n1*d,mean=0,sd=1),n1,d)
I2 <- matrix(rnorm(n2*d,mean=0.5,sd=1),n2,d)
I3 <- matrix(rnorm(n3*d,mean=1,sd=1),n3,d)
I4 <- matrix(rnorm(n4*d,mean=1.5,sd=1),n4,d)
X <- as.matrix(rbind(I1,I2,I3,I4))
#ARI test:
results <- ARItest(M=X, sizes = c(n1,n2,n3,n4))
## outputs:
results$ARIStat
#[1] 0
results$ARICutoff
#[1] 0.3368421
results$randomGamma
#[1] 0
results$decisionARI
#[1] 1
results$multipleTest
#   Population.1 Population.2 rejected pvalues
# 1            1            2     TRUE       0
# 2            1            3     TRUE       0
# 3            1            4     TRUE       0
# 4            2            3     TRUE       0
# 5            2            4     TRUE       0
# 6            3            4     TRUE       0
# }