HDLSSkST (version 2.1.0)

MTFStest: k-Sample MTFS Test of Equal Distributions

Description

Performs the distribution-free exact k-sample test for equality of multivariate distributions in the HDLSS (high dimension, low sample size) regime. The test is a multiscale approach based on the FS test: the FS test is carried out for several numbers of partitions, and the results are combined by a multiple testing procedure.

Usage

MTFStest(M, labels, sizes, k_max, multTest = "Holm", s_psi = 1, s_h = 1,
lb = 1, n_sts = 1000, alpha = 0.05)

Arguments

M

\(n\times d\) observation matrix of the pooled sample; the observations (rows) should be grouped by their respective classes

labels

vector of length \(n\) giving the class membership index of each observation

sizes

vector of sample sizes

k_max

maximum number of clusters to be used in the test

multTest

"Holm" (default) or "BenHoch"; the multiple testing procedure (Holm or Benjamini-Hochberg) used to combine the tests

s_psi

function required for clustering, 1 for \(t^2\), 2 for \(1-\exp(-t)\), 3 for \(1-\exp(-t^2)\), 4 for \(\log(1+t)\), 5 for \(t\)

s_h

function required for clustering, 1 for \(\sqrt t\), 2 for \(t\)

lb

block length: each observation is partitioned into smaller vectors of the same length \(lb\), default: \(1\)

n_sts

number of simulations of the test statistic, default: \(1000\)

alpha

numeric, significance level \(\alpha\), default: \(0.05\)
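
The layout described for M, labels and sizes can be checked before calling the test. The following is a minimal sketch in base R: `check_pooled_input` is a hypothetical helper, not part of HDLSSkST, and it assumes that the entries of sizes are given in the same order in which the classes appear in M.

```r
# Hypothetical helper (not part of HDLSSkST): validates the pooled-sample
# layout described in the argument list above.
check_pooled_input <- function(M, labels, sizes) {
  stopifnot(is.matrix(M), is.numeric(M))
  n <- nrow(M)
  # One label per row of M
  if (length(labels) != n) stop("labels must have one entry per row of M")
  # Sample sizes must add up to the pooled sample size
  if (sum(sizes) != n) stop("sum(sizes) must equal nrow(M)")
  # Observations must be grouped by class: each label forms one contiguous run
  runs <- rle(as.vector(labels))
  if (anyDuplicated(runs$values) > 0) stop("observations are not grouped by class")
  # Assumption: sizes are listed in the order in which the classes appear
  if (length(runs$lengths) != length(sizes) || any(runs$lengths != sizes)) {
    stop("sizes do not match the label runs")
  }
  invisible(TRUE)
}

sizes  <- c(3, 2, 4)
labels <- rep(seq_along(sizes) - 1, times = sizes)  # 0 0 0 1 1 2 2 2 2
M      <- matrix(rnorm(sum(sizes) * 5), nrow = sum(sizes), ncol = 5)
check_pooled_input(M, labels, sizes)
```

Building labels from sizes with `rep()` guarantees that the two arguments agree, which is why the example further down constructs them the same way.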

Value

MTFStest returns a list containing the following items:

fpmfvec

a vector of FS test statistic values based on different numbers of clusters

Pvalues

a vector of FS test p-values based on different numbers of clusters

decisionMTFS

\(1\) if the null hypothesis is rejected and \(0\) if the test fails to reject it

contTabs

a list of the observed contingency tables based on different numbers of clusters

mulTestdec

a vector of \(0\)s and \(1\)s; \(0\): the corresponding hypothesis is not rejected, \(1\): the corresponding hypothesis is rejected
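
How a vector of p-values, one per number of clusters, might be turned into per-hypothesis decisions and a single overall decision can be sketched with base R's `p.adjust`. This only illustrates the Holm and Benjamini-Hochberg procedures named under multTest. The p-values are made up, and the "reject if any hypothesis is rejected" rule is an assumption about the aggregation, not a statement of the package's internals.

```r
# Illustrative p-values, one per number of clusters (made up, not package output)
pvals <- c(0.01, 0.04, 0.03)
alpha <- 0.05

# Holm step-down adjustment (controls the family-wise error rate)
holm_adj <- p.adjust(pvals, method = "holm")
# Benjamini-Hochberg step-up adjustment (controls the false discovery rate)
bh_adj <- p.adjust(pvals, method = "BH")

# 1 = reject the corresponding hypothesis, 0 = fail to reject
holm_dec <- as.integer(holm_adj <= alpha)
bh_dec   <- as.integer(bh_adj <= alpha)

# Assumed overall rule: reject equality of the distributions if at least
# one individual hypothesis is rejected
overall_holm <- as.integer(any(holm_dec == 1L))
overall_bh   <- as.integer(any(bh_dec == 1L))
```

With these inputs the Holm procedure is the more conservative of the two, which is the usual trade-off between controlling the family-wise error rate and controlling the false discovery rate.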

References

Biplab Paul, Shyamal K De and Anil K Ghosh (2021). Some clustering-based exact distribution-free k-sample tests applicable to high dimension, low sample size data, Journal of Multivariate Analysis, doi:10.1016/j.jmva.2021.104897.

Sture Holm (1979). A simple sequentially rejective multiple test procedure, Scandinavian Journal of Statistics, 65-70, doi:10.2307/4615733.

Yoav Benjamini and Yosef Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing, Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289-300, doi:10.2307/2346101.

Examples

# NOT RUN {
  # multivariate normal distribution:
  # generate four samples of size 10 with dimension d = 500
  library(HDLSSkST)
  set.seed(151)
  n1 <- n2 <- n3 <- n4 <- 10
  d <- 500
  I1 <- matrix(rnorm(n1 * d, mean = 0, sd = 1), n1, d)
  I2 <- matrix(rnorm(n2 * d, mean = 0.5, sd = 1), n2, d)
  I3 <- matrix(rnorm(n3 * d, mean = 1, sd = 1), n3, d)
  I4 <- matrix(rnorm(n4 * d, mean = 1.5, sd = 1), n4, d)
  # class labels, in the same order as the rows of the pooled sample
  levels <- c(rep(0, n1), rep(1, n2), rep(2, n3), rep(3, n4))
  X <- as.matrix(rbind(I1, I2, I3, I4))
  # MTFS test with k_max = 8:
  results <- MTFStest(X, levels, c(n1, n2, n3, n4), 8)
  
   ## outputs:
   results$fpmfvec
   #[1] 7.254445e-12 6.137740e-16 2.125236e-22 2.125236e-22 2.125236e-22 2.125236e-22 2.125236e-22

   results$Pvalues
   #[1] 0 0 0 0 0 0 0

   results$decisionMTFS
   #[1] 1

   results$contTabs
   #$contTabs[[1]]
   #     [,1] [,2]
   #[1,]   10    0
   #[2,]   10    0
   #[3,]    0   10
   #[4,]    0   10

   #$contTabs[[2]]
   #    [,1] [,2] [,3]
   #[1,]   10    0    0
   #[2,]    0   10    0
   #[3,]    0    8    2
   #[4,]    0    0   10

   #$contTabs[[3]]
   #     [,1] [,2] [,3] [,4]
   #[1,]   10    0    0    0
   #[2,]    0   10    0    0
   #[3,]    0    0   10    0
   #[4,]    0    0    0   10

   #$contTabs[[4]]
   #     [,1] [,2] [,3] [,4] [,5]
   #[1,]   10    0    0    0    0
   #[2,]    0   10    0    0    0
   #[3,]    0    0    4    6    0
   #[4,]    0    0    0    0   10

   #$contTabs[[5]]
   #    [,1] [,2] [,3] [,4] [,5] [,6]
   #[1,]   10    0    0    0    0    0
   #[2,]    0   10    0    0    0    0
   #[3,]    0    0    4    6    0    0
   #[4,]    0    0    0    0    8    2

   #$contTabs[[6]]
   #     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
   #[1,]   10    0    0    0    0    0    0
   #[2,]    0    5    5    0    0    0    0
   #[3,]    0    0    0    4    6    0    0
   #[4,]    0    0    0    0    0    8    2

   #$contTabs[[7]]
   #     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
   #[1,]    8    2    0    0    0    0    0    0
   #[2,]    0    0    5    5    0    0    0    0
   #[3,]    0    0    0    0    4    6    0    0
   #[4,]    0    0    0    0    0    0    8    2


   results$mulTestdec
   #[1] 1 1 1 1 1 1 1
# }
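
Each element of contTabs in the output above has one row per class and one column per cluster. The following minimal sketch shows how such a table arises from class labels and cluster assignments, using base R's `table`. The cluster assignments are written by hand to mirror the second table in the output and are not produced by the package's clustering routine.

```r
# Class labels for four samples of size 10, as in the example above
class_id <- rep(1:4, each = 10)

# Hand-written cluster assignments (illustrative only): class 1 falls in
# cluster 1, class 2 in cluster 2, class 3 is split 8/2 between clusters
# 2 and 3, and class 4 falls in cluster 3
cluster_id <- c(rep(1, 10), rep(2, 10), rep(2, 8), rep(3, 2), rep(3, 10))

# Rows index classes, columns index clusters
cont_tab <- table(class = class_id, cluster = cluster_id)
print(cont_tab)
```

Under the null hypothesis of equal distributions the rows of such a table should look alike. A table in which each class concentrates in its own columns, as in the output above, is evidence against it.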
