test_selective_inference: Selective inference for post-clustering variable involvement

Description

Selective inference for post-clustering variable involvement

Usage

test_selective_inference(
  X,
  k1,
  k2,
  g,
  ndraws = 2000,
  cl_fun,
  cl = NULL,
  sig = NULL
)

Value

A list with the following elements:

stat_g: The test statistic used for the test.
pval: The resulting p-values of the test.
stder: The standard deviation of the p-values, computed from the Monte-Carlo samples.
clusters: The cluster labels of the data.

Arguments

X: The data matrix on which the clustering is applied
k1: The first cluster of interest
k2: The second cluster of interest
g: The variables for which the test is applied
ndraws: The number of Monte-Carlo samples
cl_fun: The clustering function used to build clusters
cl: The labels of the data obtained with the cl_fun function
sig: The estimated standard deviation. If NULL (the default), the standard deviation is estimated using only the observations in the two clusters of interest
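
As an illustration of the cl_fun argument, the sketch below builds clustering functions for an arbitrary number of clusters and linkage. It assumes, as the example on this page suggests, that cl_fun only needs to take a data matrix and return one label per row as a factor. The helper name make_hcl_fun is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of the package): returns a clustering
# function that maps a data matrix to a factor of cluster labels.
make_hcl_fun <- function(k, method = "ward.D2") {
  function(x) {
    # Hierarchical clustering on Euclidean distances, cut into k clusters
    as.factor(cutree(hclust(dist(x), method = method), k = k))
  }
}

set.seed(1)
X <- matrix(rnorm(200), ncol = 2)

# Three clusters with average linkage instead of two Ward clusters
hcl3_fun <- make_hcl_fun(k = 3, method = "average")
cl <- hcl3_fun(X)

# One label per observation, with three distinct levels
table(cl)
```

With three clusters, k1 and k2 would then select which pair of labels to compare, for example k1 = 1 and k2 = 3.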

References

Gao, L. L., Bien, J., & Witten, D. (2022). Selective inference for hierarchical clustering. Journal of the American Statistical Association, (just-accepted), 1-27.

Examples

X <- matrix(rnorm(200), ncol = 2)
# Clustering function: Ward hierarchical clustering cut into 2 clusters
hcl_fun <- function(x) {
  as.factor(cutree(hclust(dist(x), method = "ward.D2"), k = 2))
}
cl <- hcl_fun(X)
plot(X, col = cl)
# Note: in practice, ndraws (the number of Monte-Carlo simulations) should be higher
test_var1 <- test_selective_inference(X, k1 = 1, k2 = 2, g = 1, ndraws = 100, cl_fun = hcl_fun, cl = cl)
