
VALIDICLUST (version 0.1.0)

test_selective_inference: Selective inference for post-clustering variable involvement

Description

Selective inference for post-clustering variable involvement

Usage

test_selective_inference(
  X,
  k1,
  k2,
  g,
  ndraws = 2000,
  cl_fun,
  cl = NULL,
  sig = NULL
)

Value

A list with the following elements:

  • stat_g : The test statistic used for the test.

  • pval : The resulting p-values of the test.

  • stder : The standard deviation of the p-values, computed from the Monte Carlo samples.

  • clusters : The cluster labels of the data.

Arguments

X

The data matrix on which the clustering is applied

k1

The first cluster of interest

k2

The second cluster of interest

g

The variables for which the test is applied

ndraws

The number of Monte Carlo samples

cl_fun

The clustering function used to build clusters

cl

The cluster labels of the data, obtained with the cl_fun function

sig

The estimated standard deviation. Defaults to NULL, in which case the standard deviation is estimated using only the observations in the two clusters of interest
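The argument descriptions above imply a few consistency constraints that can be checked before running the test. The sketch below is only an illustration: check_test_inputs is a hypothetical helper, not part of VALIDICLUST, and it assumes that rows of X are observations, that g holds column indices of X, and that cl_fun returns one label per row (all inferred from the example on this page rather than stated by the package).

```r
# Hypothetical helper, NOT part of VALIDICLUST: checks that the inputs
# are mutually consistent before calling test_selective_inference().
# Assumptions: rows of X are observations, g holds column indices of X.
check_test_inputs <- function(X, k1, k2, g, cl_fun, cl = NULL) {
  stopifnot(is.matrix(X), is.numeric(X), is.function(cl_fun))
  # In this helper, labels are recomputed when cl is not supplied
  if (is.null(cl)) cl <- cl_fun(X)
  labels <- as.character(unique(cl))
  stopifnot(
    length(cl) == nrow(X),            # one label per observation
    as.character(k1) %in% labels,     # k1 is an existing cluster
    as.character(k2) %in% labels,     # k2 is an existing cluster
    k1 != k2,                         # two distinct clusters are compared
    all(g %in% seq_len(ncol(X)))      # g refers to columns of X
  )
  invisible(cl)
}

# Usage with the same clustering function as in the Examples section
set.seed(1)
X <- matrix(rnorm(200), ncol = 2)
hcl_fun <- function(x) {
  as.factor(cutree(hclust(dist(x), method = "ward.D2"), k = 2))
}
cl <- check_test_inputs(X, k1 = 1, k2 = 2, g = 1, cl_fun = hcl_fun)
```

The helper returns the labels invisibly, so they can be passed on as the cl argument.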

References

Gao, L. L., Bien, J., & Witten, D. (2022). Selective inference for hierarchical clustering. Journal of the American Statistical Association, (just-accepted), 1-27.

Examples

library(VALIDICLUST)
# Simulate 100 observations of 2 variables
X <- matrix(rnorm(200), ncol = 2)
# Clustering function: Ward hierarchical clustering cut into 2 clusters
hcl_fun <- function(x) {
  as.factor(cutree(hclust(dist(x), method = "ward.D2"), k = 2))
}
cl <- hcl_fun(X)
plot(X, col = cl)
# Note: in practice, ndraws (the number of Monte Carlo simulations) should be higher
test_var1 <- test_selective_inference(X, k1 = 1, k2 = 2, g = 1, ndraws = 100, cl_fun = hcl_fun, cl = cl)
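When the clustering has more than two clusters, k1 and k2 select which pair is compared. The sketch below shows one possible way to build a clustering function for an arbitrary number of clusters and to list the candidate pairs. make_hcl_fun is a hypothetical helper, not part of VALIDICLUST, and the final call is left commented out because it requires the package to be installed.

```r
# Hypothetical factory, NOT part of VALIDICLUST: returns a clustering
# function with the same shape as hcl_fun (data matrix in, factor out).
make_hcl_fun <- function(k, method = "ward.D2") {
  function(x) {
    as.factor(cutree(hclust(dist(x), method = method), k = k))
  }
}

set.seed(1)
X <- matrix(rnorm(200), ncol = 2)
hcl3_fun <- make_hcl_fun(k = 3)
cl3 <- hcl3_fun(X)

# All unordered pairs of cluster labels; each column is one (k1, k2) pair
pairs <- combn(sort(as.integer(levels(cl3))), 2)

# Each pair could then be tested in turn, for example the first one:
# test_selective_inference(X, k1 = pairs[1, 1], k2 = pairs[2, 1], g = 1,
#                          ndraws = 100, cl_fun = hcl3_fun, cl = cl3)
```

Keeping the number of clusters inside the closure means the same function can be passed as cl_fun and used to compute cl, so the two stay consistent.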


