perform_independence_test: Perform a hypothesis test of independence

Description

Perform a hypothesis test of statistical independence by means of bootstrapping. The null hypothesis is that the two random variables are independent; the alternative is that they are dependent. The procedure supports a total of 8 combinations of bootstrap resampling scheme (nonparametric or independence), bootstrap test statistic (centered or equivalent), and type of true test statistic (Kolmogorov-Smirnov or L2). The function returns the corresponding p-values, the true test statistics and the bootstrap versions of the test statistics. The default (and valid) method is the null bootstrap, i.e. the independence bootstrap, together with the equivalent test statistic and the Kolmogorov-Smirnov test statistic. Via the bootstrapOptions argument, the user can specify other bootstrap resampling schemes and test statistics.

Usage

perform_independence_test(
  X1,
  X2,
  my_grid = NULL,
  nBootstrap = 100,
  show_progress = TRUE,
  bootstrapOptions = NULL
)

Value

A class object with components

pvals_df: a data frame of p-values and bootstrapped test statistics. It contains the p-values for the 8 combinations of bootstrap resampling scheme (nonparametric or independence), bootstrap test statistic (centered or equivalent), and type of true test statistic (Kolmogorov-Smirnov or L2), together with the vector of bootstrap test statistics for each combination.
true_stats: a named vector of size 2 containing the true test statistics for the L2 and KS distances.
nBootstrap: number of bootstrap repetitions.
nameMethod: string giving the name of the method used.
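
To make the link between the bootstrapped test statistics and a p-value concrete, here is a minimal base-R sketch. The helper bootstrap_pvalue is hypothetical, and the convention it uses (share of bootstrap statistics at least as large as the true statistic) is an assumption; the package may compute its p-values differently.

```r
# Hypothetical helper, not part of the package: share of bootstrap
# statistics that are at least as large as the observed (true) statistic.
# The exact convention used by the package is an assumption here.
bootstrap_pvalue <- function(true_stat, boot_stats) {
  mean(boot_stats >= true_stat)
}

# Toy values chosen by hand for illustration only.
boot_stats <- c(0.4, 0.7, 0.9, 1.2, 1.5)
p_small <- bootstrap_pvalue(1.3, boot_stats)  # one of five values is >= 1.3
p_large <- bootstrap_pvalue(0.5, boot_stats)  # four of five values are >= 0.5
```

A large true statistic relative to the bootstrap distribution gives a small p-value, which is evidence against independence.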

Arguments

X1, X2

numerical vectors of the same size. The independence test tests whether X1 is independent from X2.

my_grid

the grid on which the CDFs are estimated. This must be one of

NULL: a regularly spaced grid from the minimum value to the maximum value of each variable with 20 points is used. This is the default.
A numeric of size 1. This is used as the length of both grids, replacing 20 in the above explanation.
A numeric vector of size larger than 1. This is directly used as the grid for both variables.
A list of two numeric vectors, which are used as the grids for both variables X1 and X2 respectively.
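
The four accepted forms of my_grid can be illustrated with a small base-R sketch that turns each form into a pair of grids. The helper make_grids and its default_length argument are hypothetical names introduced for illustration; this is one plausible reading of the rules above, not the package's internal code.

```r
# Hypothetical helper, not part of the package: normalise `my_grid`
# into a list of two numeric grids, one for X1 and one for X2.
make_grids <- function(X1, X2, my_grid = NULL, default_length = 20) {
  regular_grid <- function(x, n_points) {
    seq(min(x), max(x), length.out = n_points)
  }
  if (is.null(my_grid)) {
    # Default: regularly spaced grid with 20 points per variable.
    return(list(regular_grid(X1, default_length),
                regular_grid(X2, default_length)))
  }
  if (is.list(my_grid)) {
    # List of two numeric vectors: one grid per variable.
    stopifnot(length(my_grid) == 2)
    return(my_grid)
  }
  stopifnot(is.numeric(my_grid))
  if (length(my_grid) == 1) {
    # Single number: used as the length of both regular grids.
    return(list(regular_grid(X1, my_grid), regular_grid(X2, my_grid)))
  }
  # Numeric vector of size larger than 1: same grid for both variables.
  list(my_grid, my_grid)
}

set.seed(1)
X1 <- rnorm(100)
X2 <- rnorm(100)
grids_default <- make_grids(X1, X2)
grids_short <- make_grids(X1, X2, my_grid = 5)
grids_shared <- make_grids(X1, X2, my_grid = c(-1, 0, 1))
```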

nBootstrap

number of bootstrap repetitions.

show_progress

logical value indicating whether to show a progress bar

bootstrapOptions

This can be one of

NULL: this uses the default options type_boot = "indep", type_stat = "eq" and type_norm = "KS".
a list with at most 3 named elements:
- type_boot: type of bootstrap resampling scheme. It must be one of
  - "indep" for the independence bootstrap (i.e. under the null). This is the default.
  - "NP" for the non-parametric bootstrap (i.e. n out of n bootstrap).
- type_stat: type of test statistic to be used. It must be one of
  - "eq" for the equivalent test statistic $$T_n^* = \sqrt{n} || \hat{F}_{(X,Y)}^* - \hat{F}_{X}^* \hat{F}_{Y}^* ||$$
  - "cent" for the centered test statistic $$T_n^* = \sqrt{n} || \hat{F}_{(X,Y)}^* - \hat{F}_{X}^* \hat{F}_{Y}^* - (\hat{F}_{(X,Y)} - \hat{F}_{X} \hat{F}_{Y}) ||$$
  For each type_boot there is only one valid choice of type_stat. If type_stat is not specified, the valid choice is used automatically.
- type_norm: type of norm to be used for the test statistic. It must be one of
  - "KS" for the Kolmogorov-Smirnov type test statistic. This is the default. It is given as $$ T_n = \sqrt{n} \sup_{(x, y) \in \mathbb{R}^{p+q}} \big| \hat{F}_{(X,Y),n}(x , y) - \hat{F}_{X,n}(x) \hat{F}_{Y,n}(y) \big| $$
  - "L2" for the squared L2-norm test statistic. It is given as $$ T_n = \sqrt{n}\int_{(x, y) \in \mathbb{R}^{p+q}} \big( \hat{F}_{(X,Y),n}(x , y) - \hat{F}_{X,n}(x) \hat{F}_{Y,n}(y) \big)^2 \mathrm{d}x\mathrm{d}y $$
"all": this gives test results for all theoretically valid combinations of bootstrap resampling schemes and test statistics.
"all and also invalid": this gives test results for all possible combinations of bootstrap resampling schemes and test statistics, including invalid ones.

A warning is raised if the given combination of type_boot and type_stat is theoretically invalid.
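
The options above can be sketched in base R. The functions ecdf_gap, grid_stats and sketch_independence_test are hypothetical names, and the code is a simplified illustration under stated assumptions (regular 20-point grids, a Riemann-sum approximation of the L2 integral, p-values as the share of bootstrap statistics at least as large as the true one); it is not the package's implementation. It pairs "indep" with the equivalent statistic and "NP" with the centered statistic, which follows from "indep" with "eq" being the valid default and each type_boot having only one valid type_stat.

```r
# Difference between the joint empirical CDF and the product of the
# marginal empirical CDFs on a grid (rows follow g1, columns follow g2).
ecdf_gap <- function(x, y, g1, g2) {
  ind_x <- 1 * outer(x, g1, "<=")   # n x length(g1) matrix of 0/1 indicators
  ind_y <- 1 * outer(y, g2, "<=")   # n x length(g2) matrix of 0/1 indicators
  joint <- crossprod(ind_x, ind_y) / length(x)
  joint - outer(colMeans(ind_x), colMeans(ind_y))
}

# Grid approximations of the KS and L2 statistics (assumes regular grids).
grid_stats <- function(gap, n, g1, g2) {
  cell <- (g1[2] - g1[1]) * (g2[2] - g2[1])
  c(KS = sqrt(n) * max(abs(gap)),
    L2 = sqrt(n) * sum(gap^2) * cell)
}

sketch_independence_test <- function(X1, X2, nBootstrap = 100,
                                     type_boot = "indep", n_grid = 20) {
  n <- length(X1)
  g1 <- seq(min(X1), max(X1), length.out = n_grid)
  g2 <- seq(min(X2), max(X2), length.out = n_grid)
  gap_obs <- ecdf_gap(X1, X2, g1, g2)
  true_stats <- grid_stats(gap_obs, n, g1, g2)

  boot_stats <- replicate(nBootstrap, {
    if (type_boot == "indep") {
      # Independence bootstrap: resample each margin separately, so the
      # bootstrap sample satisfies H0; use the "eq" statistic.
      gap <- ecdf_gap(sample(X1, n, replace = TRUE),
                      sample(X2, n, replace = TRUE), g1, g2)
    } else {
      # Nonparametric bootstrap: resample pairs; use the "cent" statistic,
      # which subtracts the gap observed on the original sample.
      idx <- sample.int(n, n, replace = TRUE)
      gap <- ecdf_gap(X1[idx], X2[idx], g1, g2) - gap_obs
    }
    grid_stats(gap, n, g1, g2)
  })
  # boot_stats is a 2 x nBootstrap matrix with rows "KS" and "L2".
  pvals <- rowMeans(boot_stats >= true_stats)
  list(true_stats = true_stats, pvals = pvals)
}

set.seed(42)
n <- 100
X1 <- rnorm(n)
X2 <- X1 + rnorm(n)   # dependent pair, so H0 is false
res_indep <- sketch_independence_test(X1, X2, nBootstrap = 50)
res_np <- sketch_independence_test(X1, X2, nBootstrap = 50, type_boot = "NP")
```

Both calls share the same true statistics, since these depend only on the data and the grid; only the bootstrap distribution used for the p-values differs.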

References

Derumigny, A., Galanis, M., Schipper, W., & van der Vaart, A. (2025). Bootstrapping not under the null? ArXiv preprint, doi:10.48550/arXiv.2512.10546

Examples

n <- 100

# Under H1
X1 <- rnorm(n)
X2 <- X1 + rnorm(n)
result <- perform_independence_test(
  X1, X2,
  nBootstrap = 50,
  bootstrapOptions = list(type_boot = "indep",
                          type_stat = "eq",
                          type_norm = "KS")
)
print(result)
plot(result)

# Under H0
X1 <- rnorm(n)
X2 <- rnorm(n)
result <- perform_independence_test(X1, X2, nBootstrap = 50)
print(result)
plot(result)