sd_test: Semi-distance independence test

Description

Implements the semi-distance independence test between multivariate continuous variables and a categorical variable, either via a permutation test or, when the dimension $p$ of the continuous variables is high, via an asymptotic approximation.

Usage

sd_test(X, y, test_type = "perm", num_perm = 10000)

Value

A list with class "indtest" containing the following components:

method: name of the test;
name_data: names of X and y;
n: sample size of the data;
test_type: type of the test;
num_perm: number of replications in the permutation test, if test_type = "perm";
stat: test statistic;
pvalue: computed p-value.

Arguments

X

Data of multivariate continuous variables, which should be an $n$-by-$p$ matrix or a vector of length $n$ (for a univariate variable).

y

Data of the categorical variable, which should be a factor of length $n$.

test_type

Type of the test:

"perm" (the default): perform the test via a permutation test;
"asym": perform the test via the asymptotic approximation, intended for the case where the dimension $p$ of the continuous variables is high.

See the Reference for details.

num_perm

The number of replications in the permutation test. Defaults to 10000. See Details and Reference.
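The requirements on X, y, test_type and num_perm listed above can be checked before a test is run. The following is a minimal R sketch; check_sd_args is a hypothetical helper written for illustration only, and it is not claimed to be part of the package or to reproduce how sd_test validates its own inputs.

```r
# Hypothetical helper (NOT the package's internals): normalizes and checks the
# arguments as described in the Arguments section.
check_sd_args <- function(X, y, test_type = "perm", num_perm = 10000) {
  X <- as.matrix(X)  # a length-n vector becomes an n-by-1 matrix
  if (!is.numeric(X)) stop("X must be numeric")
  if (!is.factor(y)) stop("y must be a factor")
  if (nrow(X) != length(y)) stop("X and y must contain the same number n of observations")
  # Only the two documented test types are accepted
  test_type <- match.arg(test_type, c("perm", "asym"))
  if (test_type == "perm" && (num_perm < 1 || num_perm != round(num_perm)))
    stop("num_perm must be a positive integer")
  list(X = X, y = y, n = nrow(X), p = ncol(X),
       test_type = test_type, num_perm = num_perm)
}

chk <- check_sd_args(mtcars[, c("mpg", "disp", "drat", "wt")], factor(mtcars[, "am"]))
chk$n  # 32
chk$p  # 4
```

A data frame such as the mtcars subset used in the Examples is coerced to a numeric matrix, so both it and a plain vector end up in the same $n$-by-$p$ shape.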

Details

The semi-distance independence test statistic is $$T_n = n \cdot \widetilde{\text{SDcov}}_n(X, y),$$ where $\widetilde{\text{SDcov}}_n(X, y)$ can be computed by sdcov(X, y, type = "U").

For the permutation test (test_type = "perm"), a total of $K$ permutations are performed, where $K$ is given by the argument num_perm. The p-value is computed as $$\text{p-value} = \frac{\sum_{k=1}^K I(T^{\ast (k)}_{n} \ge T_{n}) + 1}{K + 1},$$ where $T_{n}$ is the semi-distance test statistic computed on the original sample, $T^{\ast (k)}_{n}$ is the statistic computed on the $k$-th permuted sample, and $I(\cdot)$ is the indicator function.
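The p-value formula above can be illustrated with a generic permutation routine. This is a minimal R sketch under two assumptions: the labels y are what gets permuted, and the statistic is passed in as a function. The statistic between_ss is a hypothetical stand-in (a between-group sum of squares), NOT the semi-distance statistic; to mirror the test described here it would be replaced by $T_n$, i.e. n times sdcov(X, y, type = "U").

```r
# Generic permutation p-value:
#   p = (sum_k I(T*_k >= T) + 1) / (K + 1)
# `stat_fun` is any function(X, y) returning a scalar statistic.
perm_pvalue <- function(X, y, stat_fun, num_perm = 10000) {
  X <- as.matrix(X)
  t_obs <- stat_fun(X, y)
  # Assumption: permute the labels y while keeping the rows of X fixed
  t_perm <- vapply(seq_len(num_perm), function(k) {
    stat_fun(X, y[sample.int(length(y))])
  }, numeric(1))
  (sum(t_perm >= t_obs) + 1) / (num_perm + 1)
}

# Hypothetical stand-in statistic (NOT SDcov): between-group sum of squares,
# which grows as the group means of X move apart.
between_ss <- function(X, y) {
  grand <- colMeans(X)
  sum(vapply(levels(y), function(lv) {
    Xg <- X[y == lv, , drop = FALSE]
    if (nrow(Xg) == 0) return(0)  # empty level contributes nothing
    nrow(Xg) * sum((colMeans(Xg) - grand)^2)
  }, numeric(1)))
}

# Functionally dependent data, as in the Examples
set.seed(1)
X <- matrix(0, 30, 3)
X[1:10, 1] <- 1; X[11:20, 2] <- 1; X[21:30, 3] <- 1
y <- factor(rep(1:3, each = 10))
perm_pvalue(X, y, between_ss, num_perm = 999)
```

Because the "+ 1" appears in both numerator and denominator, the p-value can never be smaller than $1/(K+1)$; with the default num_perm = 10000 the smallest attainable value is 1/10001. On the dependent data above, a random permutation essentially never reproduces the perfect group separation, so the sketch returns this minimum, 1/1000 for num_perm = 999.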

When the dimension of the continuous variables is high, the asymptotic approximation (test_type = "asym") can be used instead; it is computationally faster because no permutations are needed.

Examples


X <- mtcars[, c("mpg", "disp", "drat", "wt")]
y <- factor(mtcars[, "am"])
test <- sd_test(X, y)
print(test)

# Man-made independent data -------------------------------------------------
n <- 30; R <- 5; p <- 3; prob <- rep(1/R, R)
X <- matrix(rnorm(n*p), n, p)
y <- factor(sample(1:R, size = n, replace = TRUE, prob = prob), levels = 1:R)
test <- sd_test(X, y)
print(test)

# Man-made functionally dependent data --------------------------------------
n <- 30; R <- 3; p <- 3
X <- matrix(0, n, p)
X[1:10, 1] <- 1; X[11:20, 2] <- 1; X[21:30, 3] <- 1
y <- factor(rep(1:3, each = 10))
test <- sd_test(X, y)
print(test)

# Man-made high-dimensional independent data --------------------------------
n <- 30; R <- 3; p <- 100
X <- matrix(rnorm(n*p), n, p)
y <- factor(rep(1:3, each = 10))
test <- sd_test(X, y)
print(test)

test <- sd_test(X, y, test_type = "asym")
print(test)

# Man-made high-dimensional dependent data ----------------------------------
n <- 30; R <- 3; p <- 100
X <- matrix(0, n, p)
X[1:10, 1] <- 1; X[11:20, 2] <- 1; X[21:30, 3] <- 1
y <- factor(rep(1:3, each = 10))
test <- sd_test(X, y)
print(test)

test <- sd_test(X, y, test_type = "asym")
print(test)