semidist (version 0.1.0)

sd_test: Semi-distance independence test

Description

Performs the semi-distance independence test, either via a permutation test or via an asymptotic approximation that applies when the dimension \(p\) of the continuous variables is high.

Usage

sd_test(X, y, test_type = "perm", num_perm = 10000)

Value

A list with class "indtest" containing the following components:

  • method: name of the test;

  • name_data: names of X and y;

  • n: sample size of the data;

  • test_type: type of the test;

  • num_perm: number of replications in permutation test, if test_type = "perm";

  • stat: test statistic;

  • pvalue: computed p-value.

Arguments

X

Data of multivariate continuous variables, which should be an \(n\)-by-\(p\) matrix, or a vector of length \(n\) (for a univariate variable).

y

Data of a categorical variable, which should be a factor of length \(n\).

test_type

Type of the test:

  • "perm" (the default): Implement the test via permutation test;

  • "asym": Implement the test via the asymptotic approximation when the dimension of continuous variables \(p\) is high.

See the Reference for details.

num_perm

The number of replications in permutation test. Defaults to 10000. See Details and Reference.

Details

The semi-distance independence test statistic is $$T_n = n \cdot \widetilde{\text{SDcov}}_n(X, y),$$ where \(\widetilde{\text{SDcov}}_n(X, y)\) can be computed by sdcov(X, y, type = "U").

For the permutation test (test_type = "perm"), \(K\) permutation replications are conducted, where \(K\) is set by the argument num_perm. The p-value of the permutation test is computed by $$\text{p-value} = \frac{\sum_{k=1}^K I(T^{\ast (k)}_{n} \ge T_{n}) + 1}{K + 1},$$ where \(T_{n}\) is the semi-distance test statistic computed on the original sample and \(T^{\ast (k)}_{n}\) is the test statistic computed on the \(k\)-th permutation sample.
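The permutation p-value formula can be illustrated with a minimal base-R sketch. This is not the package's internal implementation: `perm_pvalue` and `bgss` are hypothetical names introduced here, and `bgss` (a between-group sum of squares) is only a stand-in for the semi-distance statistic so that the example runs without semidist installed. The sketch also assumes that a permutation sample is obtained by shuffling the labels y while keeping X fixed.

```r
# Generic permutation p-value: (sum_k I(T*_k >= T_n) + 1) / (K + 1).
# `stat_fun` is a hypothetical placeholder for any statistic of (X, y).
perm_pvalue <- function(X, y, stat_fun, K = 999) {
  X <- as.matrix(X)
  t_obs <- stat_fun(X, y)
  # Shuffling y breaks any dependence between X and y (assumed scheme)
  t_perm <- replicate(K, stat_fun(X, sample(y)))
  (sum(t_perm >= t_obs) + 1) / (K + 1)
}

# Stand-in statistic (NOT the semi-distance covariance):
# between-group sum of squares of the group centroids.
bgss <- function(X, y) {
  grand <- colMeans(X)
  sizes <- table(y)
  total <- 0
  for (lev in names(sizes)) {
    if (sizes[[lev]] > 0) {
      centre <- colMeans(X[y == lev, , drop = FALSE])
      total <- total + sizes[[lev]] * sum((centre - grand)^2)
    }
  }
  total
}

# Functionally dependent data, as in the Examples section
set.seed(1)
X <- matrix(0, 30, 3)
X[1:10, 1] <- 1; X[11:20, 2] <- 1; X[21:30, 3] <- 1
y <- factor(rep(1:3, each = 10))
p_dep <- perm_pvalue(X, y, bgss, K = 199)
print(p_dep)
```

Because of the "+ 1" in both numerator and denominator, the p-value can never be exactly zero: its smallest attainable value is \(1 / (K + 1)\), which is \(1/10001\) with the default num_perm = 10000.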

When the dimension of the continuous variables is high, the asymptotic approximation approach can be applied (test_type = "asym"), which is computationally faster since no permutation is needed.

See Also

sdcov() for computing the statistic of semi-distance covariance.

Examples

X <- mtcars[, c("mpg", "disp", "drat", "wt")]
y <- factor(mtcars[, "am"])
test <- sd_test(X, y)
print(test)

# Man-made independent data -------------------------------------------------
n <- 30; R <- 5; p <- 3; prob <- rep(1/R, R)
X <- matrix(rnorm(n*p), n, p)
y <- factor(sample(1:R, size = n, replace = TRUE, prob = prob), levels = 1:R)
test <- sd_test(X, y)
print(test)

# Man-made functionally dependent data --------------------------------------
n <- 30; R <- 3; p <- 3
X <- matrix(0, n, p)
X[1:10, 1] <- 1; X[11:20, 2] <- 1; X[21:30, 3] <- 1
y <- factor(rep(1:3, each = 10))
test <- sd_test(X, y)
print(test)

# Man-made high-dimensionally independent data ------------------------------
n <- 30; R <- 3; p <- 100
X <- matrix(rnorm(n*p), n, p)
y <- factor(rep(1:3, each = 10))
test <- sd_test(X, y)
print(test)

test <- sd_test(X, y, test_type = "asym")
print(test)

# Man-made high-dimensionally dependent data --------------------------------
n <- 30; R <- 3; p <- 100
X <- matrix(0, n, p)
X[1:10, 1] <- 1; X[11:20, 2] <- 1; X[21:30, 3] <- 1
y <- factor(rep(1:3, each = 10))
test <- sd_test(X, y)
print(test)

test <- sd_test(X, y, test_type = "asym")
print(test)