ds_test: Hypothesis testing via dynamic slicing

Description

Perform a one-sample or K-sample ($K > 1$) hypothesis test via dynamic slicing.

Usage

ds_test(y, x, ..., type = c("ds", "eqp"), lambda = 1, alpha = 1, rounds = 0)

Arguments

y

A numeric vector of data values.

x

Either an integer vector of sample labels taking values from 0 to $K-1$, or a character string naming a cumulative distribution function, or an actual cumulative distribution function such as pnorm. Only continuous CDFs are valid.

...

Parameters of the distribution specified (as a character string) by x.

type

Methods applied for dynamic slicing. "ds" (default) stands for original dynamic slicing scheme. "eqp" stands for dynamic slicing scheme with $n^{1/2}$-resolution (for K-sample test, $K > 1$) or $n$-resolution (for one-sa

lambda

Penalty for introducing an additional slice, which is used to avoid making too many slices. It corresponds to the type I error under the scenario that the two variables are independent. lambda should be greater than 0.

alpha

Penalty required for the "ds" type in the one-sample test. It penalizes both the width and the number of slices, to avoid too many slices and degenerate slices (intervals). alpha should be greater than 1.

rounds

Number of permutations for estimating empirical p-value.
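
For the K-sample case, x must be coded as integers from 0 to $K-1$. The sketch below shows one way to obtain such a coding from arbitrary group labels using base R only. The label vector g is hypothetical illustration data, and this is not necessarily how the package's own relabel helper works.

```r
# Hypothetical group labels for K = 3 samples (illustrative data only).
g <- c("ctrl", "treatA", "treatB", "ctrl", "treatB", "treatA")

# factor() sorts the distinct labels and as.integer() returns codes 1..K,
# so subtracting 1 yields the 0..K-1 coding described for x.
x <- as.integer(factor(g)) - 1L

K <- length(unique(x))
stopifnot(is.integer(x), min(x) == 0L, max(x) == K - 1L)
print(x)
```

The codes follow the sorted order of the labels, so which group becomes 0 depends on the label names rather than on their order of appearance.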

Value

A list with class "htest" containing the following components:

statistic

The value of the dynamic slicing statistic.

p.value

The p-value of the test.

alternative

A character string describing the alternative hypothesis.

method

A character string indicating what type of test was performed.

data.name

A character string giving the name(s) of the data.

slices

Slicing strategy that maximizes the dynamic slicing statistic in the K-sample test. Each row represents a slice. Each column except the last gives the number of observations in that slice taking the corresponding value of x. The last column is the total number of observations in the slice, i.e., the sum of the first through Kth columns.
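
To make the layout of slices concrete, the sketch below builds a matrix of that shape by hand for $K = 2$ and checks that the last column equals the sum of the first K columns. The counts, row names and column names are invented for illustration; they are not output from ds_test, and the names the function actually uses may differ.

```r
# Hypothetical slicing result for K = 2 (invented counts, not ds_test output).
counts <- rbind(c(40, 5),    # slice 1: 40 observations with x = 0, 5 with x = 1
                c(10, 45))   # slice 2: 10 observations with x = 0, 45 with x = 1

# Append the per-slice totals as the last column.
slices <- cbind(counts, rowSums(counts))

# Row and column names are illustrative assumptions.
dimnames(slices) <- list(c("slice1", "slice2"), c("x=0", "x=1", "total"))

K <- ncol(slices) - 1
stopifnot(all(slices[, K + 1] == rowSums(slices[, 1:K, drop = FALSE])))
print(slices)
```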

Details

If x is an integer vector, ds_test performs a K-sample test ($K > 1$). In this scenario, the observations y are drawn from K continuous populations, and x indicates the population each observation comes from, i.e., x takes values $0, 1, \ldots, K-1$. The null hypothesis is that these populations have the same distribution.

If x is a character string naming a continuous (cumulative) distribution function, ds_test performs a one-sample test with the null hypothesis that y was generated from distribution x with the parameters given in "...". These parameters must be pre-specified and not estimated from the data.

Only empirical p-values are available; they are requested by specifying rounds, the number of permutations. lambda and alpha (for the one-sample test with type "ds") both affect the p-value. The procedure for choosing lambda is described in Jiang, Ye & Liu (2014). Refer to http://www.people.fas.harvard.edu/~junliu/DS/lambda-table.html for the empirical relationship among lambda, sample size and type I error.
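
The idea of an empirical p-value from a fixed number of permutations can be sketched in base R as follows. This is a generic illustration, not the package's implementation: perm_pvalue and abs_mean_diff are hypothetical helpers, the absolute difference of group means stands in for the dynamic slicing statistic, and the add-one correction is an assumption of this sketch.

```r
# Generic permutation p-value: a sketch, not the dslice implementation.
# stat_fun is a hypothetical stand-in for the dynamic slicing statistic.
perm_pvalue <- function(y, x, stat_fun, rounds = 100) {
  observed <- stat_fun(y, x)
  # Recompute the statistic after shuffling the group labels.
  permuted <- replicate(rounds, stat_fun(y, sample(x)))
  # Add-one correction (an assumption here) keeps the estimate above zero.
  (1 + sum(permuted >= observed)) / (1 + rounds)
}

# Stand-in statistic: absolute difference of the two group means.
abs_mean_diff <- function(y, x) abs(mean(y[x == 1L]) - mean(y[x == 0L]))

set.seed(1)
n <- 100
y <- c(rnorm(n, -0.5, 1), rnorm(n, 0.5, 1))
x <- as.integer(c(rep(0, n), rep(1, n)))
p <- perm_pvalue(y, x, abs_mean_diff, rounds = 200)
print(p)
```

Because the estimate is built from a finite number of shuffles, its resolution is limited by rounds: with 200 permutations, the smallest value this sketch can return is 1/201.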

References

Jiang, B., Ye, C. and Liu, J.S. Non-parametric K-sample tests via dynamic slicing. Journal of the American Statistical Association, 2014.

Examples

library(dslice)

##  One-sample test
n <- 100
mu <- 0.5
y <- rnorm(n, mu, 1)
lambda <- 1.0
alpha <- 1.0
##  "ds" is the default type; rounds sets the number of permutations
dsres <- ds_test(y, "pnorm", 0, 1, lambda = lambda, alpha = alpha, rounds = 100)
dsres <- ds_test(y, "pnorm", 0, 1, type = "ds", lambda = lambda, alpha = alpha)
dsres <- ds_test(y, "pnorm", 0, 1, type = "eqp", lambda = lambda, rounds = 100)
dsres <- ds_test(y, "pnorm", 0, 1, type = "eqp", lambda = lambda)

##  K-sample test
n <- 100
mu <- 0.5
y <- c(rnorm(n, -mu, 1), rnorm(n, mu, 1))

##  generate x in this way:
x <- c(rep(0, n), rep(1, n))
x <- as.integer(x)

##  or in this way:
x <- c(rep("G1", n), rep("G2", n))
x <- relabel(x)

lambda <- 1.0
dsres <- ds_test(y, x, lambda = lambda, rounds = 100)
dsres <- ds_test(y, x, type = "eqp", lambda = lambda, rounds = 100)