nptest (version 1.2)

np.cdf.test: Nonparametric Distribution Tests

Description

Performs one- or two-sample nonparametric (randomization) tests of cumulative distribution functions. Implements the Anderson-Darling, Cramer-von Mises, and Kolmogorov-Smirnov test statistics.

Usage

np.cdf.test(x, y = NULL,
            method = c("AD", "CVM", "KS"),
            R = 9999, parallel = FALSE, cl = NULL,
            perm.dist = TRUE, na.rm = TRUE)

Value

statistic

Test statistic value.

p.value

p-value for testing \(H_0: F_x = F_0\) or \(H_0: F_x = F_y\).

perm.dist

Permutation distribution of the test statistic.

method

Method used for permutation test. See Examples.

R

Number of resamples.

exact

Logical: was an exact permutation test performed?
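
A minimal sketch of inspecting these components, with simulated data (the object names here are illustrative, not from the package documentation):

library(nptest)
set.seed(0)
x <- rnorm(100)                     # simulated standard normal data
fit <- np.cdf.test(x, y = "norm")   # one-sample Anderson-Darling test (default method)
fit$statistic                       # observed test statistic
fit$p.value                         # permutation p-value
fit$exact                           # was an exact test performed?
hist(fit$perm.dist)                 # permutation distribution (returned when perm.dist = TRUE)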

Arguments

x

Numeric vector (or matrix) of data values.

y

One-sample: name of a distribution family for which cumulative distribution ("p" prefix) and random sampling ("r" prefix) functions exist, e.g., y = "norm" uses pnorm and rnorm (see Example 2 in the Examples). Two-sample: numeric vector (or matrix) of data values.

method

Test statistic to use: AD = Anderson-Darling, CVM = Cramer-von Mises, KS = Kolmogorov-Smirnov.

R

Number of resamples for the permutation test (positive integer).

parallel

Logical indicating if the parallel package should be used for parallel computing (of the permutation distribution). Defaults to FALSE, which implements sequential computing.

cl

Cluster for parallel computing, which is used when parallel = TRUE. Note that if parallel = TRUE and cl = NULL, then the cluster is defined as makeCluster(2L) to use two cores. To make use of all available cores, use the code cl = makeCluster(detectCores()). A short usage sketch follows this argument list.

perm.dist

Logical indicating if the permutation distribution should be returned.

na.rm

If TRUE (default), the arguments x and y (if provided) are passed to the na.omit function to remove cases with missing data.
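
A hedged sketch of the parallel options described above (the cluster size and data are illustrative; makeCluster, detectCores, and stopCluster come from the parallel package):

library(nptest)
library(parallel)
set.seed(0)
x <- rnorm(100)
cl <- makeCluster(detectCores())    # one worker per available core
np.cdf.test(x, y = "norm", parallel = TRUE, cl = cl)
stopCluster(cl)                     # shut the workers down when finished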

Author

Nathaniel E. Helwig <helwig@umn.edu>

Details

One-sample statistics:

AD: \(\omega^2 = \int w(x) (F_n(x) - F_0(x))^2 \, dF_0(x)\) with \(w(x) = [F_0(x)(1 - F_0(x))]^{-1}\)
CVM: \(\omega^2 = \int w(x) (F_n(x) - F_0(x))^2 \, dF_0(x)\) with \(w(x) = 1\)
KS: \(\omega^2 = \sup_{x} (F_n(x) - F_0(x))^2\)

where \(F_n(x)\) is the empirical cumulative distribution function (estimated by ecdf) and \(F_0\) is the null hypothesized distribution (specified by the y argument).
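
To make the one-sample definitions concrete, here is a hand computation of the three statistics via their standard closed forms, assuming F0 = pnorm; note that np.cdf.test may scale its returned statistic differently (e.g., by a factor of n), so this is a sketch of the formulas rather than a reproduction of the package's output:

set.seed(0)
x <- sort(rnorm(100))
n <- length(x)
i <- 1:n
u <- pnorm(x)                                   # F0 evaluated at the order statistics
ks  <- max(pmax(i/n - u, u - (i - 1)/n))^2      # sup_x (Fn(x) - F0(x))^2
cvm <- 1/(12 * n^2) + mean((u - (2*i - 1)/(2*n))^2)          # integral with w(x) = 1
ad  <- -1 - mean((2*i - 1)/n * (log(u) + log(1 - rev(u))))   # w(x) = [F0(1 - F0)]^(-1)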

Two-sample statistics:

AD: \(\omega^2 = \int w(z) (F_x(z) - F_y(z))^2 \, dF_0(z)\) with \(w(z) = [F_0(z)(1 - F_0(z))]^{-1}\)
CVM: \(\omega^2 = \int w(z) (F_x(z) - F_y(z))^2 \, dF_0(z)\) with \(w(z) = 1\)
KS: \(\omega^2 = \sup_{z} (F_x(z) - F_y(z))^2\)

where \(F_x\) and \(F_y\) are the groupwise ECDFs (estimated by applying ecdf separately to x and y) and \(F_0\) is the joint ECDF (estimated by applying ecdf to z = c(x, y)).
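
The two-sample versions have a simple empirical form, since integrating against the joint ECDF \(F_0\) is just an average over the pooled sample z. A sketch with illustrative data follows; the AD weight is omitted because \(1/[F_0(z)(1 - F_0(z))]\) is undefined at the pooled maximum, where \(F_0(z) = 1\), and needs special handling:

set.seed(0)
x <- rnorm(25)
y <- rnorm(25, mean = 2/3)
z <- c(x, y)                     # pooled sample defines the joint ECDF F0
Fx <- ecdf(x)
Fy <- ecdf(y)
ks  <- max((Fx(z) - Fy(z))^2)    # sup_z (Fx(z) - Fy(z))^2
cvm <- mean((Fx(z) - Fy(z))^2)   # integral of (Fx - Fy)^2 dF0 with w(z) = 1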

References

Anderson, T. W. and Darling, D. A. (1952). Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes. Annals of Mathematical Statistics, 23(2), 193-212. doi:10.1214/aoms/1177729437

Anderson, T. W. and Darling, D. A. (1954). A test of goodness of fit. Journal of the American Statistical Association, 49(268), 765-769. doi:10.1080/01621459.1954.10501232

Anderson, T. W. (1962). On the distribution of the two-sample Cramer-von Mises criterion. Annals of Mathematical Statistics, 33(3), 1148-1159. doi:10.1214/aoms/1177704477

Cramer, H. (1928). On the composition of elementary errors: First paper: Mathematical deductions. Scandinavian Actuarial Journal, 1928(1), 13-74. doi:10.1080/03461238.1928.10416862

Kolmogorov, A. N. (1933). Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Istituto Italiano degli Attuari, 4, 83-91.

Kolmogorov, A. N. (1941). Confidence limits for an unknown distribution function. Annals of Mathematical Statistics, 12(4), 461-483. doi:10.1214/aoms/1177731684

Smirnov, N. (1948). Table for estimating the goodness of fit of empirical distributions. Annals of Mathematical Statistics, 19(2), 279-281. doi:10.1214/aoms/1177730256

von Mises, R. (1928). Wahrscheinlichkeit, Statistik und Wahrheit. Julius Springer.

See Also

plot.np.cdf.test: S3 plotting method for visualizing the results.
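
A minimal usage sketch (data simulated for illustration):

library(nptest)
set.seed(0)
fit <- np.cdf.test(rnorm(100), y = "norm")
plot(fit)    # dispatches to plot.np.cdf.test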

Examples


###***###   ONE SAMPLE   ###***###

## generate standard normal data
n <- 100
set.seed(0)
x <- rnorm(n)


## Example 1: Fn = norm,  F0 = norm

# Anderson-Darling test of H0: Fx = pnorm
set.seed(1)
np.cdf.test(x, y = "norm")


# Cramer-von Mises test of H0: Fx = pnorm
set.seed(1)
np.cdf.test(x, y = "norm", method = "CVM")

# Kolmogorov-Smirnov test of H0: Fx = pnorm
set.seed(1)
np.cdf.test(x, y = "norm", method = "KS")


## Example 2: Fn = norm,  F0 = t3

# user-defined distribution (Student's t with df = 3)
pt3 <- function(q) pt(q, df = 3)      # cdf found via paste0("p", y)
rt3 <- function(n) rt(n, df = 3)      # sampler found via paste0("r", y)

# Anderson-Darling test of H0: Fx = t3
set.seed(1)
np.cdf.test(x, y = "t3")

# Cramer-von Mises test of H0: Fx = t3
set.seed(1)
np.cdf.test(x, y = "t3", method = "CVM")

# Kolmogorov-Smirnov test of H0: Fx = t3
set.seed(1)
np.cdf.test(x, y = "t3", method = "KS")



###***###   TWO SAMPLE   ###***###

# generate N(0, 1) and N(2/3, 1) data
m <- 25
n <- 25
set.seed(0)
x <- rnorm(m)
y <- rnorm(n, mean = 2/3)

# Anderson-Darling test of H0: Fx = Fy
set.seed(1)
np.cdf.test(x, y)

# Cramer-von Mises test of H0: Fx = Fy
set.seed(1)
np.cdf.test(x, y, method = "CVM")

# Kolmogorov-Smirnov test of H0: Fx = Fy
set.seed(1)
np.cdf.test(x, y, method = "KS")
