nptest (version 1.2)

np.cdf.test: Nonparametric Distribution Tests

Description

Performs one- or two-sample nonparametric (randomization) tests of cumulative distribution functions. Implements the Anderson-Darling, Cramer-von Mises, and Kolmogorov-Smirnov test statistics.

Usage

np.cdf.test(x, y = NULL,
            method = c("AD", "CVM", "KS"),
            R = 9999, parallel = FALSE, cl = NULL,
            perm.dist = TRUE, na.rm = TRUE)

Value

statistic

Test statistic value.

p.value

p-value for testing \(H_0: F_x = F_0\) or \(H_0: F_x = F_y\).

perm.dist

Permutation distribution of the test statistic.

method

Method used for permutation test. See Examples.

R

Number of resamples.

exact

Logical: was an exact permutation test performed?
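
A minimal sketch of inspecting these components, with simulated data (the object names here are illustrative, not from the package documentation):

library(nptest)
set.seed(0)
x <- rnorm(100)                     # simulated standard normal data
fit <- np.cdf.test(x, y = "norm")   # one-sample Anderson-Darling test (default method)
fit$statistic                       # observed test statistic
fit$p.value                         # permutation p-value
fit$exact                           # was an exact test performed?
hist(fit$perm.dist)                 # permutation distribution (returned when perm.dist = TRUE)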

Arguments

x

Numeric vector (or matrix) of data values.

y

One-sample: name of a distribution family for which cumulative distribution ("p" prefix) and random sampling ("r" prefix) functions exist, e.g., y = "norm" uses pnorm and rnorm (see Example 2 in the Examples). Two-sample: numeric vector (or matrix) of data values.

method

Test statistic to use: AD = Anderson-Darling, CVM = Cramer-von Mises, KS = Kolmogorov-Smirnov.

R

Number of resamples for the permutation test (positive integer).

parallel

Logical indicating if the parallel package should be used for parallel computing (of the permutation distribution). Defaults to FALSE, which implements sequential computing.

cl

Cluster for parallel computing, which is used when parallel = TRUE. Note that if parallel = TRUE and cl = NULL, then the cluster is defined as makeCluster(2L) to use two cores. To make use of all available cores, use the code cl = makeCluster(detectCores()). A short usage sketch follows this argument list.

perm.dist

Logical indicating if the permutation distribution should be returned.

na.rm

If TRUE (default), the arguments x and y (if provided) are passed to the na.omit function to remove cases with missing data.
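
A hedged sketch of the parallel options described above (the cluster size and data are illustrative; makeCluster, detectCores, and stopCluster come from the parallel package):

library(nptest)
library(parallel)
set.seed(0)
x <- rnorm(100)
cl <- makeCluster(detectCores())    # one worker per available core
np.cdf.test(x, y = "norm", parallel = TRUE, cl = cl)
stopCluster(cl)                     # shut the workers down when finished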

Author

Nathaniel E. Helwig <helwig@umn.edu>

Details

One-sample statistics:

AD: \(\omega^2 = \int w(x) (F_n(x) - F_0(x))^2 \, dF_0(x)\) with \(w(x) = [F_0(x)(1 - F_0(x))]^{-1}\)
CVM: \(\omega^2 = \int w(x) (F_n(x) - F_0(x))^2 \, dF_0(x)\) with \(w(x) = 1\)
KS: \(\omega^2 = \sup_{x} (F_n(x) - F_0(x))^2\)

where \(F_n(x)\) is the empirical cumulative distribution function (estimated by ecdf) and \(F_0\) is the null hypothesized distribution (specified by the y argument).
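
To make the one-sample definitions concrete, here is a hand computation of the three statistics via their standard closed forms, assuming F0 = pnorm; note that np.cdf.test may scale its returned statistic differently (e.g., by a factor of n), so this is a sketch of the formulas rather than a reproduction of the package's output:

set.seed(0)
x <- sort(rnorm(100))
n <- length(x)
i <- 1:n
u <- pnorm(x)                                   # F0 evaluated at the order statistics
ks  <- max(pmax(i/n - u, u - (i - 1)/n))^2      # sup_x (Fn(x) - F0(x))^2
cvm <- 1/(12 * n^2) + mean((u - (2*i - 1)/(2*n))^2)          # integral with w(x) = 1
ad  <- -1 - mean((2*i - 1)/n * (log(u) + log(1 - rev(u))))   # w(x) = [F0(1 - F0)]^(-1)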

Two-sample statistics:

AD: \(\omega^2 = \int w(z) (F_x(z) - F_y(z))^2 \, dF_0(z)\) with \(w(z) = [F_0(z)(1 - F_0(z))]^{-1}\)
CVM: \(\omega^2 = \int w(z) (F_x(z) - F_y(z))^2 \, dF_0(z)\) with \(w(z) = 1\)
KS: \(\omega^2 = \sup_{z} (F_x(z) - F_y(z))^2\)

where \(F_x\) and \(F_y\) are the groupwise ECDFs (estimated by applying ecdf separately to x and y) and \(F_0\) is the joint ECDF (estimated by applying ecdf to z = c(x, y)).
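
The two-sample versions have a simple empirical form, since integrating against the joint ECDF \(F_0\) is just an average over the pooled sample z. A sketch with illustrative data follows; the AD weight is omitted because \(1/[F_0(z)(1 - F_0(z))]\) is undefined at the pooled maximum, where \(F_0(z) = 1\), and needs special handling:

set.seed(0)
x <- rnorm(25)
y <- rnorm(25, mean = 2/3)
z <- c(x, y)                     # pooled sample defines the joint ECDF F0
Fx <- ecdf(x)
Fy <- ecdf(y)
ks  <- max((Fx(z) - Fy(z))^2)    # sup_z (Fx(z) - Fy(z))^2
cvm <- mean((Fx(z) - Fy(z))^2)   # integral of (Fx - Fy)^2 dF0 with w(z) = 1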

References

Anderson, T. W. and Darling, D. A. (1952). Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes. Annals of Mathematical Statistics, 23(2), 193-212. doi:10.1214/aoms/1177729437

Anderson, T. W. and Darling, D. A. (1954). A test of goodness of fit. Journal of the American Statistical Association, 49(268), 765-769. doi:10.1080/01621459.1954.10501232

Anderson, T. W. (1962). On the distribution of the two-sample Cramer-von Mises criterion. Annals of Mathematical Statistics, 33(3), 1148-1159. doi:10.1214/aoms/1177704477

Cramer, H. (1928). On the composition of elementary errors: First paper: Mathematical deductions. Scandinavian Actuarial Journal, 1928(1), 13-74. doi:10.1080/03461238.1928.10416862

Kolmogorov, A. N. (1933). Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Istituto Italiano degli Attuari, 4, 83-91.

Kolmogorov, A. N. (1941). Confidence limits for an unknown distribution function. Annals of Mathematical Statistics, 12(4), 461-483. doi:10.1214/aoms/1177731684

Smirnov, N. (1948). Table for estimating the goodness of fit of empirical distributions. Annals of Mathematical Statistics, 19(2), 279-281. doi:10.1214/aoms/1177730256

von Mises, R. (1928). Wahrscheinlichkeit, Statistik und Wahrheit. Julius Springer.

See Also

plot.np.cdf.test: S3 plotting method for visualizing the results.
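
A minimal usage sketch (data simulated for illustration):

library(nptest)
set.seed(0)
fit <- np.cdf.test(rnorm(100), y = "norm")
plot(fit)    # dispatches to plot.np.cdf.test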

Examples


###***###   ONE SAMPLE   ###***###

## generate standard normal data
n <- 100
set.seed(0)
x <- rnorm(n)


## Example 1: Fn = norm,  F0 = norm

# Anderson-Darling test of H0: Fx = pnorm
set.seed(1)
np.cdf.test(x, y = "norm")


# Cramer-von Mises test of H0: Fx = pnorm
set.seed(1)
np.cdf.test(x, y = "norm", method = "CVM")

# Kolmogorov-Smirnov test of H0: Fx = pnorm
set.seed(1)
np.cdf.test(x, y = "norm", method = "KS")


## Example 2: Fn = norm,  F0 = t3

# user-defined distribution (Student's t with df = 3)
pt3 <- function(q) pt(q, df = 3)      # cdf found via paste0("p", y)
rt3 <- function(n) rt(n, df = 3)      # sampler found via paste0("r", y)

# Anderson-Darling test of H0: Fx = t3
set.seed(1)
np.cdf.test(x, y = "t3")

# Cramer-von Mises test of H0: Fx = t3
set.seed(1)
np.cdf.test(x, y = "t3", method = "CVM")

# Kolmogorov-Smirnov test of H0: Fx = t3
set.seed(1)
np.cdf.test(x, y = "t3", method = "KS")



###***###   TWO SAMPLE   ###***###

# generate N(0, 1) and N(2/3, 1) data
m <- 25
n <- 25
set.seed(0)
x <- rnorm(m)
y <- rnorm(n, mean = 2/3)

# Anderson-Darling test of H0: Fx = Fy
set.seed(1)
np.cdf.test(x, y)

# Cramer-von Mises test of H0: Fx = Fy
set.seed(1)
np.cdf.test(x, y, method = "CVM")

# Kolmogorov-Smirnov test of H0: Fx = Fy
set.seed(1)
np.cdf.test(x, y, method = "KS")
