KSgeneral (version 2.0.2)

cont_ks_cdf: Computes the cumulative distribution function of the two-sided Kolmogorov-Smirnov statistic when the cdf under the null hypothesis is continuous

Description

Computes the cdf $P(D_n \le q)$ at a fixed q, $q \in [0,1]$, for the one-sample two-sided Kolmogorov-Smirnov statistic, $D_n$, for a given sample size n, when the cdf F(x) under the null hypothesis is continuous.

Usage

cont_ks_cdf(q, n)

Value

Numeric value corresponding to $P(D_n \le q)$.

Arguments

q

numeric value between 0 and 1, at which the cdf $P(D_n \le q)$ is computed

n

the sample size

Details

Given a random sample $\{X_1, \ldots, X_n\}$ of size n with an empirical cdf $F_n(x)$, the Kolmogorov-Smirnov goodness-of-fit statistic is defined as $D_n = \sup_x |F_n(x) - F(x)|$, where F(x) is the cdf of a prespecified theoretical distribution under the null hypothesis $H_0$ that $\{X_1, \ldots, X_n\}$ comes from F(x).
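The definition above can be evaluated by hand for a concrete sample. The following is a minimal sketch in base R, not code from the KSgeneral package: the helper name ks_statistic is hypothetical, and it assumes the null cdf is continuous and supplied as an R function (punif here, purely for illustration). For a continuous F the supremum is attained at one of the order statistics, so it is enough to compare F there with the two values the empirical cdf takes on either side of each jump.

```r
# Hypothetical helper (not part of KSgeneral): two-sided KS statistic
# D_n = sup_x |F_n(x) - F(x)| for a continuous null cdf.
ks_statistic <- function(x, null_cdf, ...) {
  n <- length(x)
  u <- null_cdf(sort(x), ...)      # F evaluated at the order statistics
  i <- seq_len(n)
  d_plus  <- max(i / n - u)        # largest gap where F_n lies above F
  d_minus <- max(u - (i - 1) / n)  # largest gap where F_n lies below F
  max(d_plus, d_minus)
}

set.seed(1)
x <- runif(20)                     # illustrative sample, assumed U(0, 1) under H0
d_n <- ks_statistic(x, punif)

# Cross-check against the statistic reported by stats::ks.test
d_ref <- unname(stats::ks.test(x, "punif")$statistic)
print(c(by_hand = d_n, ks_test = d_ref))
```

The resulting value d_n is the kind of quantity that would be passed as q to cont_ks_cdf, together with the sample size n.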

The function cont_ks_cdf implements the FFT-based algorithm proposed by Moscovich and Nadler (2017) to compute the cdf $P(D_n \le q)$ at a value q, when F(x) is continuous. This algorithm has a total worst-case run-time of order $O(n^2 \log n)$, which makes it more efficient and numerically stable than the algorithm proposed by Marsaglia et al. (2003). The latter is used by many existing packages computing the cdf of $D_n$, e.g., the function ks.test in the package stats and the function ks.test in the package dgof. More precisely, in these packages, the exact p-value, $P(D_n \ge q)$, is computed only in the case when $q = d_n$, where $d_n$ is the value of the KS statistic computed from a user-provided sample $\{x_1, \ldots, x_n\}$. Another limitation of the ks.test functions is that the sample size should be less than 100, and the computation time is $O(n^3)$. In contrast, the function cont_ks_cdf provides results with at least 10 correct digits after the decimal point for sample sizes n up to 100000, with a computation time of 16 seconds on a machine with a 2.5GHz Intel Core i5 processor with 4GB RAM, running MacOS X Yosemite. For n > 100000, results of similar accuracy can still be computed, but at a higher computation time. See Dimitrova, Kaishev, Tan (2020), Appendix B for further details and examples.
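Because $D_n$ has a continuous distribution when F(x) is continuous, the upper-tail probability mentioned above satisfies $P(D_n \ge q) = 1 - P(D_n \le q)$, so it can be read off the cdf at any q, not only at an observed $d_n$. The snippet below is a small sketch of that relationship; the values of n and d_obs are hypothetical, and the call is guarded so that it only runs if the KSgeneral package is installed.

```r
# Sketch: upper-tail probability P(D_n >= d_obs) obtained from the cdf.
# n and d_obs are illustrative values, not taken from the documentation.
n <- 40
d_obs <- 0.2

if (requireNamespace("KSgeneral", quietly = TRUE)) {
  # For continuous F, P(D_n >= q) = 1 - P(D_n <= q)
  p_value <- 1 - KSgeneral::cont_ks_cdf(d_obs, n)
  print(p_value)
}
```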

References

Dimitrova D.S., Kaishev V.K., Tan S. (2020). "Computing the Kolmogorov-Smirnov Distribution When the Underlying CDF is Purely Discrete, Mixed or Continuous". Journal of Statistical Software, 95(10), 1-42. doi:10.18637/jss.v095.i10.

Marsaglia G., Tsang WW., Wang J. (2003). "Evaluating Kolmogorov's Distribution". Journal of Statistical Software, 8(18), 1-4.

Moscovich A., Nadler B. (2017). "Fast Calculation of Boundary Crossing Probabilities for Poisson Processes". Statistics and Probability Letters, 123, 177-182.

Examples

## Compute the value for P(D_{100} <= 0.05)

KSgeneral::cont_ks_cdf(0.05, 100)


## Compute P(D_{n} <= q)
## for n = 100, q = 1/500, 2/500, ..., 500/500
## and then plot the corresponding values against q

n <- 100
q <- 1:500 / 500
plot(q, sapply(q, function(x) KSgeneral::cont_ks_cdf(x, n)), type = "l")

## Compute P(D_{n} <= q) for n = 40, nq^{2} = 0.76 as shown
## in Table 9 of Dimitrova, Kaishev, Tan (2020)

KSgeneral::cont_ks_cdf(sqrt(0.76/40), 40)
