Computes the p-value \(P(D_{n} \ge d_{n}) \equiv P(D_{n} > d_{n})\), where \(d_{n}\) is the value of the KS test statistic computed based on a data sample \(\{x_{1}, ..., x_{n}\}\), when \(F(x)\) is continuous.
cont_ks_test(x, y, ...)
a numeric vector of data sample values \(\{x_{1}, ..., x_{n}\}\).
values of the parameters of the cdf \(F(x)\), which is specified (as a character string) by y.
A list with class "htest" containing the following components:
the value of the statistic.
the p-value of the test.
"two-sided".
a character string giving the name of the data.
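The components listed above can be read with the usual `$` accessor. The following is a minimal sketch that assumes the KSgeneral package is installed. The sample, the seed and the `rate` value are illustrative choices, not taken from the package documentation.

```r
# Sketch: inspect the components of the returned "htest" object.
# Assumes the KSgeneral package is installed; the block is skipped otherwise.
if (requireNamespace("KSgeneral", quietly = TRUE)) {
  set.seed(123)                # arbitrary seed, for reproducibility only
  x <- rexp(50, rate = 2)      # illustrative sample from Exp(rate = 2)
  # Parameters of the cdf named by y are supplied through '...'
  res <- KSgeneral::cont_ks_test(x, "pexp", rate = 2)
  print(res$statistic)         # value of the KS statistic d_n
  print(res$p.value)           # p-value P(D_n >= d_n)
  print(res$alternative)       # "two-sided"
  print(res$data.name)         # name of the data
}
```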
Given a random sample \(\{X_{1}, ..., X_{n}\}\) of size n with an empirical cdf \(F_{n}(x)\), the two-sided Kolmogorov-Smirnov goodness-of-fit statistic is defined as \(D_{n} = \sup_{x} | F_{n}(x) - F(x) | \), where \(F(x)\) is the cdf of a prespecified theoretical distribution under the null hypothesis \(H_{0}\) that \(\{X_{1}, ..., X_{n}\}\) comes from \(F(x)\).
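As a rough illustration of this definition, the statistic can be computed directly in base R. Because \(F_{n}(x)\) is a step function and \(F(x)\) is continuous and non-decreasing, the supremum is reached at one of the order statistics, either at the jump or just before it. The helper name `ks_statistic` is hypothetical and not part of KSgeneral; this is a sketch under those assumptions, not the package's implementation.

```r
# Two-sided KS statistic D_n = sup_x |F_n(x) - F(x)| for a continuous cdf F.
# 'cdf' is a function such as pexp; '...' carries its parameters.
ks_statistic <- function(x, cdf, ...) {
  n <- length(x)
  Fx <- cdf(sort(x), ...)                    # F evaluated at the order statistics
  d_plus  <- max(seq_len(n) / n - Fx)        # F_n equals i/n at x_(i)
  d_minus <- max(Fx - (seq_len(n) - 1) / n)  # F_n equals (i-1)/n just before x_(i)
  max(d_plus, d_minus)
}

set.seed(1)                  # arbitrary seed, for reproducibility only
x <- abs(rnorm(100))         # half-normal sample, tested against Exp(1)
d_n <- ks_statistic(x, pexp)
print(d_n)
```

For a sample without ties, the result should agree with the statistic reported by `stats::ks.test(x, "pexp")`.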
The function cont_ks_test implements the FFT-based algorithm proposed by Moscovich and Nadler (2017) to compute the p-value \(P(D_{n} \ge d_{n})\), where \(d_{n}\) is the value of the KS test statistic computed based on a user-provided data sample \(\{x_{1}, ..., x_{n}\}\), assuming \(F(x)\) is continuous.
This algorithm ensures a total worst-case run-time of order \(O(n^{2}\log(n))\), which makes it more efficient and numerically stable than the algorithm proposed by Marsaglia et al. (2003).
The latter is used by many existing packages computing the cdf of \(D_{n}\), e.g., the function ks.test in the package stats and the function ks.test in the package dgof.
A limitation of these ks.test functions is that the sample size should be less than 100, and the computation time is \(O(n^{3})\).
In contrast, the function cont_ks_test provides results with at least 10 correct digits after the decimal point for sample sizes \(n\) up to 100000, with a computation time of 16 seconds on a machine with a 2.5GHz Intel Core i5 processor with 4GB RAM, running MacOS X Yosemite.
For n > 100000, results of similar accuracy can still be computed, but at a higher computation time.
See Dimitrova, Kaishev, Tan (2020), Appendix C for further details and examples.
Dimitrina S. Dimitrova, Vladimir K. Kaishev, Senren Tan. (2020) "Computing the Kolmogorov-Smirnov Distribution When the Underlying CDF is Purely Discrete, Mixed or Continuous". Journal of Statistical Software, 95(10): 1-42. doi:10.18637/jss.v095.i10.
Moscovich A., Nadler B. (2017). "Fast Calculation of Boundary Crossing Probabilities for Poisson Processes". Statistics and Probability Letters, 123, 177-182.
# NOT RUN {
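## Comparing the p-values obtained by stats::ks.test
## and KSgeneral::cont_ks_test
x <- abs(rnorm(100))  # half-normal sample, tested against the Exp(1) cdf
p.kt <- ks.test(x, "pexp", exact = TRUE)$p.value
p.kt_fft <- KSgeneral::cont_ks_test(x, "pexp")$p.value
abs(p.kt - p.kt_fft)  # absolute difference between the two p-values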
# }