KSgeneral (version 1.1.1)

KSgeneral-package: KSgeneral

Description

The one-sample two-sided Kolmogorov-Smirnov (KS) statistic is one of the most popular goodness-of-fit test statistics. It measures how well the distribution of a random sample agrees with a prespecified theoretical distribution. Given a random sample \(\{X_{1}, ..., X_{n}\}\) of size \(n\) with empirical cdf \(F_{n}(x)\), the two-sided KS statistic is defined as \(D_{n} = \sup_{x} | F_{n}(x) - F(x) | \), where \(F(x)\) is the cdf of the prespecified theoretical distribution under the null hypothesis \(H_{0}\) that \( \{ X_{1}, ..., X_{n} \} \) comes from \(F(x)\).
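As an illustration of the definition, the following base-R sketch computes the observed statistic \(d_{n}\) for a continuous \(F(x)\) from the sorted sample and checks it against the statistic reported by stats::ks.test. The sample, the seed and the choice of an Exp(1) null distribution are assumptions made for the example only; no KSgeneral function is used here.

```r
# Sketch: the two-sided KS statistic for a continuous F, computed by hand.
set.seed(42)
n  <- 50
x  <- rexp(n, rate = 1)        # illustrative sample (an assumption of this example)
xs <- sort(x)
Fx <- pexp(xs, rate = 1)       # F evaluated at the order statistics
i  <- seq_len(n)

# For continuous F the supremum is attained at a jump of the empirical cdf:
d_plus  <- max(i / n - Fx)         # sup of F_n(x) - F(x)
d_minus <- max(Fx - (i - 1) / n)   # sup of F(x) - F_n(x)
d_n     <- max(d_plus, d_minus)    # two-sided statistic

# Reference value from base R
d_ref <- unname(stats::ks.test(x, "pexp", rate = 1)$statistic)
cat("by hand:", d_n, " ks.test:", d_ref, "\n")
```

The two values should agree up to floating-point error, since the supremum over \(x\) reduces to a maximum over the \(n\) order statistics when \(F(x)\) is continuous.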

The package KSgeneral implements a novel, accurate and efficient Fast Fourier Transform (FFT)-based method, referred to as the Exact-KS-FFT method, to compute the complementary cdf \(P(D_{n} \ge q)\) at a fixed \(q\in[0, 1]\) for a given (hypothesized) purely discrete, mixed or continuous underlying cdf \(F(x)\) and an arbitrary, possibly large, sample size \(n\). A plot of the complementary cdf \(P(D_{n} \ge q)\), \(0 \le q \le 1\), can also be produced.

In other words, the package computes the p-value \(P(D_{n} \ge q)\) for any fixed critical level \(q\in[0, 1]\). If a data sample \(\{x_{1}, ..., x_{n}\}\) is supplied, KSgeneral computes the p-value \(P(D_{n} \ge d_{n})\), where \(d_{n}\) is the value of the KS test statistic computed from \(\{x_{1}, ..., x_{n}\}\).

Remark: The descriptions of the package and its functions are primarily tailored to computing the (complementary) cdf of the two-sided KS statistic \(D_{n}\). Note, however, that one can also compute the (complementary) cdf of the one-sided KS statistics \(D_{n}^{-}\) or \(D_{n}^{+}\) (cf. Dimitrova, Kaishev, Tan (2020)) by specifying, respectively, \(A_{i} = 0\) for all \(i\) or \(B_{i} = 1\) for all \(i\) in the function ks_c_cdf_Rcpp.

Details

The Exact-KS-FFT method underlying KSgeneral is based on expressing the p-value \(P(D_{n} \ge q)\) in terms of an appropriate rectangle probability with respect to the uniform order statistics, as noted by Gleser (1985) for \(P(D_{n} > q)\). This representation is used to express \(P(D_{n} \ge q)\) via a double-boundary non-crossing probability for a homogeneous Poisson process with intensity \(n\), which is then computed efficiently using FFT, giving a total run time of order \(O(n^{2}\log(n))\) (see Dimitrova, Kaishev, Tan (2020), and also Moscovich and Nadler (2017) for the special case when \(F(x)\) is continuous).
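For the special case of a continuous \(F(x)\), the rectangle probability can be written out explicitly. The display below is a sketch of the standard derivation for that case only; the boundaries used for purely discrete or mixed \(F(x)\) are different and are given in Dimitrova, Kaishev, Tan (2020). Here \(U_{(1)} \le ... \le U_{(n)}\) denote the order statistics of \(n\) independent uniform(0, 1) random variables, and \(A_{i}\), \(B_{i}\) are the lower and upper boundaries referred to in the Remark above.

\[
P(D_{n} \ge q) = 1 - P\left(A_{i} \le U_{(i)} \le B_{i},\ 1 \le i \le n\right),
\qquad
A_{i} = \max\left\{\frac{i}{n} - q,\ 0\right\},
\quad
B_{i} = \min\left\{\frac{i-1}{n} + q,\ 1\right\}.
\]

This follows because \(D_{n} < q\) holds exactly when \(i/n - F(X_{(i)}) < q\) and \(F(X_{(i)}) - (i-1)/n < q\) for every \(i\), and \(F(X_{(i)})\) has the same distribution as \(U_{(i)}\) when \(F(x)\) is continuous. Setting \(A_{i} = 0\) or \(B_{i} = 1\) for all \(i\) removes one of the two constraints, which is how the one-sided statistics arise.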

KSgeneral is an R wrapper of the original C++ code due to Dimitrova, Kaishev, Tan (2020), which is based on the C++ code developed by Moscovich and Nadler (2017). The package includes the functions disc_ks_c_cdf, mixed_ks_c_cdf and cont_ks_c_cdf, which compute the complementary cdf \(P(D_{n} \ge q)\) for a fixed \(q\), \(0 \le q \le 1\), when \(F(x)\) is purely discrete, mixed or continuous, respectively. KSgeneral also includes the functions disc_ks_test, mixed_ks_test and cont_ks_test, which compute the p-value \(P(D_{n} \ge d_{n})\), where \(d_{n}\) is the value of the KS test statistic computed from a user-provided data sample \(\{x_{1}, ..., x_{n}\}\), when \(F(x)\) is purely discrete, mixed or continuous, respectively.

The functions disc_ks_test and cont_ks_test are accurate and fast (run time \(O(n^{2}\log(n))\)) alternatives to the function ks.test from the package dgof and the function ks.test from the package stats, which compute the p-value \(P(D_{n} \ge d_{n})\) assuming \(F(x)\) is purely discrete or continuous, respectively.
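A minimal sketch of such a comparison for the continuous case is given below. It assumes that cont_ks_test takes the data vector followed by the name of the hypothesized cdf and returns a list with a p.value element, and that cont_ks_c_cdf takes \(q\) and \(n\); these call shapes, the sample and the seed are assumptions of the example, so check the help pages of the individual functions before relying on them. The KSgeneral calls are guarded so that the script still runs when the package is not installed.

```r
# Sketch: compare the p-value from stats::ks.test with KSgeneral::cont_ks_test.
set.seed(1)
x <- abs(rnorm(100))   # half-normal sample, tested against Exp(1)

# Reference p-value from base R
ref   <- stats::ks.test(x, "pexp", exact = TRUE)
p_ref <- unname(ref$p.value)
cat("stats::ks.test p-value:", p_ref, "\n")

if (requireNamespace("KSgeneral", quietly = TRUE)) {
  # Assumed call shape: data vector, then the name of the cdf under H0.
  fft <- KSgeneral::cont_ks_test(x, "pexp")
  cat("KSgeneral::cont_ks_test p-value:", fft$p.value, "\n")

  # Assumed call shape: cont_ks_c_cdf(q, n) returns P(D_n >= q).
  cat("P(D_100 >= 0.09):", KSgeneral::cont_ks_c_cdf(0.09, 100), "\n")
} else {
  message("KSgeneral is not installed; skipping the FFT-based computation.")
}
```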

The package also includes the function ks_c_cdf_Rcpp, which provides the flexibility to compute the complementary cdf (p-value) for the one-sided KS test statistics \(D_{n}^{-}\) or \(D_{n}^{+}\). It also allows faster computation and possibly higher accuracy when computing \(P(D_{n} \ge q)\).

References

Dimitrova D.S., Kaishev V.K., Tan S. (2020). "Computing the Kolmogorov-Smirnov Distribution When the Underlying CDF is Purely Discrete, Mixed or Continuous". Journal of Statistical Software, 95(10), 1-42. doi:10.18637/jss.v095.i10.

Gleser L.J. (1985). "Exact Power of Goodness-of-Fit Tests of Kolmogorov Type for Discontinuous Distributions". Journal of the American Statistical Association, 80(392), 954-958.

Moscovich A., Nadler B. (2017). "Fast Calculation of Boundary Crossing Probabilities for Poisson Processes". Statistics and Probability Letters, 123, 177-182.