ks.test(x, y, ..., alternative = c("two.sided", "less", "greater"), exact = NULL, tol = 1e-8, simulate.p.value = FALSE, B = 2000)
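alternative: indicates the alternative hypothesis and must be one of "two.sided" (default), "less", or "greater". You can specify just the initial letter of the value, but the argument name must be given in full. See Details for the meanings of the possible values.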
exact: NULL or a logical indicating whether an exact p-value should be computed. See Details for the meaning of NULL. Not used for the one-sided two-sample case.
tol: an upper bound for possible rounding error in values (say, a and b) when needing to check for equality (a == b); a value of 0 does exact comparisons but risks making errors due to numerical imprecision.
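To see why a tolerance is needed at all, note that floating-point arithmetic rarely produces exactly equal values. The helper `near_equal()` below is hypothetical (it is not a dgof function) and only mimics the idea of treating a and b as equal when they differ by at most tol:

```r
# Hypothetical helper, not part of dgof: a and b count as equal when
# they differ by at most tol, which is the role tol plays in ks.test.
near_equal <- function(a, b, tol = 1e-8) abs(a - b) <= tol

a <- 0.1 + 0.2
b <- 0.3
exact_equal    <- (a == b)                  # FALSE: rounding error in a
tolerant_equal <- near_equal(a, b)          # TRUE with tol = 1e-8
strict_equal   <- near_equal(a, b, tol = 0) # tol = 0 behaves like a == b
```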
A list with class "htest" containing the following components:
If y is numeric, a two-sample test of the null hypothesis that x and y were drawn from the same continuous distribution is performed.
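The two-sided two-sample statistic is the largest vertical gap between the two empirical CDFs. The sketch below computes it with base R's `ecdf()` and cross-checks it against `stats::ks.test()`; using the base R function instead of dgof is an assumption made only so the snippet runs without dgof installed:

```r
set.seed(1)
x <- rnorm(50)
y <- runif(30)

# Both ECDFs are right-continuous step functions that jump only at
# observed values, so the supremum of |F_x - F_y| is reached at one of
# the pooled sample points.
Fx <- ecdf(x)
Fy <- ecdf(y)
z <- c(x, y)
D_manual <- max(abs(Fx(z) - Fy(z)))

# Reference value from base R's implementation.
D_stats <- unname(stats::ks.test(x, y)$statistic)
```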
Alternatively, y can be a character string naming a continuous (cumulative) distribution function (or such a function), or an ecdf function (or object of class stepfun) giving a discrete distribution. In these cases, a one-sample test is carried out of the null that the distribution function which generated x is distribution y with parameters specified by the ... argument.
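A discrete null distribution can be written either as an `ecdf()` of its support or as an explicit `stepfun()`; the examples at the end of this page use both forms. This base R sketch checks that the two descriptions of the discrete uniform distribution on 1, ..., 10 agree at and between the support points:

```r
# Two descriptions of the discrete uniform distribution on 1..10.
F_ecdf <- ecdf(1:10)
F_step <- stepfun(1:10, cumsum(c(0, rep(0.1, 10))))  # 10 jumps of 0.1

# Evaluate at the support points, between them, and outside the support.
u <- c(0.5, 1:10, 1.5, 9.5, 11)
same_cdf <- isTRUE(all.equal(F_ecdf(u), F_step(u)))
```

Both objects inherit from class `stepfun`, which is what allows either to be passed as y.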
The presence of ties generates a warning unless y describes a discrete distribution (see above), since continuous distributions do not generate them.
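The sketch below shows the tie warning for a continuous one-sample null. It uses base R's `stats::ks.test()` rather than dgof (an assumption so it runs without dgof), and the exact wording of the warning may differ between R versions, so it only records that some warning was raised:

```r
set.seed(1)
# Rounding to one decimal and repeating the first value guarantees ties.
x_tied <- round(rnorm(20), 1)
x_tied <- c(x_tied, x_tied[1])

warned <- FALSE
res <- withCallingHandlers(
  stats::ks.test(x_tied, "pnorm"),
  warning = function(w) {
    warned <<- TRUE                  # remember that a warning was raised
    message("warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")   # continue so the test result is returned
  }
)
```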
The possible values "two.sided", "less" and "greater" of alternative specify the null hypothesis that the true distribution function of x is equal to, not less than or not greater than the hypothesized distribution function (one-sample case) or the distribution function of y (two-sample case), respectively. This is a comparison of cumulative distribution functions, and the test statistic is the maximum difference in value, with the statistic in the "greater" alternative being
$D^+ = \max_u [F_x(u) - F_y(u)]$.
Thus in the two-sample case alternative = "greater" includes distributions for which x is stochastically smaller than y (the CDF of x lies above and hence to the left of that for y), in contrast to t.test or wilcox.test.
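The direction of the one-sided statistics is easy to get backwards, so this sketch computes $D^+ = \max_u [F_x(u) - F_y(u)]$ and its mirror image $D^- = \max_u [F_y(u) - F_x(u)]$ directly and compares them with base R's `stats::ks.test()` (used instead of dgof so it runs stand-alone). Here x2 is shifted to the left, so its CDF lies above that of x and $D^-$ should be the larger of the two:

```r
set.seed(1)
x  <- rnorm(50)
x2 <- rnorm(50, -1)  # shifted left: stochastically smaller than x

Fx  <- ecdf(x)
Fx2 <- ecdf(x2)
z <- c(x, x2)        # pooled points where the step functions can jump

D_plus  <- max(Fx(z) - Fx2(z))  # statistic for alternative = "greater"
D_minus <- max(Fx2(z) - Fx(z))  # statistic for alternative = "less"

D_plus_stats  <- unname(stats::ks.test(x, x2, alternative = "greater")$statistic)
D_minus_stats <- unname(stats::ks.test(x, x2, alternative = "less")$statistic)
```

This is why the examples below use alternative = "l" to test whether x is stochastically larger than x2.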
Exact p-values are not available for the one-sided two-sample case, or in the case of ties if y is continuous. If exact = NULL (the default), an exact p-value is computed if the sample size is less than 100 in the one-sample case with y continuous, or if the sample size is less than or equal to 30 with y discrete, or if the product of the sample sizes is less than 10000 in the two-sample case for continuous data. Otherwise, asymptotic distributions are used whose approximations may be inaccurate in small samples. With y continuous, in the one-sample two-sided case exact p-values are obtained as described in Marsaglia, Tsang & Wang (2003); the formula of Birnbaum & Tingey (1951) is used for the one-sample one-sided case.
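The gap between exact and asymptotic p-values can be seen on a small sample. This sketch uses base R's `stats::ks.test()`, which (as we rely on in the check below) also defaults to an exact p-value in the one-sample continuous case when n < 100; it is not a demonstration of dgof's own code path:

```r
set.seed(1)
x <- rnorm(20)

# Same data and statistic, two different null distributions for D.
p_exact   <- stats::ks.test(x, "pnorm", exact = TRUE)$p.value
p_asympt  <- stats::ks.test(x, "pnorm", exact = FALSE)$p.value
p_default <- stats::ks.test(x, "pnorm")$p.value  # n = 20 < 100, so exact

c(exact = p_exact, asymptotic = p_asympt)
```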
In the one-sample case with y discrete, the methods presented in Conover (1972) and Gleser (1985) are used when exact = TRUE (or when exact = NULL and length(x) <= 30, as described above). When length(x) > 30, the test is not exact and the resulting p-values are known to be conservative. Using exact = TRUE with sample sizes greater than 30 is not advised due to numerical instabilities; in such cases, simulated p-values may be desirable.
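A simulated p-value for a fully specified discrete null can be sketched as follows: draw B samples of the same size from the null distribution, recompute the two-sided statistic for each, and report the share that is at least as large as the observed one. The helpers `ks_stat_discrete()` and `sim_ks_pvalue()` are hypothetical illustrations of the idea behind simulate.p.value and B, not dgof's implementation:

```r
# Hypothetical helper: two-sided KS distance between the ECDF of x and a
# discrete null CDF F0. Assumes every value of x lies in `support`; both
# CDFs then jump only at support points, so the supremum is attained there.
ks_stat_discrete <- function(x, support, F0) {
  Fn <- ecdf(x)
  max(abs(Fn(support) - F0(support)))
}

# Hypothetical helper: Monte Carlo p-value with B simulated samples.
sim_ks_pvalue <- function(x, support, probs, B = 2000) {
  F0 <- stepfun(support, cumsum(c(0, probs)))
  d_obs <- ks_stat_discrete(x, support, F0)
  n <- length(x)
  # Draw B samples of size n from the null distribution and recompute D.
  d_sim <- replicate(B, {
    xs <- sample(support, n, replace = TRUE, prob = probs)
    ks_stat_discrete(xs, support, F0)
  })
  # Share of simulated statistics at least as large as the observed one;
  # the 1e-8 slack guards against rounding in the comparison.
  mean(d_sim >= d_obs - 1e-8)
}

set.seed(1)
p_sim <- sim_ks_pvalue(rep(1, 3), support = 1:3, probs = rep(1/3, 3), B = 10000)
```

For this tiny case the null probability can be worked out by hand: the observed statistic is D = 2/3, and only the samples (1, 1, 1) and (3, 3, 3) reach it, so the simulated value should land near 2/27, about 0.074.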
If a single-sample test is used, the parameters specified in ... must be pre-specified and not estimated from the data. There is some more refined distribution theory for the KS test with estimated parameters (see Durbin, 1973), but that is not implemented in ks.test.
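A small simulation illustrates why the parameters must not be estimated from the data. The samples below really are normal, yet plugging their own mean and standard deviation into the null makes the fitted CDF hug the ECDF, so a nominal 5% test should reject far less often than 5% of the time. The sketch uses base R's `stats::ks.test()`, and the sample size and number of replicates are arbitrary choices:

```r
set.seed(1)
n_rep <- 500  # number of simulated data sets
n <- 50       # sample size per data set

# Data are truly normal, but the mean and sd passed on to pnorm are
# estimated from the same sample, violating the requirement above.
p_plugin <- replicate(n_rep, {
  x <- rnorm(n, mean = 10, sd = 3)
  stats::ks.test(x, "pnorm", mean(x), sd(x))$p.value
})

# Fraction of nominal 5% tests that reject; a valid test would give
# a value close to 0.05.
reject_rate <- mean(p_plugin < 0.05)
```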
Conover, W. J. (1971). Practical Nonparametric Statistics. New York: John Wiley & Sons. Pages 295–301 (one-sample Kolmogorov test), 309–314 (two-sample Smirnov test).
Conover, W. J. (1972). A Kolmogorov Goodness-of-Fit Test for Discontinuous Distributions. Journal of the American Statistical Association, 67(339), 591–596.
Gleser, L. J. (1985). Exact Power of Goodness-of-Fit Tests of Kolmogorov Type for Discontinuous Distributions. Journal of the American Statistical Association, 80(392), 954–958.
Durbin, J. (1973). Distribution theory for tests based on the sample distribution function. SIAM.
Marsaglia, G., Tsang, W. W. and Wang, J. (2003). Evaluating Kolmogorov's distribution. Journal of Statistical Software, 8(18). http://www.jstatsoft.org/v08/i18/
shapiro.test, which performs the Shapiro-Wilk test for normality;
cvm.test for Cramer-von Mises type tests.
require(graphics)
require(dgof)

set.seed(1)
x <- rnorm(50)
y <- runif(30)
# Do x and y come from the same distribution?
ks.test(x, y)
# Does x come from a shifted gamma distribution with shape 3 and rate 2?
ks.test(x + 2, "pgamma", 3, 2) # two-sided, exact
ks.test(x + 2, "pgamma", 3, 2, exact = FALSE)
ks.test(x + 2, "pgamma", 3, 2, alternative = "gr")

# test if x is stochastically larger than x2
x2 <- rnorm(50, -1)
plot(ecdf(x), xlim = range(c(x, x2)))
plot(ecdf(x2), add = TRUE, lty = "dashed")
t.test(x, x2, alternative = "g")
wilcox.test(x, x2, alternative = "g")
ks.test(x, x2, alternative = "l")

#########################################################
# TBA, JWE new examples added for discrete distributions:

x3 <- sample(1:10, 25, replace = TRUE)

# Using ecdf() to specify a discrete distribution:
ks.test(x3, ecdf(1:10))

# Using stepfun() to specify the same discrete distribution:
myfun <- stepfun(1:10, cumsum(c(0, rep(0.1, 10))))
ks.test(x3, myfun)

# The previous R ks.test() does not correctly calculate the
# test statistic for discrete distributions (gives warning):
# stats::ks.test(c(0, 1), ecdf(c(0, 1)))
# ks.test(c(0, 1), ecdf(c(0, 1)))

# Even when the correct test statistic is given, the
# previous R ks.test() gives conservative p-values:
stats::ks.test(rep(1, 3), ecdf(1:3))
ks.test(rep(1, 3), ecdf(1:3))
# simulate.p.value follows ... in the signature, so it must be spelled in full
ks.test(rep(1, 3), ecdf(1:3), simulate.p.value = TRUE, B = 10000)