Parameters must be estimated by maximum likelihood (ML) in order
for the P-values computed here to be asymptotically valid.
They are computed using the fact that, when the parameters are
estimated by maximum likelihood and the null hypothesis
is true, the asymptotic distribution of the GOF statistic
is the distribution of an infinite weighted sum
of independent chi-square random variables, each on 1 degree of freedom.
The weights are eigenvalues of an integral equation. They
depend on the distribution being tested, the statistic being used,
and in some cases on the actual parameter values. These weights
are computed approximately by discretizing the integral
equation; when that equation depends on one or more parameter
values, we substitute the MLE in the equation.
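As a minimal sketch of how a P-value follows from such a weighted sum, the Python code below approximates the tail probability \(P(\sum_j \lambda_j Z_j^2 > t)\) by Monte Carlo. This is illustrative only: the weights and the observed statistic are hypothetical stand-ins, since the true weights come from discretizing the integral equation.

```python
import numpy as np

# Illustrative sketch: approximate the tail probability of a weighted sum
# of independent chi-square(1) variables by Monte Carlo.  The weights are
# hypothetical stand-ins for the eigenvalues of the integral equation.
rng = np.random.default_rng(12345)

weights = np.array([0.5, 0.25, 0.125, 0.0625])  # hypothetical eigenvalues
t_obs = 1.2                                     # hypothetical observed statistic

# Each row is one draw of sum_j lambda_j * Z_j**2 with Z_j ~ N(0, 1).
z = rng.standard_normal((200_000, weights.size))
draws = (z ** 2) @ weights

p_value = np.mean(draws > t_obs)
```

In practice one would use many weights (the eigenvalue sequence is infinite but decays quickly) or a numerical inversion method rather than simulation; the Monte Carlo version simply makes the limiting distribution concrete.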
Some notes on the specific distributions: For the Normal, Logistic,
Laplace, Extreme Value, Weibull and Exponential distributions, the
limiting distributions do not depend on the parameters. For the
Gamma distribution, the shape parameter affects the limiting
distribution. The tests remain asymptotically valid when the MLE
is used to approximate the limit distribution.
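To make the plug-in idea concrete, the following sketch fits a Gamma model by maximum likelihood and computes the Cramér–von Mises statistic from the fitted CDF. SciPy stands in for the package's own routines here; mapping \(W^2\) to a P-value would then use weights that depend on the estimated shape.

```python
import numpy as np
from scipy import stats

# Illustrative sketch (not the package's own code): fit a two-parameter
# Gamma model by ML and compute the Cramer-von Mises statistic from the
# probability integral transform of the data under the fitted model.
rng = np.random.default_rng(7)
x = rng.gamma(shape=2.0, scale=3.0, size=500)

# MLE with the location fixed at 0 (the usual two-parameter Gamma).
shape_hat, _, scale_hat = stats.gamma.fit(x, floc=0)

# Probability integral transform with the fitted parameters.
u = np.sort(stats.gamma.cdf(x, shape_hat, scale=scale_hat))
n = u.size
i = np.arange(1, n + 1)

# Standard computing formula for the Cramer-von Mises statistic.
W2 = np.sum((u - (2 * i - 1) / (2 * n)) ** 2) + 1 / (12 * n)
```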
The Exponential distribution is a special case of the Weibull and
Gamma families arising when the shape is known to be 1. Knowing a
parameter, and therefore not estimating it, affects the distribution
of the test statistic; the functions provided for the Exponential
distribution allow for this.
If a data set X_1,...,X_n follows the Weibull distribution, then
Y_1 = log(X_1), ..., Y_n = log(X_n) follow the Extreme Value
distribution, and vice versa. The two procedures give identical
test statistics and P-values, in principle.
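The equivalence rests on the fact that the probability integral transforms agree point by point: if \(X\) is Weibull with shape \(k\) and scale \(\lambda\), then \(\log X\) is (minimum) Extreme Value with location \(\log\lambda\) and scale \(1/k\). A quick numerical check, using SciPy's parametrizations and arbitrary illustrative values of \(k\) and \(\lambda\):

```python
import numpy as np
from scipy import stats

# Illustrative check: Weibull CDF values equal Extreme Value (Gumbel
# minimum) CDF values after the log transform.  k and lam are arbitrary.
k, lam = 1.7, 2.5
x = np.array([0.3, 1.0, 2.5, 6.0])

u_weibull = stats.weibull_min.cdf(x, k, scale=lam)
u_gumbel = stats.gumbel_l.cdf(np.log(x), loc=np.log(lam), scale=1.0 / k)
```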
Some of the models have more than one common parametrization. For
the Exponential, Gamma, and Weibull distributions, some writers
use a rate parameter and some use the scale parameter, which is
the inverse of the rate. Our code uses the scale parameter.
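The two parametrizations describe the same model; for example, an Exponential rate of 0.5 is a scale of \(1/0.5 = 2\). A small check with SciPy, which also uses the scale parametrization:

```python
import numpy as np
from scipy import stats

# Illustrative check: the scale parametrization (scale = 1/rate) gives
# the same Exponential density as the rate parametrization.
rate = 0.5
x = np.array([0.1, 1.0, 3.0])

pdf_scale = stats.expon.pdf(x, scale=1.0 / rate)  # scale parametrization
pdf_rate = rate * np.exp(-rate * x)               # rate parametrization, by hand
```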
For the Laplace distribution, some writers use the density
\(f(x)=\exp(-|x-\mu|/\beta)/(2\beta)\), in which \(\beta\)
is a scale parameter. Others use the
standard deviation \(\sigma = \sqrt{2}\,\beta\). Our code
uses the scale parameter.
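The conversion is easy to verify numerically: a Laplace distribution with scale \(\beta\) has variance \(2\beta^2\), hence standard deviation \(\sqrt{2}\,\beta\). An illustrative check with SciPy:

```python
import numpy as np
from scipy import stats

# Illustrative check: Laplace with scale beta has standard deviation
# sqrt(2) * beta (variance 2 * beta**2).  beta is an arbitrary value.
beta = 1.5
sigma = stats.laplace(scale=beta).std()
```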
For the Uniform distribution, we offer code for the two parameter
Uniform distribution on the range \(\theta_1\) to \(\theta_2\).
These are estimated by the sample minimum and sample maximum.
The probability integral transforms of the remaining n-2 points
are then tested for uniformity on the range 0 to 1. This procedure
is justified because these probability integral transforms
have exactly this distribution if the original data had a uniform
distribution over any interval.
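The two-parameter Uniform procedure can be sketched as follows; SciPy's Cramér–von Mises test against the standard uniform CDF stands in here for the package's own uniformity tests.

```python
import numpy as np
from scipy import stats

# Illustrative sketch of the procedure described above: estimate the
# endpoints by the sample minimum and maximum, drop those two points,
# and test the probability integral transforms of the remaining n-2
# points for uniformity on (0, 1).
rng = np.random.default_rng(42)
x = np.sort(rng.uniform(low=-3.0, high=5.0, size=200))

theta1, theta2 = x[0], x[-1]   # endpoint estimates
interior = x[1:-1]             # the remaining n-2 points
u = (interior - theta1) / (theta2 - theta1)

# Any uniformity test can be applied to u; here SciPy's Cramer-von Mises
# test against the standard uniform CDF stands in for the package's own.
res = stats.cramervonmises(u, 'uniform')
```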
It is not unusual to test the hypothesis that a sample follows the
standard uniform distribution on [0,1]. In this case the parameters
should not be estimated. Instead, use AD(z), CvM(z), or
Watson(z) to compute the statistic values and then obtain P-values from
AD.uniform.pvalue(a), CvM.uniform.pvalue(w), or
Watson.uniform.pvalue(u), whichever is wanted.
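For reference, the Anderson–Darling statistic in the no-estimation case is computed directly from the order statistics. The sketch below shows the standard computing formula; AD(z) computes this quantity, and the P-value would then come from AD.uniform.pvalue.

```python
import numpy as np

# Illustrative computation of the Anderson-Darling statistic for a
# sample hypothesized to be standard uniform on [0, 1], so no
# parameters are estimated.  Uses the standard computing formula
# A^2 = -n - (1/n) * sum_i (2i-1) * [log U_(i) + log(1 - U_(n+1-i))].
rng = np.random.default_rng(3)
z = np.sort(rng.uniform(size=100))
n = z.size
i = np.arange(1, n + 1)

A2 = -n - np.mean((2 * i - 1) * (np.log(z) + np.log1p(-z[::-1])))
```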