gofstat: Goodness-of-fit statistics

Description

Computes goodness-of-fit statistics for a parametric distribution fitted to non-censored data.

Usage

gofstat(f, chisqbreaks, meancount, print.test = FALSE)

Arguments

f

An object of class 'fitdist', the result of the function fitdist.

chisqbreaks

A numeric vector defining the breaks of the cells used to compute the chi-squared statistic. If omitted, these breaks are automatically computed from the data in order to reach roughly the same number of observations per cell, roughly equal to the argument meancount, or slightly more if there are some ties.

meancount

The mean number of observations per cell expected for the definition of the breaks of the cells used to compute the chi-squared statistic. This argument is ignored if the breaks are directly defined in the argument chisqbreaks.

print.test

If FALSE, the results of the tests are computed but not printed.

Value

gofstat returns a list with the following components:

chisq

The Chi-squared statistic, or NULL if not computed.

chisqbreaks

The breaks used to define cells in the Chi-squared statistic.

chisqpvalue

The p-value of the Chi-squared statistic, or NULL if not computed.

chisqdf

The degrees of freedom of the Chi-squared distribution, or NULL if not computed.

chisqtable

A table with the observed and theoretical counts used for the Chi-squared calculations.

ad

The Anderson-Darling statistic, or NULL if not computed.

adtest

The decision of the Anderson-Darling test, or NULL if not computed.

ks

The Kolmogorov-Smirnov statistic, or NULL if not computed.

kstest

The decision of the Kolmogorov-Smirnov test, or NULL if not computed.

Details

Goodness-of-fit statistics are computed as follows.

The Chi-squared statistic is computed using cells defined by the argument chisqbreaks, or cells automatically defined from the data so that each holds roughly the same number of observations, roughly equal to the argument meancount, or slightly more if there are ties. If chisqbreaks and meancount are both omitted, meancount is set so as to obtain roughly $(4n)^{2/5}$ cells, where $n$ is the length of the dataset (Vose, 2000). The Chi-squared statistic is not computed if the dataset is too small for the program to define enough cells. When the Chi-squared statistic is computed, and if the degrees of freedom (number of cells - number of parameters - 1) of the corresponding distribution are strictly positive, the p-value of the Chi-squared test is returned.

For distributions assumed continuous (all but "binom", "nbinom", "geom", "hyper" and "pois" among the R base distributions), the Kolmogorov-Smirnov and Anderson-Darling statistics are also computed, as defined by Cullen and Frey (1999).

An approximate Kolmogorov-Smirnov test is performed by assuming that the distribution parameters are known. The critical value defined by Stephens (1986) for a completely specified distribution is used to decide whether to reject the distribution at the 0.05 significance level. Because of this approximation, the result of the test (the decision to reject the distribution or not) is returned only for datasets with more than 30 observations. Note that this approximate test may be too conservative.

For datasets with more than 5 observations, and for the distributions for which the test is described by Stephens (1986) ("norm", "lnorm", "exp", "cauchy", "gamma", "logis" and "weibull"), the Anderson-Darling test is performed as described by Stephens (1986). This test takes into account the fact that the parameters are not known but estimated from the data. The result is the decision to reject the distribution or not at the 0.05 significance level.

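
The Chi-squared computation described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's code: chisq_sketch is a hypothetical helper, the fit is restricted to a normal distribution with maximum-likelihood estimates, and placing the breaks at empirical quantiles is an assumed way of getting roughly equal counts per cell. The names of the returned components and the column names obscounts and theocounts are chosen here for readability.

```r
# Hypothetical helper, not part of fitdistrplus: Chi-squared goodness-of-fit
# for a normal distribution fitted by maximum likelihood, in base R only.
chisq_sketch <- function(x, meancount = NULL) {
  n <- length(x)
  # Default rule from the text: aim for roughly (4n)^(2/5) cells.
  if (is.null(meancount)) meancount <- n / (4 * n)^(2 / 5)
  ncells <- max(2, floor(n / meancount))
  # Assumption: interior breaks at empirical quantiles, so that cells hold
  # roughly equal counts; the package's own break rule may differ.
  probs <- seq_len(ncells - 1) / ncells
  breaks <- unique(as.numeric(quantile(x, probs)))
  # Maximum-likelihood estimates for the normal distribution.
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  # Observed counts per cell; the two outer cells are open-ended.
  obs <- as.numeric(table(cut(x, c(-Inf, breaks, Inf))))
  # Expected counts per cell under the fitted distribution.
  expd <- n * diff(c(0, pnorm(breaks, m, s), 1))
  stat <- sum((obs - expd)^2 / expd)
  # Degrees of freedom: number of cells - number of parameters - 1.
  df <- length(obs) - 2 - 1
  pval <- if (df > 0) pchisq(stat, df, lower.tail = FALSE) else NULL
  list(chisq = stat, chisqdf = df, chisqpvalue = pval,
       chisqtable = cbind(obscounts = obs, theocounts = expd))
}

x1 <- c(6.4, 13.3, 4.1, 1.3, 14.1, 10.6, 9.9, 9.6, 15.3, 22.1, 13.4,
        13.2, 8.4, 6.3, 8.9, 5.2, 10.9, 14.4)
res <- chisq_sketch(x1)
print(res$chisqtable)
```

With 18 observations the default rule gives 5 cells, so 2 degrees of freedom remain after removing the two estimated parameters and 1. Passing a larger meancount reduces the number of cells, and once the degrees of freedom are no longer strictly positive the sketch returns NULL for the p-value, mirroring the rule stated above.
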
Only the recommended statistics are printed automatically, i.e. the Anderson-Darling and Kolmogorov-Smirnov statistics for continuous distributions and the Chi-squared statistic for discrete ones ("binom", "nbinom", "geom", "hyper" and "pois"). Results of the tests are printed only if print.test=TRUE. Even when not printed, all available results can be found in the list returned by the function.
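
For continuous distributions, the Kolmogorov-Smirnov and Anderson-Darling statistics mentioned in this section can be sketched in base R using their standard textbook formulas. The helper edf_stats_sketch is hypothetical and is not the package's code; whether gofstat computes exactly these expressions is an assumption. No critical values or test decisions are included.

```r
# Hypothetical helper, not part of fitdistrplus: Kolmogorov-Smirnov and
# Anderson-Darling statistics for a fitted continuous distribution.
# pfun is the fitted CDF (e.g. pnorm); ... carries its parameters.
edf_stats_sketch <- function(x, pfun, ...) {
  n <- length(x)
  # Fitted CDF evaluated at the ordered observations.
  u <- pfun(sort(x), ...)
  i <- seq_len(n)
  # KS: largest gap between the empirical and fitted CDFs,
  # checked just after and just before each jump of the empirical CDF.
  ks <- max(c(i / n - u, u - (i - 1) / n))
  # AD: usual computing formula for the A^2 statistic.
  ad <- -n - mean((2 * i - 1) * (log(u) + log(1 - rev(u))))
  list(ks = ks, ad = ad)
}

x1 <- c(6.4, 13.3, 4.1, 1.3, 14.1, 10.6, 9.9, 9.6, 15.3, 22.1, 13.4,
        13.2, 8.4, 6.3, 8.9, 5.2, 10.9, 14.4)
# Maximum-likelihood estimates for a normal fit.
m <- mean(x1)
s <- sqrt(mean((x1 - m)^2))
res2 <- edf_stats_sketch(x1, pnorm, mean = m, sd = s)
print(res2)
```

The Kolmogorov-Smirnov value can be cross-checked against the statistic reported by ks.test(x1, "pnorm", m, s) from the stats package. The p-value that ks.test reports assumes known parameters, which is the same approximation discussed above.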

References

Cullen AC and Frey HC (1999) Probabilistic techniques in exposure assessment. Plenum Press, USA, pp. 81-155.

Stephens MA (1986) Tests based on edf statistics. In Goodness-of-fit techniques (D'Agostino RB and Stephens MA, eds), Marcel Dekker, New York, pp. 97-194.

Venables WN and Ripley BD (2002) Modern applied statistics with S. Springer, New York, pp. 435-446.

Vose D (2000) Risk analysis, a quantitative guide. John Wiley & Sons Ltd, Chichester, England, pp. 99-143.

Examples

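library(fitdistrplus)  # provides fitdist() and gofstat()

# (1) fit of a normal distribution
#

x1 <- c(6.4, 13.3, 4.1, 1.3, 14.1, 10.6, 9.9, 9.6, 15.3, 22.1, 13.4,
        13.2, 8.4, 6.3, 8.9, 5.2, 10.9, 14.4)
print(f1 <- fitdist(x1, "norm"))
gofstat(f1)                     # statistics only
gofstat(f1, print.test = TRUE)  # also print the test decisions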

# (2) fit of a discrete distribution (Poisson)
#

x2 <- c(rep(4, 1), rep(2, 3), rep(1, 7), rep(0, 12))
print(f2 <- fitdist(x2, "pois"))
g2 <- gofstat(f2, chisqbreaks = c(0, 1), print.test = TRUE)
g2$chisqtable  # observed and theoretical counts per cell


# (3) comparison of fits of various distributions
#

set.seed(1234)  # arbitrary seed, so that the simulated sample is reproducible
x3 <- rweibull(n = 100, shape = 2, scale = 1)
gofstat(f3a <- fitdist(x3, "weibull"))
gofstat(f3b <- fitdist(x3, "gamma"))
gofstat(f3c <- fitdist(x3, "exp"))

# (4) Use of Chi-squared results in addition to
#     recommended statistics for continuous distributions
#

x4 <- rweibull(n = 100, shape = 2, scale = 1)
f4 <- fitdist(x4, "weibull")
g4 <- gofstat(f4, meancount = 10)
print(g4)
