Usage

fitdist(data, distr, method = c("mle", "mme"), start, chisqbreaks, meancount, ...)

## S3 method for class 'fitdist':
print(x, ...)

## S3 method for class 'fitdist':
plot(x, breaks = "default", ...)

## S3 method for class 'fitdist':
summary(object, ...)
Arguments

distr: a character string "name" naming a distribution for which the corresponding density function dname, the corresponding distribution function pname and the corresponding quantile function qname are defined.

method: a character string coding for the fitting method: "mle" for 'maximum likelihood estimation' and "mme" for 'matching moment estimation'.

start: initial values for the distribution parameters. It is not taken into account if method="mme", and may be omitted for some distributions for which reasonable starting values are chosen automatically (see Details).

breaks: if "default", the histogram is plotted with the function hist with its default breaks definition; else breaks is passed to the function hist. This argument is not taken into account with discrete distributions.

...: further arguments passed to mledist if 'maximum likelihood' is the chosen method, in order to control the optimization method.

Value

fitdist returns an object of class 'fitdist', a list with the following components: the parameter estimates; the character string coding for the fitting method ("mle" for 'maximum likelihood estimation' and "mme" for 'matching moment estimation'); quantities specific to maximum likelihood fitting, such as the log-likelihood and the standard errors of the estimates, which are NULL if method="mme"; and the goodness-of-fit results, such as the Chi-squared table chisqtable, which are NULL if not computed.

Details

When method="mle",
maximum likelihood estimates of the distribution parameters are computed using the function mledist. By default, direct optimization of the log-likelihood is performed using optim, with the "Nelder-Mead" method for distributions characterized by more than one parameter and the "BFGS" method for distributions characterized by only one parameter. The method used in optim, or another optimization method altogether, may be chosen using the ... argument (see mledist for details).
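To make the default behaviour concrete, here is a minimal sketch of direct log-likelihood optimization with optim and the "Nelder-Mead" method for a two-parameter distribution. This is an illustration of the idea only, not the actual mledist implementation.

```r
# Minimal sketch (illustration only, not the mledist code): fit a normal
# distribution by minimizing the negative log-likelihood with optim and
# the "Nelder-Mead" method.
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)
nllik <- function(par) -sum(dnorm(x, mean = par[1], sd = par[2], log = TRUE))
res <- optim(par = c(mean(x), sd(x)), fn = nllik, method = "Nelder-Mead")
res$par  # ML estimates of the mean and standard deviation
```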
For the following named distributions, reasonable starting values are computed if start is omitted: "norm", "lnorm", "exp", "pois", "cauchy", "gamma", "logis", "nbinom" (parametrized by mu and size), "geom", "beta" and "weibull". Note that these starting values may not be good enough if the fit is poor. The function is not able to fit a uniform distribution.
With the parameter estimates, the function returns the log-likelihood and the standard errors of the estimates, calculated from the Hessian at the solution found by optim or by the user-supplied function passed to mledist.
When method="mme", the estimated values of the distribution parameters are provided only for the following distributions: "norm", "lnorm", "pois", "exp", "gamma", "nbinom", "geom", "beta", "unif" and "logis". For distributions characterized by one parameter ("geom", "pois" and "exp"), this parameter is simply estimated by matching theoretical and observed means; for distributions characterized by two parameters, the parameters are estimated by matching theoretical and observed means and variances (Vose, 2000).
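The two-parameter case can be sketched by hand for the gamma distribution, whose theoretical mean and variance are shape/rate and shape/rate^2. This is a sketch of the idea, not the internal mmedist code; in particular, the use of the n-denominator empirical variance below is an assumption made for illustration.

```r
# Sketch of moment matching for a two-parameter distribution (gamma):
# equate the theoretical mean (shape/rate) and variance (shape/rate^2)
# to their observed counterparts and solve for the parameters.
set.seed(1)
x <- rgamma(200, shape = 3, rate = 0.5)
m <- mean(x)
v <- mean((x - m)^2)  # empirical variance (n denominator assumed here)
shape <- m^2 / v
rate <- m / v
c(shape = shape, rate = rate)
```

By construction, the fitted gamma distribution reproduces the observed mean and variance exactly.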
Goodness-of-fit statistics are computed. The Chi-squared statistic is computed using cells defined by the argument chisqbreaks, or cells automatically defined from the data so as to reach roughly the same number of observations per cell, roughly equal to the argument meancount, or slightly more if there are ties. If chisqbreaks and meancount are both omitted, meancount is fixed so as to obtain roughly $(4n)^{2/5}$ cells, with $n$ the length of the dataset (Vose, 2000). The Chi-squared statistic is not computed if the program fails to define enough cells because the dataset is too small. When the Chi-squared statistic is computed, and if the degrees of freedom (number of cells - number of parameters - 1) of the corresponding distribution are strictly positive, the p-value of the Chi-squared test is returned.
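The cell count and the degrees-of-freedom rule above can be sketched with made-up numbers; the statistic and cell counts below are hypothetical, chosen only to illustrate the computation.

```r
# Sketch of the Chi-squared p-value computation described above, with
# hypothetical numbers: 18 observations, 6 cells, 2 estimated parameters
# and a statistic of 4.3.
n <- 18
ceiling((4 * n)^(2/5))          # rough target number of cells
ncells <- 6
npar <- 2
stat <- 4.3
df <- ncells - npar - 1         # degrees of freedom of the test
if (df > 0) pval <- pchisq(stat, df = df, lower.tail = FALSE)
pval
```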
For the distributions assumed continuous (all but "binom", "nbinom", "geom", "hyper" and "pois" among R base distributions), Kolmogorov-Smirnov and Anderson-Darling statistics are also computed, as defined by Cullen and Frey (1999). An approximate Kolmogorov-Smirnov test is performed by assuming the distribution parameters known. The critical value defined by Stephens (1986) for a completely specified distribution is used to decide whether to reject the distribution at the 0.05 significance level. Because of this approximation, the result of the test (rejection of the distribution or not) is returned only for datasets with more than 30 observations. Note that this approximate test may be too conservative.
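The "parameters treated as known" approximation can be sketched with base R's ks.test, which assumes a completely specified distribution. This is an illustration of the idea only; fitdist computes its own statistic as defined by Cullen and Frey (1999).

```r
# Sketch of the approximate Kolmogorov-Smirnov idea: estimate the
# parameters, then test against the fitted distribution as if those
# parameters were known (ks.test assumes a fully specified distribution).
set.seed(1)
x <- rnorm(40, mean = 5, sd = 2)
m <- mean(x)
s <- sd(x)                       # estimated, then treated as known
res <- ks.test(x, "pnorm", mean = m, sd = s)
res$statistic                    # the Kolmogorov-Smirnov distance
```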
For datasets with more than 5 observations and for distributions for which the test is described by Stephens (1986) ("norm", "lnorm", "exp", "cauchy", "gamma", "logis" and "weibull"), the Anderson-Darling test is performed as described by Stephens (1986). This test takes into account the fact that the parameters are not known but estimated from the data. The result is the decision whether to reject the distribution at the 0.05 significance level.
The plot of an object of class "fitdist" returned by fitdist uses the function plotdist.

See Also

plotdist, optim, mledist, mmedist and fitdistcens.

Examples

# (1) basic fit of a normal distribution with maximum likelihood estimation
#
#
x1 <- c(6.4,13.3,4.1,1.3,14.1,10.6,9.9,9.6,15.3,22.1,13.4,
13.2,8.4,6.3,8.9,5.2,10.9,14.4)
f1 <- fitdist(x1,"norm")
print(f1)
plot(f1)
summary(f1)
f1$chisqtable
# (2) use the moment matching estimation
#
f1b <- fitdist(x1,"norm",method="mme",meancount=6)
summary(f1b)
f1b$chisqtable
# (3) MME for log normal distribution
#
f1c <- fitdist(x1,"lnorm",method="mme",meancount=6)
summary(f1c)
f1c$chisqtable
# (4) defining your own distribution functions, here for the Gumbel distribution
# for other distributions, see the CRAN task view dedicated to probability distributions
dgumbel <- function(x,a,b) 1/b*exp((a-x)/b)*exp(-exp((a-x)/b))
pgumbel <- function(q,a,b) exp(-exp((a-q)/b))
qgumbel <- function(p,a,b) a-b*log(-log(p))
f1c <- fitdist(x1,"gumbel",start=list(a=10,b=5))
print(f1c)
plot(f1c)
# (5) fit a discrete distribution (Poisson)
#
x2<-c(rep(4,1),rep(2,3),rep(1,7),rep(0,12))
f2<-fitdist(x2,"pois",chisqbreaks=c(0,1))
plot(f2)
summary(f2)
f2$chisqtable
# (6) comparison of fits of various distributions
#
xw<-rweibull(n=100,shape=2,scale=1)
fa<-fitdist(xw,"weibull")
summary(fa)
fa$chisqtable
fb<-fitdist(xw,"gamma")
summary(fb)
fc<-fitdist(xw,"exp")
summary(fc)
# (7) how to change the optimisation method?
#
fitdist(x1,"gamma",optim.method="Nelder-Mead")
fitdist(x1,"gamma",optim.method="BFGS")
fitdist(x1,"gamma",optim.method="L-BFGS-B",lower=c(0,0))
fitdist(x1,"gamma",optim.method="SANN")
# (8) custom optimisation function
#
# create the sample
mysample <- rexp(100, 5)
mystart <- 8
res1 <- fitdist(mysample, dexp, start=mystart, optim.method="Nelder-Mead")
# show the result
summary(res1)
# the warning tells us to use optimize, because Nelder-Mead is not adequate
# for one-parameter optimization.
# to meet the standard 'fn' argument and specific named arguments, we wrap optimize
myoptimize <- function(fn, par, ...)
{
    res <- optimize(f=fn, ..., maximum=FALSE) # assume the optimization function minimizes
    standardres <- c(res, convergence=0, value=res$objective, par=res$minimum, hessian=NA)
    return(standardres)
}
# call fitdist with a 'custom' optimization function
res2 <- fitdist(mysample, dexp, start=mystart, custom.optim=myoptimize, interval=c(0, 100))
# show the result
summary(res2)
# (9) custom optimisation function - another example with a genetic algorithm
#
# set a sample
x1 <- c(6.4, 13.3, 4.1, 1.3, 14.1, 10.6, 9.9, 9.6, 15.3, 22.1, 13.4,
        13.2, 8.4, 6.3, 8.9, 5.2, 10.9, 14.4)
fit1 <- fitdist(x1, "gamma")
summary(fit1)
# wrap the genoud function from the rgenoud package
mygenoud <- function(fn, par, ...)
{
    require(rgenoud)
    res <- genoud(fn, starting.values=par, ...)
    standardres <- c(res, convergence=0)
    return(standardres)
}
# call fitdist with a 'custom' optimization function
fit2 <- fitdist(x1, "gamma", custom.optim=mygenoud, nvars=2,
    Domains=cbind(c(0,0), c(10, 10)), boundary.enforcement=1,
    print.level=1, hessian=TRUE)
summary(fit2)