fitdist: Fitting of univariate distributions to non-censored data

Description

Fits a univariate distribution to non-censored data by maximum likelihood or matching moments.

Usage

fitdist(data, distr, method=c("mle", "mme"), start,...)
## S3 method for class 'fitdist':
print(x,...)
## S3 method for class 'fitdist':
plot(x,breaks="default",...)
## S3 method for class 'fitdist':
summary(object,...)

Arguments

data

A numeric vector.

distr

A character string "name" naming a distribution for which the corresponding density function dname, the corresponding distribution function pname and the corresponding quantile function qname must be defined.

method

A character string coding for the fitting method: "mle" for 'maximum likelihood estimation' and "mme" for 'matching moment estimation'.

start

A named list giving the initial values of the parameters of the named distribution. This argument is ignored if method="mme", and may be omitted for some distributions for which reasonable starting values are computed (see Details).

x

an object of class 'fitdist'.

object

an object of class 'fitdist'.

breaks

If "default", the histogram is plotted with the function hist using its default breaks definition. Otherwise, breaks is passed to the function hist. This argument is not taken into account with discrete distributions.

...

further arguments to be passed to generic functions, or to the function "mledist" if 'maximum likelihood' is the chosen method, in order to control the optimization method.
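
The naming convention described for distr (a name such as "norm" standing for dnorm, pnorm and qnorm) can be illustrated with a small base R sketch. The helper resolve_dpq below is hypothetical and is not part of the package; it only shows one way such a lookup could be done, and fitdist may proceed differently internally.

```r
# Hypothetical helper (not part of the package): look up the density,
# distribution and quantile functions that share a distribution name.
resolve_dpq <- function(distr) {
  prefixes <- c(d = "d", p = "p", q = "q")
  lapply(prefixes, function(prefix) {
    # get() fails with an error if, e.g., "dfoo" is not defined
    get(paste0(prefix, distr), mode = "function")
  })
}

fns <- resolve_dpq("norm")
fns$d(0)      # same value as dnorm(0)
fns$p(1.96)   # same value as pnorm(1.96)
fns$q(0.975)  # same value as qnorm(0.975)
```

A user-defined distribution follows the same pattern: once dgumbel, pgumbel and qgumbel exist in the workspace, the name "gumbel" identifies all three.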

Value

fitdist returns an object of class 'fitdist', a list with the following components:

estimate

the parameter estimates

method

the character string coding for the fitting method: "mle" for 'maximum likelihood estimation' and "mme" for 'matching moment estimation'

sd

the estimated standard errors, or NULL if method="mme"

cor

the estimated correlation matrix, or NULL if method="mme"

loglik

the log-likelihood, or NULL if method="mme"

aic

the Akaike information criterion, or NULL if method="mme"

bic

the so-called BIC or SBC (Schwarz Bayesian criterion), or NULL if method="mme"

n

the length of the data set

data

the data set

distname

the name of the distribution
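
The loglik, aic and bic components are related by the standard definitions AIC = -2 logLik + 2k and BIC = -2 logLik + k log(n), where k is the number of estimated parameters. The base R sketch below works this out for a normal distribution, whose maximum likelihood estimates have a closed form. It illustrates the textbook formulas and is not taken from the package source.

```r
x <- c(6.4, 13.3, 4.1, 1.3, 14.1, 10.6, 9.9, 9.6, 15.3, 22.1, 13.4,
       13.2, 8.4, 6.3, 8.9, 5.2, 10.9, 14.4)
n <- length(x)

# Closed-form maximum likelihood estimates for the normal distribution
mean_hat <- mean(x)
sd_hat <- sqrt(mean((x - mean_hat)^2))  # ML estimate divides by n, not n - 1

# Log-likelihood at the estimates
loglik <- sum(dnorm(x, mean = mean_hat, sd = sd_hat, log = TRUE))

k <- 2                           # number of estimated parameters (mean, sd)
aic <- -2 * loglik + 2 * k
bic <- -2 * loglik + k * log(n)
c(loglik = loglik, aic = aic, bic = bic)
```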

Details

When method="mle", maximum likelihood estimates of the distribution parameters are computed using the function mledist. By default, direct optimization of the log-likelihood is performed using optim, with the "Nelder-Mead" method for distributions characterized by more than one parameter and the "BFGS" method for distributions characterized by only one parameter. A different optim method, or another optimization function altogether, may be chosen through the ... argument (see mledist for details).

For the following named distributions, reasonable starting values are computed if start is omitted: "norm", "lnorm", "exp", "pois", "cauchy", "gamma", "logis", "nbinom" (parametrized by mu and size), "geom", "beta" and "weibull". Note that these starting values may not be good enough if the fit is poor. With this method, the function is not able to fit a uniform distribution.

Along with the parameter estimates, the function returns the log-likelihood and the standard errors of the estimates, calculated from the Hessian at the solution found by optim or by the user-supplied function passed to mledist.

When method="mme", parameter estimates are available only for the following distributions: "norm", "lnorm", "pois", "exp", "gamma", "nbinom", "geom", "beta", "unif" and "logis". For distributions characterized by one parameter ("geom", "pois" and "exp"), this parameter is estimated by matching the theoretical and observed means; for distributions characterized by two parameters, the parameters are estimated by matching the theoretical and observed means and variances (Vose, 2000).

The plot of an object of class "fitdist" returned by fitdist uses the function plotdist.
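
The maximum likelihood procedure described above (direct optimization with optim, then standard errors from the Hessian) can be sketched in base R for a gamma distribution. This is a minimal illustration under stated assumptions, not the code of mledist: the function negloglik and the moment-based starting values are choices made for this example.

```r
x <- c(6.4, 13.3, 4.1, 1.3, 14.1, 10.6, 9.9, 9.6, 15.3, 22.1, 13.4,
       13.2, 8.4, 6.3, 8.9, 5.2, 10.9, 14.4)

# Negative log-likelihood of a gamma distribution (shape, rate);
# optim minimizes, so the sign of the log-likelihood is flipped.
negloglik <- function(par, obs) {
  if (any(par <= 0)) return(Inf)  # outside the parameter space
  -sum(dgamma(obs, shape = par[1], rate = par[2], log = TRUE))
}

# Assumption: starting values from the sample mean and variance
start <- c(shape = mean(x)^2 / var(x), rate = mean(x) / var(x))

opt <- optim(start, negloglik, obs = x, method = "Nelder-Mead",
             hessian = TRUE)

estimate <- opt$par
vcov_hat <- solve(opt$hessian)   # inverse of the observed information
se <- sqrt(diag(vcov_hat))       # standard errors of the estimates
cor_hat <- cov2cor(vcov_hat)     # correlation matrix of the estimates
loglik <- -opt$value             # log-likelihood at the solution

estimate
se
cor_hat
loglik
```

Because the Hessian is taken from the negative log-likelihood, its inverse approximates the covariance matrix of the estimates, which is where the sd and cor components of a maximum likelihood fit come from.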

References

Cullen AC and Frey HC (1999) Probabilistic techniques in exposure assessment. Plenum Press, USA, pp. 81-155.

Venables WN and Ripley BD (2002) Modern applied statistics with S. Springer, New York, pp. 435-446.

Vose D (2000) Risk analysis, a quantitative guide. John Wiley & Sons Ltd, Chichester, England, pp. 99-143.

Examples

# (1) basic fit of a normal distribution with maximum likelihood estimation
#

x1 <- c(6.4,13.3,4.1,1.3,14.1,10.6,9.9,9.6,15.3,22.1,13.4,
13.2,8.4,6.3,8.9,5.2,10.9,14.4)
f1 <- fitdist(x1,"norm")
print(f1)
plot(f1)
summary(f1)

# (2) use the moment matching estimation
#

f1b <- fitdist(x1,"norm",method="mme")
summary(f1b)

# (3) MME for log normal distribution
#

f1c <- fitdist(x1,"lnorm",method="mme")
summary(f1c)

# (4) defining your own distribution functions, here for the Gumbel distribution
# for other distributions, see the CRAN task view dedicated to probability distributions

dgumbel <- function(x,a,b) 1/b*exp((a-x)/b)*exp(-exp((a-x)/b))
pgumbel <- function(q,a,b) exp(-exp((a-q)/b))
qgumbel <- function(p,a,b) a-b*log(-log(p))

f1c <- fitdist(x1,"gumbel",start=list(a=10,b=5))
print(f1c)
plot(f1c)

# (5) fit a discrete distribution (Poisson)
#

x2<-c(rep(4,1),rep(2,3),rep(1,7),rep(0,12))
f2<-fitdist(x2,"pois")
plot(f2)
summary(f2)

# (6) how to change the optimisation method?
#

fitdist(x1,"gamma",optim.method="Nelder-Mead")
fitdist(x1,"gamma",optim.method="BFGS") 
fitdist(x1,"gamma",optim.method="L-BFGS-B",lower=c(0,0))
fitdist(x1,"gamma",optim.method="SANN")

# (7) custom optimisation function
#

#create the sample
mysample <- rexp(100, 5)
mystart <- 8

res1 <- fitdist(mysample, dexp, start= mystart, optim.method="Nelder-Mead")

#show the result
summary(res1)

#the warning tells us to use optimize, because Nelder-Mead is unreliable for one-dimensional problems

#to match the expected 'fn' and 'par' arguments and the names of the returned components, we wrap optimize
myoptimize <- function(fn, par, ...) 
{
    res <- optimize(f=fn, ..., maximum=FALSE)  #assumes that the optimization function minimizes
    
    standardres <- c(res, convergence=0, value=res$objective, par=res$minimum, hessian=NA)
    
    return(standardres)
}

#call fitdist with a 'custom' optimization function
res2 <- fitdist(mysample, dexp, start=mystart, custom.optim=myoptimize, interval=c(0, 100))

#show the result
summary(res2)


# (8) custom optimisation function - another example with the genetic algorithm
#

#set a sample
x1 <- c(6.4, 13.3, 4.1, 1.3, 14.1, 10.6, 9.9, 9.6, 15.3, 22.1, 13.4,
    13.2, 8.4, 6.3, 8.9, 5.2, 10.9, 14.4)
fit1 <- fitdist(x1, "gamma")
summary(fit1)

#wrap the genoud function from the rgenoud package
mygenoud <- function(fn, par, ...) 
{
    require(rgenoud)
    res <- genoud(fn, starting.values=par, ...)
    standardres <- c(res, convergence=0)

    return(standardres)
}

#call fitdist with a 'custom' optimization function
fit2 <- fitdist(x1, "gamma", custom.optim=mygenoud, nvars=2,
    Domains=cbind(c(0,0), c(10, 10)), boundary.enforcement=1,
    print.level=1, hessian=TRUE)

summary(fit2)
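
Examples (2) and (3) rely on moment matching. As a complement, the base R sketch below shows what matching the theoretical and observed mean and variance amounts to for a few distributions. It is an illustration only: the use of var (denominator n - 1) is an assumption, and the package may compute the observed variance differently.

```r
x <- c(6.4, 13.3, 4.1, 1.3, 14.1, 10.6, 9.9, 9.6, 15.3, 22.1, 13.4,
       13.2, 8.4, 6.3, 8.9, 5.2, 10.9, 14.4)

m <- mean(x)
v <- var(x)  # assumption: sample variance with denominator n - 1

# One parameter: match the mean only.
# Exponential: mean = 1 / rate
exp_mme <- c(rate = 1 / m)

# Two parameters: match the mean and the variance.
# Normal: mean = mean, variance = sd^2
norm_mme <- c(mean = m, sd = sqrt(v))

# Gamma: mean = shape / rate, variance = shape / rate^2
gamma_mme <- c(shape = m^2 / v, rate = m / v)

exp_mme
norm_mme
gamma_mme
```

Plugging the gamma estimates back into shape / rate and shape / rate^2 returns the sample mean and variance, which is the defining property of the method.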
