zeroinfl: Zero-inflated Count Data Regression

Description

Fit zero-inflated regression models for count data via maximum likelihood.

Usage

zeroinfl(formula, data, subset, na.action, weights, offset,
  dist = c("poisson", "negbin", "geometric"),
  link = c("logit", "probit", "cloglog", "cauchit", "log"),
  control = zeroinfl.control(...),
  model = TRUE, y = TRUE, x = FALSE, ...)

Arguments

formula

symbolic description of the model, see details.

data, subset, na.action

arguments controlling formula processing via model.frame.

weights

optional numeric vector of weights.

offset

optional numeric vector with an a priori known component to be included in the linear predictor of the count model. See below for more information on offsets.

dist

character specification of count model family (a log link is always used).

link

character specification of link function in the binary zero-inflation model (a binomial family is always used).

control

a list of control arguments specified via zeroinfl.control.

model, y, x

logicals. If TRUE the corresponding components of the fit (model frame, response, model matrix) are returned.

...

arguments passed to zeroinfl.control in the default setup.

Value

An object of class "zeroinfl", i.e., a list with components including:

coefficients

a list with elements "count" and "zero" containing the coefficients from the respective models.

residuals

a vector of raw residuals (observed - fitted).

fitted.values

a vector of fitted means.

optim

a list with the output from the optim call for minimizing the negative log-likelihood.

control

the control arguments passed to the optim call.

start

the starting values for the parameters passed to the optim call.

weights

the case weights used.

offset

a list with elements "count" and "zero" containing the offset vectors (if any) from the respective models.

n

number of observations (with weights > 0).

df.null

residual degrees of freedom for the null model (= n - 2).

df.residual

residual degrees of freedom for fitted model.

terms

a list with elements "count", "zero" and "full" containing the terms objects for the respective models.

theta

estimate of the additional $\theta$ parameter of the negative binomial model (if a negative binomial regression is used).

SE.logtheta

standard error for $\log(\theta)$.

loglik

log-likelihood of the fitted model.

vcov

covariance matrix of all coefficients in the model (derived from the Hessian of the optim output).

dist

character string describing the count distribution used.

link

character string describing the link of the zero-inflation model.

linkinv

the inverse link function corresponding to link.

converged

logical indicating successful convergence of optim.

call

the original function call.

formula

the original formula.

levels

levels of the categorical regressors.

contrasts

a list with elements "count" and "zero" containing the contrasts corresponding to levels from the respective models.

model

the full model frame (if model = TRUE).

y

the response count vector (if y = TRUE).

x

a list with elements "count" and "zero" containing the model matrices from the respective models (if x = TRUE).

Details

Zero-inflated count models are two-component mixture models combining a point mass at zero with a proper count distribution. Thus, there are two sources of zeros: a zero may come from either the point mass or the count component. Usually the count model is a Poisson or negative binomial regression (with log link). The geometric distribution is a special case of the negative binomial with size parameter equal to 1. For modeling the unobserved state (zero vs. count), a binary model is used that captures the probability of zero inflation, in the simplest case only with an intercept but potentially containing regressors. For this zero-inflation model, a binomial model with different links can be used, typically logit or probit.

The formula can be used to specify both components of the model: If a formula of type y ~ x1 + x2 is supplied, then the same regressors are employed in both components. This is equivalent to y ~ x1 + x2 | x1 + x2. Of course, a different set of regressors could be specified for the count and zero-inflation component, e.g., y ~ x1 + x2 | z1 + z2 + z3 giving the count data model y ~ x1 + x2 conditional on (|) the zero-inflation model y ~ z1 + z2 + z3. A simple inflation model where all zero counts have the same probability of belonging to the zero component can be specified by the formula y ~ x1 + x2 | 1.
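
The two sources of zeros described above can be made concrete with a small base R sketch for the zero-inflated Poisson case. The helper dzip and the values of mu and pi0 are hypothetical and chosen only for illustration; they are not part of the package.

```r
# Illustrative zero-inflated Poisson probability mass function
# (hypothetical helper, not part of pscl).
# pi0: probability of the point mass at zero; mu: Poisson mean.
dzip <- function(y, mu, pi0) {
  ifelse(y == 0,
         pi0 + (1 - pi0) * dpois(0, mu),  # zeros from both components
         (1 - pi0) * dpois(y, mu))        # positive counts: count component only
}

mu  <- 2    # assumed count mean
pi0 <- 0.3  # assumed zero-inflation probability

p_zero_inflation <- pi0                       # zeros from the point mass
p_zero_count     <- (1 - pi0) * dpois(0, mu)  # zeros from the count component
p_zero_total     <- dzip(0, mu, pi0)

# The mixture is a proper distribution with mean (1 - pi0) * mu.
total_mass <- sum(dzip(0:100, mu, pi0))
mean_zip   <- sum(0:100 * dzip(0:100, mu, pi0))
```

Here p_zero_total equals the sum of the two zero sources, which is why a zero-inflated model can produce more zeros than the count distribution alone.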

Offsets can be specified in both components of the model pertaining to count and zero-inflation model: y ~ x1 + offset(x2) | z1 + z2 + offset(z3), where x2 is used as an offset (i.e., with coefficient fixed to 1) in the count component and z3 analogously in the zero-inflation component. By the rule stated above y ~ x1 + offset(x2) is expanded to y ~ x1 + offset(x2) | x1 + offset(x2). Instead of using the offset() wrapper within the formula, the offset argument can also be employed, which sets an offset only for the count model. Thus, formula = y ~ x1 and offset = x2 is equivalent to formula = y ~ x1 + offset(x2) | x1.

All parameters are estimated by maximum likelihood using optim, with control options set in zeroinfl.control. Starting values can be supplied, estimated by the EM (expectation maximization) algorithm, or by glm.fit (the default). Standard errors are derived numerically using the Hessian matrix returned by optim. See zeroinfl.control for details.

The returned fitted model object is of class "zeroinfl" and is similar to fitted "glm" objects. For elements such as "coefficients" or "terms" a list is returned with elements for the zero and count component, respectively. For details see the Value section.

A set of standard extractor functions for fitted model objects is available for objects of class "zeroinfl", including methods for the generic functions print, summary, coef, vcov, logLik, residuals, predict, fitted, terms, model.matrix. See predict.zeroinfl for more details on all methods.
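
The estimation strategy described above (minimizing the negative log-likelihood with optim and deriving standard errors from its Hessian) can be sketched in base R for a zero-inflated Poisson model with an intercept-only zero component. This is a simplified illustration on simulated data, not the package's actual implementation; the simulated data, the function negloglik and the choice of the "BFGS" method are all assumptions made for the example.

```r
set.seed(1)

# Simulated data (assumed values, for illustration only)
n   <- 500
x   <- rnorm(n)
mu  <- exp(0.5 + 0.8 * x)   # count mean, log link
pi0 <- plogis(-1)           # constant zero-inflation probability, logit link
y   <- ifelse(rbinom(n, 1, pi0) == 1, 0, rpois(n, mu))

X <- cbind(1, x)            # model matrix of the count component
Z <- matrix(1, n, 1)        # model matrix of the zero component (intercept only)

# Negative log-likelihood of the zero-inflated Poisson model
negloglik <- function(par) {
  beta  <- par[1:2]
  gamma <- par[3]
  mu <- exp(drop(X %*% beta))
  p  <- plogis(drop(Z %*% gamma))
  ll_zero <- log(p + (1 - p) * exp(-mu))             # contribution of y == 0
  ll_pos  <- log(1 - p) + dpois(y, mu, log = TRUE)   # contribution of y > 0
  -sum(ifelse(y == 0, ll_zero, ll_pos))
}

# Starting values from a plain Poisson GLM, zero intercept started at 0
start <- c(coef(glm(y ~ x, family = poisson)), 0)

fit <- optim(start, negloglik, method = "BFGS", hessian = TRUE)
est <- fit$par
se  <- sqrt(diag(solve(fit$hessian)))  # standard errors from the numerical Hessian
cbind(estimate = est, std.error = se)
```

The first two rows correspond to the count coefficients and the third to the intercept of the zero-inflation model on the logit scale.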

References

Cameron, A. Colin and Pravin K. Trivedi. 1998. Regression Analysis of Count Data. New York: Cambridge University Press.

Cameron, A. Colin and Pravin K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.

Lambert, Diane. 1992. Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing. Technometrics. 34(1):1-14.

Zeileis, Achim, Christian Kleiber and Simon Jackman. 2008. Regression Models for Count Data in R. Journal of Statistical Software. 27(8). URL http://www.jstatsoft.org/v27/i08/.

Examples

## data
data("bioChemists", package = "pscl")

## without inflation
## ("art ~ ." is "art ~ fem + mar + kid5 + phd + ment")
fm_pois <- glm(art ~ ., data = bioChemists, family = poisson)
fm_qpois <- glm(art ~ ., data = bioChemists, family = quasipoisson)
## (glm.nb is provided by the MASS package)
fm_nb <- MASS::glm.nb(art ~ ., data = bioChemists)

## with simple inflation (no regressors for zero component)
fm_zip <- zeroinfl(art ~ . | 1, data = bioChemists)
fm_zinb <- zeroinfl(art ~ . | 1, data = bioChemists, dist = "negbin")

## inflation with regressors
## ("art ~ . | ." is "art ~ fem + mar + kid5 + phd + ment | fem + mar + kid5 + phd + ment")
fm_zip2 <- zeroinfl(art ~ . | ., data = bioChemists)
fm_zinb2 <- zeroinfl(art ~ . | ., data = bioChemists, dist = "negbin")
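
The extractor methods listed in the Details section can be applied to any of the fits above. The following sketch assumes that the pscl package is installed and refits one of the models so that it runs on its own; the object names count_coef and zero_coef are introduced here only for illustration.

```r
library("pscl")
data("bioChemists", package = "pscl")

## zero-inflated negative binomial with intercept-only zero component
fm_zinb <- zeroinfl(art ~ . | 1, data = bioChemists, dist = "negbin")

## coefficient table for both components
summary(fm_zinb)

## coefficients of each component separately
count_coef <- coef(fm_zinb, model = "count")
zero_coef  <- coef(fm_zinb, model = "zero")

## log-likelihood and information criterion
logLik(fm_zinb)
AIC(fm_zinb)

## fitted means and estimated zero-inflation probabilities
head(predict(fm_zinb, type = "response"))
head(predict(fm_zinb, type = "zero"))
```

Because the zero component contains only an intercept, the predicted zero-inflation probability is the same for every observation.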
