Fit zero-inflated regression models for count data via maximum likelihood.

```
zeroinfl(formula, data, subset, na.action, weights, offset,
  dist = c("poisson", "negbin", "geometric"),
  link = c("logit", "probit", "cloglog", "cauchit", "log"),
  control = zeroinfl.control(...),
  model = TRUE, y = TRUE, x = FALSE, ...)
```

An object of class `"zeroinfl"`, i.e., a list with components including

- coefficients
a list with elements `"count"` and `"zero"` containing the coefficients from the respective models,

- residuals
a vector of raw residuals (observed - fitted),

- fitted.values
a vector of fitted means,

- optim
a list with the output from the `optim` call for minimizing the negative log-likelihood,

- control
the control arguments passed to the `optim` call,

- start
the starting values for the parameters passed to the `optim` call,

- weights
the case weights used,

- offset
a list with elements `"count"` and `"zero"` containing the offset vectors (if any) from the respective models,

- n
number of observations (with weights > 0),

- df.null
residual degrees of freedom for the null model (= `n - 2`),

- df.residual
residual degrees of freedom for fitted model,

- terms
a list with elements `"count"`, `"zero"` and `"full"` containing the terms objects for the respective models,

- theta
estimate of the additional \(\theta\) parameter of the negative binomial model (if a negative binomial regression is used),

- SE.logtheta
standard error for \(\log(\theta)\),

- loglik
log-likelihood of the fitted model,

- vcov
covariance matrix of all coefficients in the model (derived from the Hessian of the `optim` output),

- dist
character string describing the count distribution used,

- link
character string describing the link of the zero-inflation model,

- linkinv
the inverse link function corresponding to `link`,

- converged
logical indicating successful convergence of `optim`,

- call
the original function call,

- formula
the original formula,

- levels
levels of the categorical regressors,

- contrasts
a list with elements `"count"` and `"zero"` containing the contrasts corresponding to `levels` from the respective models,

- model
the full model frame (if `model = TRUE`),

- y
the response count vector (if `y = TRUE`),

- x
a list with elements `"count"` and `"zero"` containing the model matrices from the respective models (if `x = TRUE`).

- formula
symbolic description of the model, see details.

- data, subset, na.action
arguments controlling formula processing via `model.frame`.

- weights
optional numeric vector of weights.

- offset
optional numeric vector with an a priori known component to be included in the linear predictor of the count model. See below for more information on offsets.

- dist
character specification of count model family (a log link is always used).

- link
character specification of link function in the binary zero-inflation model (a binomial family is always used).

- control
a list of control arguments specified via `zeroinfl.control`.

- model, y, x
logicals. If `TRUE` the corresponding components of the fit (model frame, response, model matrix) are returned.

- ...
arguments passed to `zeroinfl.control` in the default setup.

Achim Zeileis <Achim.Zeileis@R-project.org>

Zero-inflated count models are two-component mixture models combining a point mass at zero with a proper count distribution. Thus, there are two sources of zeros: the point mass and the count component. Usually the count model is a Poisson or negative binomial regression (with log link). The geometric distribution is a special case of the negative binomial with size parameter equal to 1. For modeling the unobserved state (zero vs. count), a binary model is used that captures the probability of zero inflation, in the simplest case only with an intercept but potentially containing regressors. For this zero-inflation model, a binomial model with different links can be used, typically logit or probit.
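In symbols, the mixture described above can be written as follows. The notation is chosen here for illustration and is not part of the package interface: \(\pi\) is the probability of the point mass at zero, \(f_{\mathrm{count}}\) is the count density with mean \(\mu\), and \(I_{\{0\}}(y)\) equals 1 if \(y = 0\) and 0 otherwise.

\[
f_{\mathrm{zeroinfl}}(y) = \pi \cdot I_{\{0\}}(y) + (1 - \pi) \cdot f_{\mathrm{count}}(y)
\]

Under this parameterization the two sources of zeros add up to \(P(Y = 0) = \pi + (1 - \pi) \cdot f_{\mathrm{count}}(0)\), and the mean of the mixture is \((1 - \pi) \cdot \mu\). With regressors, \(\mu_i = \exp(x_i^\top \beta)\) for the count component (log link) and \(\pi_i = g^{-1}(z_i^\top \gamma)\) for the zero-inflation component, where \(g\) is the chosen `link`.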

The `formula` can be used to specify both components of the model:
If a `formula` of type `y ~ x1 + x2` is supplied, then the same
regressors are employed in both components. This is equivalent to
`y ~ x1 + x2 | x1 + x2`. A different set of regressors can be specified
for the count and zero-inflation components, e.g.,
`y ~ x1 + x2 | z1 + z2 + z3`, giving the count data model `y ~ x1 + x2`
conditional on (`|`) the zero-inflation model `y ~ z1 + z2 + z3`.
A simple inflation model where all zero counts have the same
probability of belonging to the zero component can be specified by the formula
`y ~ x1 + x2 | 1`.
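The formula conventions above can be tried out on the `bioChemists` data shipped with pscl. This is a small sketch; the choice of regressors (`fem`, `ment`) and the object names are arbitrary and only serve the illustration.

```r
library("pscl")
data("bioChemists", package = "pscl")

## one-part formula: the same regressors enter both components
fm_a <- zeroinfl(art ~ fem + ment, data = bioChemists)

## explicit two-part formula with identical right-hand sides
fm_b <- zeroinfl(art ~ fem + ment | fem + ment, data = bioChemists)

## intercept-only zero-inflation component
fm_c <- zeroinfl(art ~ fem + ment | 1, data = bioChemists)

## the first two specifications describe the same model
all.equal(coef(fm_a), coef(fm_b))

## the zero component of the third model has a single coefficient
coef(fm_c, model = "zero")
```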

Offsets can be specified in both components of the model pertaining to count and
zero-inflation model: `y ~ x1 + offset(x2) | z1 + z2 + offset(z3)`, where
`x2` is used as an offset (i.e., with coefficient fixed to 1) in the
count component and `z3` analogously in the zero-inflation component. By the rule
stated above `y ~ x1 + offset(x2)` is expanded to
`y ~ x1 + offset(x2) | x1 + offset(x2)`. Instead of using the
`offset()` wrapper within the `formula`, the `offset` argument
can also be employed which sets an offset only for the count model. Thus,
`formula = y ~ x1` and `offset = x2` is equivalent to
`formula = y ~ x1 + offset(x2) | x1`.
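The equivalence between the `offset` argument and the `offset()` wrapper can be checked on simulated data. The sketch below assumes a hypothetical exposure variable `expo` whose logarithm serves as the offset. The data-generating values are made up for illustration.

```r
library("pscl")

## simulated data (illustrative only): counts scale with an exposure 'expo',
## and roughly a quarter of the observations are structural zeros
set.seed(42)
n <- 400
d <- data.frame(x1 = rnorm(n), expo = runif(n, 0.5, 2))
d$y <- ifelse(rbinom(n, 1, 0.25) == 1, 0L,
              rpois(n, d$expo * exp(0.3 + 0.5 * d$x1)))

## offset supplied via the 'offset' argument (count model only)
fm_arg <- zeroinfl(y ~ x1, offset = log(expo), data = d)

## the same model written with offset() in the count part of the formula
fm_form <- zeroinfl(y ~ x1 + offset(log(expo)) | x1, data = d)

all.equal(coef(fm_arg), coef(fm_form))
```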

All parameters are estimated by maximum likelihood using `optim`,
with control options set in `zeroinfl.control`.
Starting values can be supplied, estimated by the EM (expectation maximization)
algorithm, or by `glm.fit` (the default). Standard errors
are derived numerically using the Hessian matrix returned by `optim`.
See `zeroinfl.control` for details.
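To make the estimation strategy concrete, the following base R sketch writes down a zero-inflated Poisson negative log-likelihood by hand and minimizes it with `optim`. It is not the code used inside `zeroinfl`: the data are simulated, all names (`zip_negloglik`, `X`, `fit`, and so on) are made up for this illustration, and the starting values are simply zeros rather than `glm.fit` estimates.

```r
## simulated data: Poisson counts with about 30% structural zeros
set.seed(1)
n <- 500
x <- rnorm(n)
structural_zero <- rbinom(n, 1, 0.3)
y <- ifelse(structural_zero == 1, 0L, rpois(n, exp(0.5 + 0.3 * x)))

X <- cbind(1, x)  # model matrix of the count component

## negative log-likelihood of a ZIP model with logit link and an
## intercept-only zero-inflation component
zip_negloglik <- function(par) {
  mu <- exp(drop(X %*% par[1:2]))  # count mean (log link)
  p  <- plogis(par[3])             # probability of the point mass at zero
  ll <- ifelse(y == 0,
               log(p + (1 - p) * exp(-mu)),             # both sources of zeros
               log(1 - p) + dpois(y, mu, log = TRUE))   # positive counts
  -sum(ll)
}

## minimize the negative log-likelihood and request the Hessian
fit <- optim(c(0, 0, 0), zip_negloglik, method = "BFGS", hessian = TRUE)

est <- fit$par                          # count intercept, count slope, zero intercept
se  <- sqrt(diag(solve(fit$hessian)))   # standard errors from the inverse Hessian
cbind(estimate = est, std.error = se)
```

Because the function being minimized is the negative log-likelihood, the inverse of its Hessian at the optimum serves as the estimated covariance matrix of the parameters.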

The returned fitted model object is of class `"zeroinfl"` and is similar
to fitted `"glm"` objects. For elements such as `"coefficients"` or
`"terms"` a list is returned with elements for the zero and count component,
respectively. For details see the list of components above.

A set of standard extractor functions for fitted model objects is available for
objects of class `"zeroinfl"`, including methods to the generic functions
`print`, `summary`, `coef`, `vcov`, `logLik`, `residuals`,
`predict`, `fitted`, `terms`, `model.matrix`.
See `predict.zeroinfl` for more details on all methods.
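A few of these extractors are sketched below on a small model for the `bioChemists` data. The model specification and object names are arbitrary choices for illustration.

```r
library("pscl")
data("bioChemists", package = "pscl")
fm <- zeroinfl(art ~ fem + ment | ment, data = bioChemists)

## coefficients of each component separately
coef(fm, model = "count")
coef(fm, model = "zero")

## log-likelihood of the fitted model
logLik(fm)

## component-wise predictions: count mean and zero-inflation probability
mu  <- predict(fm, type = "count")
pi0 <- predict(fm, type = "zero")

## the fitted means combine both components
all.equal(unname(fitted(fm)), unname((1 - pi0) * mu))

## predicted probabilities for the counts 0, 1, 2 (first three columns)
head(predict(fm, type = "prob")[, 1:3])
```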

Cameron, A. Colin and Pravin K. Trivedi. 1998. *Regression Analysis of Count
Data.* New York: Cambridge University Press.

Cameron, A. Colin and Pravin K. Trivedi. 2005. *Microeconometrics: Methods and Applications*.
Cambridge: Cambridge University Press.

Lambert, Diane. 1992. “Zero-Inflated Poisson Regression,
with an Application to Defects in Manufacturing.” *Technometrics*, **34**(1):1-14.

Zeileis, Achim, Christian Kleiber and Simon Jackman. 2008.
“Regression Models for Count Data in R.”
*Journal of Statistical Software*, **27**(8).
URL http://www.jstatsoft.org/v27/i08/.

```
## load the package (provides zeroinfl) and the data
library("pscl")
data("bioChemists", package = "pscl")
## without inflation
## ("art ~ ." is "art ~ fem + mar + kid5 + phd + ment")
fm_pois <- glm(art ~ ., data = bioChemists, family = poisson)
fm_qpois <- glm(art ~ ., data = bioChemists, family = quasipoisson)
fm_nb <- MASS::glm.nb(art ~ ., data = bioChemists)
## with simple inflation (no regressors for zero component)
fm_zip <- zeroinfl(art ~ . | 1, data = bioChemists)
fm_zinb <- zeroinfl(art ~ . | 1, data = bioChemists, dist = "negbin")
## inflation with regressors
## ("art ~ . | ." is "art ~ fem + mar + kid5 + phd + ment | fem + mar + kid5 + phd + ment")
fm_zip2 <- zeroinfl(art ~ . | ., data = bioChemists)
fm_zinb2 <- zeroinfl(art ~ . | ., data = bioChemists, dist = "negbin")
```
