Fits a zero-inflated (Poisson or negative binomial) regression model to handle excess zeros in count data.
zeroinf(
  formula,
  data,
  offset,
  subset,
  na.action = na.omit(),
  weights,
  family = "poi(log)",
  zero.link = c("logit", "probit", "cloglog", "cauchit", "log"),
  reltol = 1e-13,
  start = list(counts = NULL, zeros = NULL),
  ...
)
An object of class zeroinflation that stores the main results of the model fitted to the data, i.e., a list with components including:
coefficients | a list with elements "counts" and "zeros" containing the parameter estimates from the respective models, |
fitted.values | a list with elements "counts" and "zeros" containing the estimates of \(\mu_1,\ldots,\mu_n\) and \(\pi_1,\ldots,\pi_n\), respectively, |
start | a vector containing the starting values for all parameters in the model, |
prior.weights | a vector containing the case weights used, |
offset | a list with elements "counts" and "zeros" containing the offset vectors, if any, from the respective models, |
terms | a list with elements "counts", "zeros" and "full" containing the terms objects for the respective models, |
loglik | the value of the log-likelihood function evaluated at the parameter estimates and the observed data, |
estfun | a list with elements "counts" and "zeros" containing the estimating functions evaluated at the parameter estimates and the observed data for the respective models, |
formula | the formula, |
levels | the levels of the categorical regressors, |
contrasts | a list with elements "counts" and "zeros" containing the contrasts corresponding to levels from the respective models, |
converged | a logical indicating successful convergence, |
model | the full model frame, |
y | the response count vector, |
family | a list with elements "counts" and "zeros" containing the family objects used in the respective models, |
linear.predictors | a list with elements "counts" and "zeros" containing the estimates of \(g(\mu_1),\ldots,g(\mu_n)\) and \(h(\pi_1),\ldots,h(\pi_n)\), respectively, |
R | a matrix with the Cholesky decomposition of the inverse of the variance-covariance matrix of all parameters in the model, |
call | the original function call. |
formula | a Formula expression of the form response ~ x1 + x2 + ... | z1 + z2 + ..., which is a symbolic description of the linear predictors of the models to be fitted to \(\mu\) and \(\pi\), respectively. See the Formula documentation. If a one-part formula, response ~ x1 + x2 + ..., is supplied, it is treated as response ~ x1 + x2 + ... | x1 + x2 + ..., so the same regressors are employed in both components. |
data | an (optional) data frame in which to look for the variables involved in the formula expression, as well as for the variables specified in the arguments weights and subset. |
offset | an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. |
subset | an (optional) vector specifying a subset of observations to be used in the fitting process. |
na.action | a function which indicates what should happen when the data contain NAs. By default, na.action is set to na.omit(). |
weights | an (optional) vector of positive "prior weights" to be used in the fitting process. The length of weights should be the same as the number of observations. By default, weights is set to a vector of 1s. |
family | an (optional) character string specifying the distribution used to describe the response variable, as well as the link function to be used in the model for \(\mu\). The following distributions are supported: (zero-inflated) negative binomial I ("nb1"), (zero-inflated) negative binomial II ("nb2"), (zero-inflated) negative binomial ("nbf"), and (zero-inflated) Poisson ("poi"). Link functions are the same as those available in Poisson models via glm. See the family documentation. By default, family is set to "poi(log)", i.e., Poisson with log link. |
zero.link | an (optional) character string specifying the link function to be used in the model for \(\pi\). The available link functions are the same as those available in binomial models via glm. See the family documentation. By default, zero.link is set to "logit". |
reltol | an (optional) positive value giving the relative convergence tolerance for the BFGS method in optim. By default, reltol is set to 1e-13. |
start | an (optional) list with two components named "counts" and "zeros", which specify the starting values used in the iterative process to estimate the parameters in the linear predictors of the models for \(\mu\) and \(\pi\), respectively. |
... | further arguments passed to or from other methods. |
The zero-inflated count distributions may be obtained as the mixture between a count distribution and the Bernoulli distribution. Indeed, if \(Y\) is a count random variable such that \(Y|\nu=1\) is 0 with probability 1 and \(Y|\nu=0\) ~ Poisson\((\mu)\), where \(\nu\) ~ Bernoulli\((\pi)\), then \(Y\) is distributed according to the Zero-Inflated Poisson distribution, denoted here as ZIP\((\mu,\pi)\).
Similarly, if \(Y\) is a count random variable such that \(Y|\nu=1\) is 0 with probability 1 and \(Y|\nu=0\) ~ NB\((\mu,\phi,\tau)\), where \(\nu\) ~ Bernoulli\((\pi)\), then \(Y\) is distributed according to the Zero-Inflated Negative Binomial distribution, denoted here as ZINB\((\mu,\phi,\tau,\pi)\). The Zero-Inflated Negative Binomial I \((\mu,\phi,\pi)\) and Zero-Inflated Negative Binomial II \((\mu,\phi,\pi)\) distributions are special cases of ZINB when \(\tau=0\) and \(\tau=-1\), respectively.
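As a worked consequence of the mixture definition above (derived here, not quoted from the package source), the probability mass function of ZIP\((\mu,\pi)\) can be written as

\[
\Pr(Y=y)=
\begin{cases}
\pi+(1-\pi)\,e^{-\mu}, & y=0,\\[4pt]
(1-\pi)\,\dfrac{e^{-\mu}\mu^{y}}{y!}, & y=1,2,\ldots
\end{cases}
\]

so that \(\mathrm{E}(Y)=(1-\pi)\mu\) and \(\mathrm{Var}(Y)=(1-\pi)\mu(1+\pi\mu)\). The ZINB\((\mu,\phi,\tau,\pi)\) case has the same two-branch structure, with the Poisson probabilities replaced by those of NB\((\mu,\phi,\tau)\).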
The "counts" model may be expressed as \(g(\mu_i)=x_i^{\top}\beta\) for \(i=1,\ldots,n\), where \(g(\cdot)\) is the link function specified in the argument family. Similarly, the "zeros" model may be expressed as \(h(\pi_i)=z_i^{\top}\gamma\) for \(i=1,\ldots,n\), where \(h(\cdot)\) is the link function specified in the argument zero.link. The model parameters are estimated by maximum likelihood: the log-likelihood function is maximized using the BFGS method available in the routine optim. Analytical derivatives are used instead of numerical derivatives to increase the accuracy and speed of the BFGS method. The estimate of the variance-covariance matrix is obtained as minus the inverse of the (analytical) Hessian matrix of the log-likelihood function evaluated at the parameter estimates and the observed data.
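The estimation scheme described above can be illustrated with a minimal base R sketch for the zero-inflated Poisson case with log and logit links. This is not the code used by zeroinf: the simulated data and every object name (X, Z, truth, negloglik, negscore, theta0) are assumptions made for illustration, and the Hessian is the numerical one returned by optim rather than an analytical one.

```r
# Illustrative ZIP regression by maximum likelihood (base R only).
# Not the glmtoolbox implementation; all object names are hypothetical.
set.seed(123)
n <- 500
x <- rnorm(n)
X <- cbind(1, x)                        # regressors of the "counts" model
Z <- cbind(1, x)                        # regressors of the "zeros" model
truth <- c(1, 0.5, -1, 0.8)             # (beta, gamma) used to simulate the data
mu   <- exp(drop(X %*% truth[1:2]))     # log link:   g(mu) = x'beta
prob <- plogis(drop(Z %*% truth[3:4]))  # logit link: h(pi) = z'gamma
y <- ifelse(rbinom(n, 1, prob) == 1, 0, rpois(n, mu))

# Negative log-likelihood of the ZIP model
negloglik <- function(theta) {
  mu   <- exp(drop(X %*% theta[1:2]))
  prob <- plogis(drop(Z %*% theta[3:4]))
  ll <- ifelse(y == 0,
               log(prob + (1 - prob) * exp(-mu)),
               log1p(-prob) + dpois(y, mu, log = TRUE))
  -sum(ll)
}

# Analytical gradient of the negative log-likelihood
negscore <- function(theta) {
  mu   <- exp(drop(X %*% theta[1:2]))
  prob <- plogis(drop(Z %*% theta[3:4]))
  p0   <- prob + (1 - prob) * exp(-mu)  # Pr(Y = 0)
  d_eta  <- ifelse(y == 0, -mu * (1 - prob) * exp(-mu) / p0, y - mu)
  d_zeta <- ifelse(y == 0, prob * (1 - prob) * (1 - exp(-mu)) / p0, -prob)
  -c(drop(crossprod(X, d_eta)), drop(crossprod(Z, d_zeta)))
}

# BFGS with the analytical gradient and the tolerance used as default by zeroinf
theta0 <- c(log(mean(y)), 0, 0, 0)
fit <- optim(theta0, negloglik, negscore, method = "BFGS",
             control = list(reltol = 1e-13), hessian = TRUE)

# fit$hessian is a numerical Hessian of the negative log-likelihood,
# so its inverse estimates the variance-covariance matrix
vc  <- solve(fit$hessian)
est <- cbind(estimate = fit$par, std.error = sqrt(diag(vc)))
rownames(est) <- c("counts:(Intercept)", "counts:x",
                   "zeros:(Intercept)", "zeros:x")
print(est)
```

Because optim minimizes, the sketch works with the negative log-likelihood; inverting its Hessian corresponds to taking minus the inverse of the Hessian of the log-likelihood.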
A set of standard extractor functions for fitted model objects is available for objects of class zeroinflation, including methods for generic functions such as print, summary, model.matrix, estequa, coef, vcov, logLik, fitted, confint, AIC, BIC and predict. In addition, the model fitted to the data may be assessed using functions such as anova.zeroinflation, residuals.zeroinflation, dfbeta.zeroinflation, cooks.distance.zeroinflation and envelope.zeroinflation.
Cameron A.C., Trivedi P.K. (1998) Regression Analysis of Count Data. New York: Cambridge University Press.
Lambert D. (1992) Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing. Technometrics 34, 1-14.
Garay A.M., Hashimoto E.M., Ortega E.M.M., Lachos V. (2011) On estimation and influence diagnostics for zero-inflated negative binomial regression models. Computational Statistics & Data Analysis 55, 1304-1318.
overglm, zeroalt
library(glmtoolbox)

####### Example 1: Roots Produced by the Columnar Apple Cultivar Trajan
data(Trajan)
fit1 <- zeroinf(roots ~ photoperiod, family="nbf(log)", zero.link="logit", data=Trajan)
summary(fit1)
####### Example 2: Self-diagnosed ear infections in swimmers
data(swimmers)
fit2 <- zeroinf(infections ~ frequency | location, family="nb1(log)", data=swimmers)
summary(fit2)
####### Example 3: Article production by graduate students in biochemistry PhD programs
# The data set comes from the pscl package, which must be installed
bioChemists <- pscl::bioChemists
fit3 <- zeroinf(art ~ fem + kid5 + ment | ment, family="nb1(log)", data = bioChemists)
summary(fit3)