overglm: Alternatives to the Poisson and Binomial Regression Models under the presence of Overdispersion.

Description

Fits regression models based on the negative binomial, beta-binomial, and random-clumped binomial distributions, which are alternatives to the Poisson and binomial regression models in the presence of overdispersion.

Usage

overglm(
  formula,
  offset,
  family = "nb1(log)",
  weights,
  data,
  subset,
  na.action = na.omit(),
  reltol = 1e-13,
  start = NULL,
  ...
)

Value

an object of class overglm in which the main results of the model fitted to the data are stored, i.e., a list with components including

`coefficients`	a vector containing the parameter estimates,

`fitted.values`	a vector containing the estimates of \(\mu_1,\ldots,\mu_n\),

`start`	a vector containing the starting values used,

`prior.weights`	a vector containing the case weights used,

`offset`	a vector containing the offset used,

`terms`	an object containing the terms object,

`loglik`	the value of the log-likelihood function evaluated at the parameter estimates,

`estfun`	a vector containing the estimating functions evaluated at the parameter estimates
	and the observed data,

`formula`	the formula,

`levels`	the levels of the categorical regressors,

`contrasts`	an object containing the contrasts corresponding to levels,

`converged`	a logical indicating successful convergence,

`model`	the full model frame,

`y`	the response count vector,

`family`	an object containing the family object used,

`linear.predictors`	a vector containing the estimates of \(g(\mu_1),\ldots,g(\mu_n)\),

`R`	a matrix with the Cholesky decomposition of the inverse of the variance-covariance
	matrix of all parameters in the model,

`call`	the original function call.

Arguments

formula: a formula expression of the form response ~ x1 + x2 + ..., which is a symbolic description of the linear predictor of the model to be fitted to the data.
offset: this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases.
family: a character string that specifies the distribution of the response variable and the link function used in the model for \(\mu\), for example "nb1(log)". The following distributions are supported: negative binomial I ("nb1"), negative binomial II ("nb2"), negative binomial ("nbf"), zero-truncated negative binomial I ("ztnb1"), zero-truncated negative binomial II ("ztnb2"), zero-truncated negative binomial ("ztnbf"), zero-truncated Poisson ("ztpoi"), beta-binomial ("bb") and random-clumped binomial ("rcb"). The link functions available for these models are the same as those available for Poisson and binomial models via glm. See the family documentation.
weights: an (optional) vector of positive "prior weights" to be used in the fitting process. The length of weights should be the same as the number of observations.
data: an (optional) data frame in which to look for variables involved in the formula expression, as well as for variables specified in the arguments weights and subset.
subset: an (optional) vector specifying a subset of individuals to be used in the fitting process.
na.action: a function which indicates what should happen when the data contain NAs. By default na.action is set to na.omit().
reltol: an (optional) positive value which represents the relative convergence tolerance for the BFGS method in optim. By default, reltol is set to 1e-13.
start: an (optional) vector of starting values for the parameters in the linear predictor.
...: further arguments passed to or from other methods.

Details

The negative binomial distribution can be obtained as a mixture of the Poisson and gamma distributions. If \(Y | \lambda\) ~ Poisson\((\lambda)\), where E\((Y | \lambda)=\) Var\((Y | \lambda)=\lambda\), and \(\lambda\) ~ Gamma\((\theta,\nu)\), in which E\((\lambda)=\theta\) and Var\((\lambda)=\nu\theta^2\), then \(Y\) is distributed according to the negative binomial distribution, with E\((Y)=\theta\) and Var\((Y)=\theta + \nu\theta^2\). Some special cases are described below:

(1) If \(\theta=\mu\) and \(\nu=\phi\) then \(Y\) ~ Negative Binomial I, E\((Y)=\mu\) and Var\((Y)=\mu(1 + \phi\mu)\).

(2) If \(\theta=\mu\) and \(\nu=\phi/\mu\) then \(Y\) ~ Negative Binomial II, E\((Y)=\mu\) and Var\((Y)=\mu(1 +\phi)\).

(3) If \(\theta=\mu\) and \(\nu=\phi\mu^\tau\) then \(Y\) ~ Negative Binomial, E\((Y)=\mu\) and Var\((Y)=\mu(1 +\phi\mu^{\tau+1})\).

Therefore, the regression models based on the negative binomial and zero-truncated negative binomial distributions are alternatives, under overdispersion, to those based on the Poisson and zero-truncated Poisson distributions, respectively.
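The variance formulas in cases (1) and (2) can be checked by simulating the Poisson-gamma mixture directly. The following is a minimal sketch in base R, not part of overglm: the helper `rmix` and the values of `mu` and `phi` are chosen only for illustration, and the gamma distribution is parameterized with shape \(1/\nu\) and scale \(\nu\theta\) so that its mean is \(\theta\) and its variance is \(\nu\theta^2\).

```r
set.seed(123)
n   <- 200000   # number of simulated observations
mu  <- 4        # illustrative mean
phi <- 0.5      # illustrative dispersion parameter

# Hypothetical helper: lambda ~ Gamma with mean theta and variance nu * theta^2,
# then Y | lambda ~ Poisson(lambda).
rmix <- function(n, theta, nu) {
  lambda <- rgamma(n, shape = 1 / nu, scale = nu * theta)
  rpois(n, lambda)
}

y_nb1 <- rmix(n, theta = mu, nu = phi)        # case (1): nu = phi
y_nb2 <- rmix(n, theta = mu, nu = phi / mu)   # case (2): nu = phi / mu

# Empirical moments next to the theoretical variances from the text
moments <- rbind(
  nb1 = c(mean = mean(y_nb1), var = var(y_nb1), var_theory = mu * (1 + phi * mu)),
  nb2 = c(mean = mean(y_nb2), var = var(y_nb2), var_theory = mu * (1 + phi))
)
print(round(moments, 3))
```

In both cases the empirical variance should be close to the theoretical one and clearly larger than the Poisson variance, which equals `mu`.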

The beta-binomial distribution can be obtained as a mixture of the binomial and beta distributions. If \(mY | \pi\) ~ Binomial\((m,\pi)\), where E\((Y | \pi)=\pi\) and Var\((Y | \pi)=m^{-1}\pi(1-\pi)\), and \(\pi\) ~ Beta\((\mu,\phi)\), in which E\((\pi)=\mu\) and Var\((\pi)=(\phi+1)^{-1}\mu(1-\mu)\), with \(\phi>0\), then \(mY\) ~ Beta-Binomial\((m,\mu,\phi)\), so that E\((Y)=\mu\) and Var\((Y)=m^{-1}\mu(1-\mu)[1 + (\phi+1)^{-1}(m-1)]\). Therefore, the regression model based on the beta-binomial distribution is an alternative, under overdispersion, to the binomial regression model.
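As an informal check of the beta-binomial variance formula, the mixture can be simulated in base R. This sketch is independent of overglm and assumes illustrative values for `m`, `mu` and `phi`; a Beta distribution with mean \(\mu\) and variance \((\phi+1)^{-1}\mu(1-\mu)\) corresponds to shape parameters \(\mu\phi\) and \((1-\mu)\phi\) in rbeta.

```r
set.seed(42)
n   <- 200000   # number of simulated clusters
m   <- 10       # binomial denominator (cluster size)
mu  <- 0.3      # illustrative mean proportion
phi <- 4        # illustrative dispersion parameter

# pi ~ Beta with mean mu and variance mu * (1 - mu) / (phi + 1)
p <- rbeta(n, shape1 = mu * phi, shape2 = (1 - mu) * phi)
# m * Y | pi ~ Binomial(m, pi), so Y is the observed proportion
y <- rbinom(n, size = m, prob = p) / m

var_binomial <- mu * (1 - mu) / m
var_bb       <- var_binomial * (1 + (m - 1) / (phi + 1))
bb_check <- c(mean = mean(y), var = var(y),
              var_bb = var_bb, var_binomial = var_binomial)
print(round(bb_check, 4))
```

The empirical variance should match `var_bb` and exceed `var_binomial` by the factor \(1 + (\phi+1)^{-1}(m-1)\).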

The random-clumped binomial distribution can be obtained as a mixture of the binomial and Bernoulli distributions. If \(mY | \pi\) ~ Binomial\((m,\pi)\), where E\((Y | \pi)=\pi\) and Var\((Y | \pi)=m^{-1}\pi(1-\pi)\), whereas \(\pi=(1-\phi)\mu + \phi\) with probability \(\mu\), and \(\pi=(1-\phi)\mu\) with probability \(1-\mu\), in which E\((\pi)=\mu\) and Var\((\pi)=\phi^{2}\mu(1-\mu)\), with \(\phi \in (0,1)\), then \(mY\) ~ Random-clumped Binomial\((m,\mu,\phi)\), so that E\((Y)=\mu\) and Var\((Y)=m^{-1}\mu(1-\mu)[1 + \phi^{2}(m-1)]\). Therefore, the regression model based on the random-clumped binomial distribution is an alternative, under overdispersion, to the binomial regression model.
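The two-point mixing distribution described above is easy to simulate, which gives a quick check of the random-clumped binomial variance. The sketch below uses base R only; the values of `m`, `mu` and `phi` and the object names are illustrative assumptions, not part of the package.

```r
set.seed(7)
n   <- 200000   # number of simulated clusters
m   <- 10       # binomial denominator (cluster size)
mu  <- 0.3      # illustrative mean proportion
phi <- 0.4      # illustrative dispersion parameter in (0, 1)

# pi = (1 - phi) * mu + phi with probability mu, and (1 - phi) * mu otherwise
clump <- rbinom(n, size = 1, prob = mu)
p     <- (1 - phi) * mu + phi * clump
# m * Y | pi ~ Binomial(m, pi), so Y is the observed proportion
y <- rbinom(n, size = m, prob = p) / m

var_binomial <- mu * (1 - mu) / m
var_rcb      <- var_binomial * (1 + phi^2 * (m - 1))
rcb_check <- c(mean_pi = mean(p), var_pi = var(p), mean = mean(y), var = var(y),
               var_rcb = var_rcb, var_binomial = var_binomial)
print(round(rcb_check, 4))
```

Here `var_pi` should be close to \(\phi^{2}\mu(1-\mu)\) and `var` close to `var_rcb`.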

In all cases, even when the response variable is described by a zero-truncated distribution, the fitted model describes how \(\mu\) depends on the covariates. Parameters are estimated by maximum likelihood: the log-likelihood function is maximized using the BFGS method available in the routine optim. The accuracy and speed of the BFGS method are improved because optim is called with analytical rather than numerical derivatives. The variance-covariance matrix estimate is minus the inverse of the (analytical) Hessian matrix of the log-likelihood, evaluated at the parameter estimates and the observed data.
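The estimation strategy can be illustrated with a small stand-alone sketch for the negative binomial I model with log link. This is not the code used inside overglm: the simulated data, the function names `negloglik` and `neggrad`, and the parameterization of \(\phi\) on the log scale are assumptions made for illustration, and the Hessian is obtained here by numerical differentiation of the analytical gradient rather than analytically.

```r
set.seed(2024)
n <- 2000
x <- rnorm(n)
X <- cbind(1, x)                       # model matrix: intercept and one covariate
beta_true <- c(0.5, 0.8)
phi_true  <- 0.7
y <- rnbinom(n, size = 1 / phi_true, mu = exp(drop(X %*% beta_true)))

# Parameters: par = (beta0, beta1, log(phi)); k = 1 / phi is the NB "size".
negloglik <- function(par) {
  k  <- exp(-par[3])
  mu <- exp(drop(X %*% par[1:2]))
  -sum(dnbinom(y, size = k, mu = mu, log = TRUE))
}

# Analytical gradient of the negative log-likelihood.
neggrad <- function(par) {
  k  <- exp(-par[3])
  mu <- exp(drop(X %*% par[1:2]))
  d_eta <- k * (y - mu) / (k + mu)     # d loglik / d eta under the log link
  d_k   <- digamma(y + k) - digamma(k) + log(k) - log(k + mu) +
           1 - (k + y) / (k + mu)      # d loglik / d k
  -c(crossprod(X, d_eta), -k * sum(d_k))  # chain rule: dk / dlog(phi) = -k
}

fit <- optim(c(log(mean(y)), 0, 0), negloglik, neggrad, method = "BFGS",
             control = list(reltol = 1e-13, maxit = 500), hessian = TRUE)

est <- c(beta = fit$par[1:2], phi = exp(fit$par[3]))
# Inverse Hessian of the negative log-likelihood, on the (beta, log(phi)) scale
se  <- sqrt(diag(solve(fit$hessian)))
print(round(est, 3))
print(round(se, 3))
```

Working with \(\log\phi\) keeps the optimization unconstrained; the standard errors in `se` therefore refer to \(\log\phi\) rather than \(\phi\) for the third parameter.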

A set of standard extractor functions is available for fitted model objects of class overglm, including methods for generic functions such as print, summary, model.matrix, estequa, coef, vcov, logLik, fitted, confint, AIC, BIC and predict. In addition, the model fitted to the data may be assessed using functions such as anova.overglm, residuals.overglm, dfbeta.overglm, cooks.distance.overglm, localInfluence.overglm, gvif.overglm and envelope.overglm. Variable selection may be carried out using the routine stepCriterion.overglm.

References

Crowder M. (1978) Beta-binomial anova for proportions, Journal of the Royal Statistical Society Series C (Applied Statistics) 27, 34-37.

Lawless J.F. (1987) Negative binomial and mixed Poisson regression, The Canadian Journal of Statistics 15, 209-225.

Morel J.G., Neerchal N.K. (1997) Clustered binary logistic regression in teratology data using a finite mixture distribution, Statistics in Medicine 16, 2843-2853.

Morel J.G., Neerchal N.K. (2012) Overdispersion Models in SAS. SAS Institute Inc., Cary, North Carolina, USA.

Examples


### Example 1: Ability of retinyl acetate to prevent mammary cancer in rats
data(mammary)
fit1 <- overglm(tumors ~ group, family="nb1(identity)", data=mammary)
summary(fit1)

### Example 2: Self-diagnosed ear infections in swimmers
data(swimmers)
fit2 <- overglm(infections ~ frequency + location, family="nb1(log)", data=swimmers)
summary(fit2)

### Example 3: Urinary tract infections in HIV-infected men
data(uti)
fit3 <- overglm(episodes ~ cd4 + offset(log(time)), family="nb1(log)", data = uti)
summary(fit3)

### Example 4: Article production by graduate students in biochemistry PhD programs
bioChemists <- pscl::bioChemists
fit4 <- overglm(art ~ fem + kid5 + ment, family="nb1(log)", data = bioChemists)
summary(fit4)

### Example 5: Agents to stimulate cellular differentiation
data(cellular)
fit5 <- overglm(cbind(cells,200-cells) ~ tnf + ifn, family="bb(logit)", data=cellular)
summary(fit5)

### Example 6: Teratogenic effects of phenytoin and trichloropropene oxide
data(ossification)
model6 <- cbind(fetuses,litter-fetuses) ~ pht + tcpo
fit6 <- overglm(model6, family="rcb(cloglog)", data=ossification)
summary(fit6)

### Example 7: Germination of orobanche seeds
data(orobanche)
model7 <- cbind(germinated,seeds-germinated) ~ specie + extract
fit7 <- overglm(model7, family="rcb(cloglog)", data=orobanche)
summary(fit7)
