Learn R Programming

VGAM (version 1.1-14)

negbinomial: Negative Binomial Distribution Family Function

Description

Maximum likelihood estimation of the two parameters of a negative binomial distribution.

Usage

negbinomial(zero = "size", parallel = FALSE, deviance.arg = FALSE,
            type.fitted = c("mean", "quantiles"),
            percentiles = c(25, 50, 75), vfl = FALSE,
            mds.min = 1e-3, nsimEIM = 500, cutoff.prob = 0.999,
            eps.trig = 1e-7, max.support = 4000, max.chunk.MB = 30,
            lmu = "loglink", lsize = "loglink",
            imethod = 1, imu = NULL, iprobs.y = NULL,
            gprobs.y = ppoints(6), isize = NULL,
            gsize.mux = exp(c(-30, -20, -15, -10, -6:3)))
polya(zero = "size", type.fitted = c("mean", "prob"),
      mds.min = 1e-3, nsimEIM = 500, cutoff.prob = 0.999,
      eps.trig = 1e-7, max.support = 4000, max.chunk.MB = 30,
      lprob = "logitlink", lsize = "loglink", imethod = 1, iprob = NULL,
      iprobs.y = NULL, gprobs.y = ppoints(6), isize = NULL,
      gsize.mux = exp(c(-30, -20, -15, -10, -6:3)), imunb = NULL)
polyaR(zero = "size", type.fitted = c("mean", "prob"),
       mds.min = 1e-3, nsimEIM = 500,  cutoff.prob = 0.999,
       eps.trig = 1e-7, max.support = 4000, max.chunk.MB = 30,
       lsize = "loglink", lprob = "logitlink", imethod = 1, iprob = NULL,
       iprobs.y = NULL, gprobs.y = ppoints(6), isize = NULL,
       gsize.mux = exp(c(-30, -20, -15, -10, -6:3)), imunb = NULL)

Arguments

Value

An object of class "vglmff"

(see vglmff-class). The object is used by modelling functions such as vglm,

rrvglm

and vgam.

Details

The negative binomial distribution (NBD) can be motivated in several ways, e.g., as a Poisson distribution with a mean that is gamma distributed. There are several common parametrizations of the NBD. The one used by negbinomial() uses the mean \(\mu\) and an index parameter \(k\), both which are positive. Specifically, the density of a random variable \(Y\) is $$f(y;\mu,k) = {y + k - 1 \choose y} \, \left( \frac{\mu}{\mu+k} \right)^y\, \left( \frac{k}{k+\mu} \right)^k $$ where \(y=0,1,2,\ldots\), and \(\mu > 0\) and \(k > 0\). Note that the dispersion parameter is \(1/k\), so that as \(k\) approaches infinity the NBD approaches a Poisson distribution. The response has variance \(Var(Y)=\mu+\mu^2/k\). When fitted, the fitted.values slot of the object contains the estimated value of the \(\mu\) parameter, i.e., of the mean \(E(Y)\). It is common for some to use \(\alpha=1/k\) as the ancillary or heterogeneity parameter; so common alternatives for lsize are negloglink and reciprocallink.

For polya the density is $$f(y;p,k) = {y + k - 1 \choose y} \, \left( 1 - p \right)^y\, p^k $$ where \(y=0,1,2,\ldots\), and \(k > 0\) and \(0 < p < 1\).

Family function polyaR() is the same as polya() except the order of the two parameters are switched. The reason is that polyaR() tries to match with rnbinom closely in terms of the argument order, etc. Should the probability parameter be of primary interest, probably, users will prefer using polya() rather than polyaR(). Possibly polyaR() will be decommissioned one day.

The NBD can be coerced into the classical GLM framework with one of the parameters being of interest and the other treated as a nuisance/scale parameter (this is implemented in the MASS library). The VGAM family function negbinomial() treats both parameters on the same footing, and estimates them both by full maximum likelihood estimation.

The parameters \(\mu\) and \(k\) are independent (diagonal EIM), and the confidence region for \(k\) is extremely skewed so that its standard error is often of no practical use. The parameter \(1/k\) has been used as a measure of aggregation. For the NB-C the EIM is not diagonal.

These VGAM family functions handle multiple responses, so that a response matrix can be inputted. The number of columns is the number of species, say, and setting zero = -2 means that all species have a \(k\) equalling a (different) intercept only.

Conlisk, et al. (2007) show that fitting the NBD to presence-absence data will result in identifiability problems. However, the model is identifiable if the response values include 0, 1 and 2.

For the NB canonical link (NB-C), its estimation has a somewhat interesting history. Some details are at nbcanlink.

References

Bliss, C. and Fisher, R. A. (1953). Fitting the negative binomial distribution to biological data. Biometrics 9, 174--200.

Conlisk, E. and Conlisk, J. and Harte, J. (2007). The impossibility of estimating a negative binomial clustering parameter from presence-absence data: A comment on He and Gaston. The American Naturalist 170, 651--654.

Evans, D. A. (1953). Experimental evidence concerning contagious distributions in ecology. Biometrika, 40(1--2), 186--211.

Hilbe, J. M. (2011). Negative Binomial Regression, 2nd Edition. Cambridge: Cambridge University Press.

Lawless, J. F. (1987). Negative binomial and mixed Poisson regression. The Canadian Journal of Statistics 15, 209--225.

Miranda-Soberanis, V. F. and Yee, T. W. (2023). Two-parameter link functions, with applications to negative binomial, Weibull and quantile regression. Computational Statistics, 38, 1463--1485.

Yee, T. W. (2014). Reduced-rank vector generalized linear models with two linear predictors. Computational Statistics and Data Analysis, 71, 889--902.

Yee, T. W. (2020). The VGAM package for negative binomial regression. Australian & New Zealand Journal of Statistics, 62, 116--131.

See Also

quasipoisson, gaitdnbinomial, poissonff, zinegbinomial, negbinomial.size (e.g., NB-G), nbcanlink (NB-C), posnegbinomial, genpoisson1, genpoisson2, genpoisson0, inv.binomial, NegBinomial, rrvglm, cao, cqo, CommonVGAMffArguments, simulate.vlm, ppoints, margeff.

Examples

Run this code
if (FALSE) {
# Example 1: apple tree data (Bliss and Fisher, 1953)
appletree <- data.frame(y = 0:7, w = c(70, 38, 17, 10, 9, 3, 2, 1))
fit <- vglm(y ~ 1, negbinomial(deviance = TRUE), data = appletree,
            weights = w, crit = "coef")  # Obtain the deviance
fit <- vglm(y ~ 1, negbinomial(deviance = TRUE), data = appletree,
            weights = w, half.step = FALSE)  # Alternative method
summary(fit)
coef(fit, matrix = TRUE)
Coef(fit)  # For intercept-only models
deviance(fit)  # NB2 only; needs 'crit="coef"' & 'deviance=T' above

# Example 2: simulated data with multiple responses
ndata <- data.frame(x2 = runif(nn <- 200))
ndata <- transform(ndata, y1 = rnbinom(nn, exp(1), mu = exp(3+x2)),
                          y2 = rnbinom(nn, exp(0), mu = exp(2-x2)))
fit1 <- vglm(cbind(y1, y2) ~ x2, negbinomial, ndata, trace = TRUE)
coef(fit1, matrix = TRUE)

# Example 3: large counts implies SFS is used
ndata <- transform(ndata, y3 = rnbinom(nn, exp(1), mu = exp(10+x2)))
with(ndata, range(y3))  # Large counts
fit2 <- vglm(y3 ~ x2, negbinomial, data = ndata, trace = TRUE)
coef(fit2, matrix = TRUE)
head(weights(fit2, type = "working"))  # Non-empty; SFS was used

# Example 4: a NB-1 to estimate a NB with Var(Y)=phi0*mu
nn <- 200  # Number of observations
phi0 <- 10  # Specify this; should be greater than unity
delta0 <- 1 / (phi0 - 1)
mydata <- data.frame(x2 = runif(nn), x3 = runif(nn))
mydata <- transform(mydata, mu = exp(2 + 3 * x2 + 0 * x3))
mydata <- transform(mydata, y3 = rnbinom(nn, delta0 * mu, mu = mu))
plot(y3 ~ x2, data = mydata, pch = "+", col = "blue",
     main = paste("Var(Y) = ", phi0, " * mu", sep = ""), las = 1)
nb1 <- vglm(y3 ~ x2 + x3, negbinomial(parallel = TRUE, zero = NULL),
            data = mydata, trace = TRUE)
# Extracting out some quantities:
cnb1 <- coef(nb1, matrix = TRUE)
mydiff <- (cnb1["(Intercept)", "loglink(size)"] -
           cnb1["(Intercept)", "loglink(mu)"])
delta0.hat <- exp(mydiff)
(phi.hat <- 1 + 1 / delta0.hat)  # MLE of phi
summary(nb1)
# Obtain a 95 percent confidence interval for phi0:
myvec <- rbind(-1, 1, 0, 0)
(se.mydiff <- sqrt(t(myvec) %*%  vcov(nb1) %*%  myvec))
ci.mydiff <- mydiff + c(-1.96, 1.96) * c(se.mydiff)
ci.delta0 <- ci.exp.mydiff <- exp(ci.mydiff)
(ci.phi0 <- 1 + 1 / rev(ci.delta0))  # The 95
Confint.nb1(nb1)  # Quick way to get it
# cf. moment estimator:
summary(glm(y3 ~ x2 + x3, quasipoisson, mydata))$disper
}

Run the code above in your browser using DataLab