# negbinomial

##### Negative Binomial Distribution Family Function

Maximum likelihood estimation of the two parameters of a negative binomial distribution.

- Keywords
- models, regression

##### Usage

```
negbinomial(zero = "size", parallel = FALSE, deviance.arg = FALSE,
type.fitted = c("mean", "quantiles"),
percentiles = c(25, 50, 75),
mds.min = 1e-3, nsimEIM = 500, cutoff.prob = 0.999,
eps.trig = 1e-7, max.support = 4000, max.chunk.MB = 30,
lmu = "loge", lsize = "loge",
imethod = 1, imu = NULL, iprobs.y = NULL,
gprobs.y = ppoints(6), isize = NULL,
gsize.mux = exp(c(-30, -20, -15, -10, -6:3)))
polya(zero = "size", type.fitted = c("mean", "prob"),
mds.min = 1e-3, nsimEIM = 500, cutoff.prob = 0.999,
eps.trig = 1e-7, max.support = 4000, max.chunk.MB = 30,
lprob = "logit", lsize = "loge", imethod = 1, iprob = NULL,
iprobs.y = NULL, gprobs.y = ppoints(6), isize = NULL,
gsize.mux = exp(c(-30, -20, -15, -10, -6:3)), imunb = NULL)
polyaR(zero = "size", type.fitted = c("mean", "prob"),
mds.min = 1e-3, nsimEIM = 500, cutoff.prob = 0.999,
eps.trig = 1e-7, max.support = 4000, max.chunk.MB = 30,
lsize = "loge", lprob = "logit", imethod = 1, iprob = NULL,
iprobs.y = NULL, gprobs.y = ppoints(6), isize = NULL,
gsize.mux = exp(c(-30, -20, -15, -10, -6:3)), imunb = NULL)
```

##### Arguments

- lmu, lsize, lprob
Link functions applied to the \(\mu\), \(k\) and \(p\) parameters. See

`Links`

for more choices. Note that the \(\mu\), \(k\) and \(p\) parameters are the`mu`

,`size`

and`prob`

arguments of`rnbinom`

respectively. Common alternatives for`lsize`

are`negloge`

and`reciprocal`

, and`loglog`

(if \(k > 1\)).- imu, imunb, isize, iprob
Optional initial values for the mean and \(k\) and \(p\). For \(k\), if failure to converge occurs then try different values (and/or use

`imethod`

). For a \(S\)-column response,`isize`

can be of length \(S\). A value`NULL`

means an initial value for each response is computed internally using a gridsearch based on`gsize.mux`

. The last argument is ignored if used within`cqo`

; see the`iKvector`

argument of`qrrvglm.control`

instead. In the future`isize`

and`iprob`

might be depreciated.- nsimEIM
This argument is used for computing the diagonal element of the

*expected information matrix*(EIM) corresponding to \(k\) based on the*simulated Fisher scoring*(SFS) algorithm. See`CommonVGAMffArguments`

for more information and the notes below. SFS is one of two algorithms for computing the EIM elements (so that both algorithms may be used on a given data set). SFS is faster than the exact method when`Qmax`

is large.- cutoff.prob
Fed into the

`p`

argument of`qnbinom`

in order to obtain an upper limit for the approximate support of the distribution, called`Qmax`

, say. Similarly, the value`1-p`

is fed into the`p`

argument of`qnbinom`

in order to obtain a lower limit for the approximate support of the distribution, called`Qmin`

, say. Hence the approximate support is`Qmin:Qmax`

. This argument should be a numeric and close to 1 but never exactly 1. Used to specify how many terms of the infinite series for computing the second diagonal element of the EIM are actually used. The closer this argument is to 1, the more accurate the standard errors of the regression coefficients will be. If this argument is too small, convergence will take longer.- max.chunk.MB, max.support
`max.support`

is used to describe the eligibility of individual observations to have their EIM computed by the*exact method*. Here, we are concerned about computing the EIM wrt \(k\). The exact method algorithm operates separately on each response variable, and it constructs a large matrix provided that the number of columns is less than`max.support`

. If so, then the computations are done in chunks, so that no more than about`max.chunk.MB`

megabytes of memory is used at a time (actually, it is proportional to this amount). Regarding eligibility of this algorithm, each observation must have the length of the vector, starting from the`1-cutoff.prob`

quantile and finishing up at the`cutoff.prob`

quantile, less than`max.support`

(as its approximate support). If you have abundant memory then you might try setting`max.chunk.MB = Inf`

, but then the computations might take a very long time. Setting`max.chunk.MB = 0`

or`max.support = 0`

will force the EIM to be computed using the SFS algorithm only (this*used to be*the default method for*all*the observations). When the fitted values of the model are large and \(k\) is small, the computation of the EIM will be costly with respect to time and memory if the exact method is used. Hence the argument`max.support`

limits the cost in terms of time. For intercept-only models`max.support`

is multiplied by a number (such as 10) because only one inner product needs be computed. Note:`max.support`

is an upper bound and limits the number of terms dictated by the`eps.trig`

argument.- mds.min
Numeric. Minimum value of the NBD mean divided by

`size`

parameter. The closer this ratio is to 0, the closer the distribution is to a Poisson. Iterations will stop when an estimate of \(k\) is so large, relative to the mean, than it is below this threshold (this is treated as a boundary of the parameter space).- eps.trig
Numeric. A small positive value used in the computation of the EIMs. It focusses on the denominator of the terms of a series. Each term in the series (that is used to approximate an infinite series) has a value greater than

`size / sqrt(eps.trig)`

, thus very small terms are ignored. It's a good idea to set a smaller value that will result in more accuracy, but it will require a greater computing time (when \(k\) is close to 0). And adjustment to`max.support`

may be needed. In particular, the quantity computed by special means is \(\psi'(k) - E[\psi'(Y+k)]\), which is the difference between two`trigamma`

. functions. It is part of the calculation of the EIM with respect to the`size`

parameter.- gsize.mux
Similar to

`gsigma`

in`CommonVGAMffArguments`

. However, this grid is multiplied by the initial estimates of the NBD mean parameter. That is, it is on a relative scale rather than on an absolute scale. If the counts are very large in value then convergence fail might occur; if so, then try a smaller value such as`gsize.mux = exp(-40)`

.- type.fitted, percentiles
See

`CommonVGAMffArguments`

for more information.- deviance.arg
Logical. If

`TRUE`

, the deviance is computed*after*convergence. It only works in the NB-2 model. It is also necessary to set`criterion = "coefficients"`

or`half.step = FALSE`

since one cannot use that criterion properly for the minimization within the IRLS algorithm. It should be set`TRUE`

when used with`cqo`

under the fast algorithm.- imethod
An integer with value

`1`

or`2`

etc. which specifies the initialization method for the \(\mu\) parameter. If failure to converge occurs try another value and/or else specify a value for`iprobs.y`

and/or else specify a value for`isize`

.- parallel
See

`CommonVGAMffArguments`

for more information. Setting`parallel = TRUE`

is useful in order to get something similar to`quasipoissonff`

or what is known as NB-1. If`parallel = TRUE`

then the parallelism constraint does not apply to any intercept term. You should set`zero = NULL`

too if`parallel = TRUE`

to avoid a conflict.- gprobs.y
A vector representing a grid; passed into the

`probs`

argument of`quantile`

when`imethod = 1`

to obtain an initial value for the mean of each response. Is overwritten by any value of`iprobs.y`

.- iprobs.y
Passed into the

`probs`

argument of`quantile`

when`imethod = 1`

to obtain an initial value for the mean of each response. Overwrites any value of`gprobs.y`

. This argument might be deleted in the future.- zero
Can be an integer-valued vector, and if so, then it is usually assigned \(-2\) or \(2\). Specifies which of the two linear/additive predictors are modelled as an intercept only. By default, the \(k\) parameter (after

`lsize`

is applied) is modelled as a single unknown number that is estimated. It can be modelled as a function of the explanatory variables by setting`zero = NULL`

; this has been called a NB-H model by Hilbe (2011). A negative value means that the value is recycled, so setting \(-2\) means all \(k\) are intercept-only. See`CommonVGAMffArguments`

for more information.

##### Details

The negative binomial distribution (NBD)
can be motivated in several ways,
e.g., as a Poisson distribution with a mean that is gamma
distributed.
There are several common parametrizations of the NBD.
The one used by `negbinomial()`

uses the
mean \(\mu\) and an *index* parameter
\(k\), both which are positive.
Specifically, the density of a random variable \(Y\) is
$$f(y;\mu,k) = {y + k - 1 \choose y} \,
\left( \frac{\mu}{\mu+k} \right)^y\,
\left( \frac{k}{k+\mu} \right)^k $$
where \(y=0,1,2,\ldots\),
and \(\mu > 0\) and \(k > 0\).
Note that the *dispersion* parameter is
\(1/k\), so that as \(k\) approaches infinity the
NBD approaches a Poisson distribution.
The response has variance \(Var(Y)=\mu+\mu^2/k\).
When fitted, the `fitted.values`

slot of the object contains
the estimated value of the \(\mu\) parameter, i.e., of the mean
\(E(Y)\).
It is common for some to use \(\alpha=1/k\) as the
ancillary or heterogeneity parameter;
so common alternatives for `lsize`

are
`negloge`

and
`reciprocal`

.

For `polya`

the density is
$$f(y;p,k) = {y + k - 1 \choose y} \,
\left( 1 - p \right)^y\,
p^k $$
where \(y=0,1,2,\ldots\),
and \(k > 0\) and \(0 < p < 1\).

Family function `polyaR()`

is the same as `polya()`

except
the order of the two parameters are switched.
The reason is that `polyaR()`

tries to match with
`rnbinom`

closely
in terms of the argument order, etc.
Should the probability parameter be of primary interest,
probably, users will prefer using `polya()`

rather than
`polyaR()`

.
Possibly `polyaR()`

will be decommissioned one day.

The NBD can be coerced into the
classical GLM framework with one of the parameters being
of interest and the other treated as a nuisance/scale
parameter (this is implemented in the MASS library). The
VGAM family function `negbinomial()`

treats both
parameters on the same footing, and estimates them both
by full maximum likelihood estimation.

The parameters \(\mu\) and \(k\) are independent (diagonal EIM), and the confidence region for \(k\) is extremely skewed so that its standard error is often of no practical use. The parameter \(1/k\) has been used as a measure of aggregation. For the NB-C the EIM is not diagonal.

These VGAM family functions handle
*multiple* responses, so that a response matrix can be
inputted. The number of columns is the number
of species, say, and setting `zero = -2`

means that
*all* species have a \(k\) equalling a (different)
intercept only.

##### Value

An object of class `"vglmff"`

(see `vglmff-class`

).
The object is used by modelling functions such as `vglm`

,
`rrvglm`

and `vgam`

.

##### Note

These 3 functions implement 2 common parameterizations
of the negative binomial (NB). Some people called the
NB with integer \(k\) the *Pascal* distribution,
whereas if \(k\) is real then this is the *Polya*
distribution. I don't. The one matching the details of
`rnbinom`

in terms of \(p\)
and \(k\) is `polya()`

.

For `polya()`

the code may fail when \(p\) is close
to 0 or 1. It is not yet compatible with `cqo`

or `cao`

.

Suppose the response is called `ymat`

.
For `negbinomial()`

the diagonal element of the *expected information matrix*
(EIM) for parameter \(k\)
involves an infinite series; consequently SFS
(see `nsimEIM`

) is used as the backup algorithm only.
SFS should be better if `max(ymat)`

is large,
e.g., `max(ymat) > 1000`

,
or if there are any outliers in `ymat`

.
The default algorithm involves a finite series approximation
to the support `0:Inf`

;
the arguments
`max.memory`

,
`min.size`

and
`cutoff.prob`

are pertinent.

Regardless of the algorithm used,
convergence problems may occur, especially when the response has large
outliers or is large in magnitude.
If convergence failure occurs, try using arguments
(in recommended decreasing order)
`max.support`

,
`nsimEIM`

,
`cutoff.prob`

,
`iprobs.y`

,
`imethod`

,
`isize`

,
`zero`

,
`max.chunk.MB`

.

The function `negbinomial`

can be used by the fast algorithm in
`cqo`

, however, setting `eq.tolerances = TRUE`

and
`I.tolerances = FALSE`

is recommended.

In the first example below (Bliss and Fisher, 1953), from each of 6 McIntosh apple trees in an orchard that had been sprayed, 25 leaves were randomly selected. On each of the leaves, the number of adult female European red mites were counted.

There are two special uses of `negbinomial`

for handling count data.
Firstly,
when used by `rrvglm`

this
results in a continuum of models in between and
inclusive of quasi-Poisson and negative binomial regression.
This is known as a reduced-rank negative binomial model *(RR-NB)*.
It fits a negative binomial log-linear regression with variance function
\(Var(Y)=\mu+\delta_1 \mu^{\delta_2}\)
where \(\delta_1\)
and \(\delta_2\)
are parameters to be estimated by MLE.
Confidence intervals are available for \(\delta_2\),
therefore it can be decided upon whether the
data are quasi-Poisson or negative binomial, if any.

Secondly,
the use of `negbinomial`

with `parallel = TRUE`

inside `vglm`

can result in a model similar to `quasipoissonff`

.
This is named the *NB-1* model.
The dispersion parameter is estimated by MLE whereas
`glm`

uses the method of moments.
In particular, it fits a negative binomial log-linear regression
with variance function
\(Var(Y) = \phi_0 \mu\)
where \(\phi_0\)
is a parameter to be estimated by MLE.
Confidence intervals are available for \(\phi_0\).

##### Warning

Poisson regression corresponds to \(k\) equalling
infinity. If the data is Poisson or close to Poisson,
numerical problems may occur.
Some corrective measures are taken, e.g.,
\(k\) is effectively capped (relative to the mean) during estimation
to some large value and a warning is issued.
And setting `stepsize = 0.5`

for half stepping is
probably a good idea too when the data is extreme.

The NBD is a strictly unimodal
distribution. Any data set that does not exhibit a mode (somewhere
in the middle) makes the estimation problem difficult.
Set `trace = TRUE`

to monitor convergence.

These functions are fragile; the maximum likelihood
estimate of the index parameter is fraught (see Lawless,
1987). In general, the `quasipoissonff`

is
more robust. Other alternatives to `negbinomial`

are
to fit a NB-1 or RR-NB (aka NB-P) model; see Yee (2014).
Also available are the NB-C, NB-H and NB-G.
Assigning values to the `isize`

argument may lead
to a local solution, and smaller values are preferred
over large values when using this argument.

If one wants to force SFS
to be used on all observations, then
set `max.support = 0`

or `max.chunk.MB = 0`

.
If one wants to force the exact method
to be used for all observations, then
set `max.support = Inf`

.
If the computer has *much* memory, then trying
`max.chunk.MB = Inf`

and
`max.support = Inf`

may provide a small speed increase.
If SFS is used at all, then the `@weights`

slot of the
fitted object will be a matrix;
otherwise that slot will be a `0 x 0`

matrix.

Yet to do: write a family function which uses the methods of moments estimator for \(k\).

##### References

Lawless, J. F. (1987)
Negative binomial and mixed Poisson regression.
*The Canadian Journal of Statistics*
**15**, 209--225.

Hilbe, J. M. (2011)
*Negative Binomial Regression*,
2nd Edition.
Cambridge: Cambridge University Press.

Bliss, C. and Fisher, R. A. (1953)
Fitting the negative binomial distribution to biological data.
*Biometrics*
**9**, 174--200.

Yee, T. W. (2014)
Reduced-rank vector generalized linear models with two linear predictors.
*Computational Statistics and Data Analysis*,
**71**, 889--902.

##### See Also

`quasipoissonff`

,
`poissonff`

,
`zinegbinomial`

,
`negbinomial.size`

(e.g., NB-G),
`nbcanlink`

(NB-C),
`posnegbinomial`

,
`inv.binomial`

,
`rnbinom`

,
`nbolf`

,
`rrvglm`

,
`cao`

,
`cqo`

,
`CommonVGAMffArguments`

,
`simulate.vlm`

,
`ppoints`

,
`qnbinom`

.

##### Examples

```
# NOT RUN {
# Example 1: apple tree data (Bliss and Fisher, 1953)
appletree <- data.frame(y = 0:7, w = c(70, 38, 17, 10, 9, 3, 2, 1))
fit <- vglm(y ~ 1, negbinomial(deviance = TRUE), data = appletree,
weights = w, crit = "coef") # Obtain the deviance
fit <- vglm(y ~ 1, negbinomial(deviance = TRUE), data = appletree,
weights = w, half.step = FALSE) # Alternative method
summary(fit)
coef(fit, matrix = TRUE)
Coef(fit) # For intercept-only models
deviance(fit) # NB2 only; needs 'crit = "coef"' & 'deviance = TRUE' above
# Example 2: simulated data with multiple responses
# }
# NOT RUN {
ndata <- data.frame(x2 = runif(nn <- 200))
ndata <- transform(ndata, y1 = rnbinom(nn, mu = exp(3+x2), size = exp(1)),
y2 = rnbinom(nn, mu = exp(2-x2), size = exp(0)))
fit1 <- vglm(cbind(y1, y2) ~ x2, negbinomial, data = ndata, trace = TRUE)
coef(fit1, matrix = TRUE)
# }
# NOT RUN {
# Example 3: large counts implies SFS is used
# }
# NOT RUN {
ndata <- transform(ndata, y3 = rnbinom(nn, mu = exp(10+x2), size = exp(1)))
with(ndata, range(y3)) # Large counts
fit2 <- vglm(y3 ~ x2, negbinomial, data = ndata, trace = TRUE)
coef(fit2, matrix = TRUE)
head(fit2@weights) # Non-empty; SFS was used
# }
# NOT RUN {
# Example 4: a NB-1 to estimate a negative binomial with Var(Y) = phi0 * mu
nn <- 200 # Number of observations
phi0 <- 10 # Specify this; should be greater than unity
delta0 <- 1 / (phi0 - 1)
mydata <- data.frame(x2 = runif(nn), x3 = runif(nn))
mydata <- transform(mydata, mu = exp(2 + 3 * x2 + 0 * x3))
mydata <- transform(mydata, y3 = rnbinom(nn, mu = mu, size = delta0 * mu))
# }
# NOT RUN {
plot(y3 ~ x2, data = mydata, pch = "+", col = "blue",
main = paste("Var(Y) = ", phi0, " * mu", sep = ""), las = 1)
# }
# NOT RUN {
nb1 <- vglm(y3 ~ x2 + x3, negbinomial(parallel = TRUE, zero = NULL),
data = mydata, trace = TRUE)
# Extracting out some quantities:
cnb1 <- coef(nb1, matrix = TRUE)
mydiff <- (cnb1["(Intercept)", "loge(size)"] -
cnb1["(Intercept)", "loge(mu)"])
delta0.hat <- exp(mydiff)
(phi.hat <- 1 + 1 / delta0.hat) # MLE of phi
summary(nb1)
# Obtain a 95 percent confidence interval for phi0:
myvec <- rbind(-1, 1, 0, 0)
(se.mydiff <- sqrt(t(myvec) %*% vcov(nb1) %*% myvec))
ci.mydiff <- mydiff + c(-1.96, 1.96) * se.mydiff
ci.delta0 <- ci.exp.mydiff <- exp(ci.mydiff)
(ci.phi0 <- 1 + 1 / rev(ci.delta0)) # The 95 percent conf. interval for phi0
Confint.nb1(nb1) # Quick way to get it
summary(glm(y3 ~ x2 + x3, quasipoisson, mydata))$disper # cf. moment estimator
# }
```

*Documentation reproduced from package VGAM, version 1.0-4, License: GPL-3*