# aodml

##### ML Estimation of Generalized Linear Models for Overdispersed Count Data

The function fits a beta-binomial (BB) or a negative binomial (NB) generalized linear model from clustered data.

For the BB model, data have the form {\((n_1, m_1), (n_2, m_2), ..., (n_N, m_N)\)}, where \(n_i\) is the size of cluster \(i\), \(m_i\) the number of “successes”, and \(N\) the number of clusters. The response is the proportion \(y = m/n.\)

For the NB model, data can be of two types. When modeling simple counts, data have the form {\(m_1, m_2, ..., m_N\)}, where \(m_i\) is the number of occurences of the event under study. When modeling rates (e.g. hazard rates), data have the same form as for the BB model, where \(n_i\) is the denominator of the rate for cluster \(i\) (considered as an “offset”, i.e. a constant known value) and \(m_i\) the number of occurences of the event. For both types of data, the response is the count \(y = m\).

- Keywords
- regression

##### Usage

```
aodml(formula,
data,
family = c("bb", "nb"),
link = c("logit", "cloglog", "probit"),
phi.formula = ~ 1,
phi.scale = c("identity", "log", "inverse"),
phi.start = NULL,
fixpar = list(),
hessian = TRUE,
method = c("BFGS", "Nelder-Mead", "CG", "SANN"),
control = list(maxit = 3000, trace = 0), ...)
# S3 method for aodml
print(x, ...)
# S3 method for aodml
summary(object, ...)
# S3 method for aodml
anova(object, ...)
# S3 method for anova.aodml
print(x, digits, ...)
# S3 method for aodml
fitted(object, ..., what = c("mu", "nu", "eta", "phi"))
# S3 method for aodml
residuals(object, ..., type = c("deviance", "pearson", "response"))
# S3 method for aodml
coef(object, ...)
# S3 method for aodml
df.residual(object, ...)
# S3 method for aodml
logLik(object, ...)
# S3 method for aodml
deviance(object, ...)
# S3 method for aodml
AIC(object, ..., k = 2)
# S3 method for aodml
vcov(object, ...)
# S3 method for aodml
predict(object, ..., type = c("link", "response"), se.fit = FALSE, newdata = NULL)
```

##### Arguments

- formula
A formula for the mean \(\mu\), defining the parameter vector \(b\) (see details).

For the BB model, the left-hand side of the formula must be of the form

`cbind(m, n - m) ~ ...`

where the fitted proportion is

`m/n`

.For the NB model, the left-hand side of the formula must be of the form

`m ~ ...`

where the fitted count is

`m`

. To fit a rate, argument`offset`

must be used in the right-hand side of the formula (see examples).- data
A data frame containing the response (

`m`

and, optionnally,`n`

) and the explanatory variable(s).- family
Define the model which is fitted: “bb” for the BB model and “nb” for the NB model.

- link
For the BB model only. Define the link function \(g\) for the mean \(\mu\): “cloglog”, “logit” (default) or “probit”. For the NB model,

`link`

is automatically set to “log”.- phi.formula
A right-hand side formula to model optional heterogeneity for the over-dispersion parameter \(\Phi\) (see details). Only one single factor is allowed.

Default to

`formula(~ 1)`

(i.e. no heterogeneity).- phi.scale
Scale on which \(\Phi\) is estimated (see details): “identity” (default), “log” or “inverse”.

- phi.start
Optional starting values for \(\Phi\). Default to 0.1.

- fixpar
An optional list of 2 vectors of same length (\(>=1\)) to set some parameters as constant in the model.

The first vector indicates which parameters are set as constant (i.e., not optimized) in the global parameter vector \((b, \Phi)\).

The second vector indicates the corresponding values. For instance,

`fixpar = list(c(4, 5), c(0, 0.1))`

means that the 4th and 5th components of vector \((b, \Phi)\) are set to 0 and 0.1. Argument

`fixpar`

can be usefull, for instance, to calculate profiled log-likehoods.- hessian
A logical (default to

`TRUE`

). If`FALSE`

, the hessian and the variances-covariances matrices of the parameters are not calculated.- method
Define the method used for the optimization (see

`optim`

).- control
A list to control the optimization parameters. See

`optim`

. By default, the maximum number of iterations is set to 3000, and trace is set to 0 to avoid spurious warnings.- object
An object of class

`aodml`

- x
An object of class

`aodml`

- digits
Number of digits to print in print.summary.aodml and print.anova.aodml. Default to

`max(3, getOption("digits") - 3)`

- …
Further arguments passed to

`optim`

(e.g. argument`method`

if using function`aodml`

), or further objects of class`aodml`

(function`anova.aodml`

), or further arguments passed to`print.aodml`

and`print.anova.aodml`

.- what
For function

`fitted`

, a character string indicating the type of fitted values to be returned: legal values are “mu” for the fitted response; “nu” for the fitted linear predictor without offset (link scale); “eta” for the fitted linear predictor with offset (link scale); “phi” for the fitted overdispersion coefficient.- type
For function

`residuals`

, a character string indicating the type of residuals to be computed; legal values are “deviance” for the deviance's residuals, “pearson” for the Pearson's residuals, and “response” for the response. For function`predict`

, a character string indicating the type of prediction to be computed; legal values are “link” and “response”.- k
Numeric scalar for the penalty parameter used to compute the information criterion. The default value (\(k = 2\)) is the regular AIC = -2 * logLik + 2 * \(p\), where \(p\) is the number of model coefficients. NB: for AICc, \(k\) is set to 2, and AICc = AIC + 2 * \(p\) * \((p + 1)\) / \((n - p - 1)\), with \(n\) the number of observations.

- se.fit
Logical scalar indicating whether standard errors should be computed for the predicted values. Default to

`FALSE`

.- newdata
A

`data.frame`

containing the explanatory variables - and possibly the offset - for the values of which predictions are to be made.

##### Details

**Beta-binomial model (BB)**

For a given cluster \((n, m)\), the model is

$$m | \lambda,n ~ Binomial(n, \lambda)$$

where \(\lambda\) follows a Beta distribution \(Beta(a_1, a_2)\). Noting \(B\) the beta function, the marginal (beta-binomial) distribution of \(m\) is

$$P(m | n) = C(n, m) * B(a_1 + m, a_2 + n - m) / B(a_1, a_2)$$

Using the re-parameterization \(\mu = a_1 / (a_1 + a_2)\) and \(\Phi = 1 / (a_1 + a_2 + 1)\), the marginal mean and variance are

$$E[m] = n * \mu$$

$$Var[m] = n * \mu * (1 - \mu) * (1 + (n - 1) * \Phi)$$

The response in `aodml`

is \(y = m/n\). The mean is \(E[y] = \mu\), defined such as \(\mu = g^{-1}(X * b) = g^{-1}(\nu)\), where \(g\) is the link function, \(X\) is a design-matrix, \(b\) a vector of fixed effects and \(\nu = X * b\) is the corresponding linear predictor. The variance is \(Var[y] = (1 / n) * \mu * (1 - \mu) * (1 + (n - 1) * \Phi)\).

**Negative binomial model (NB)**

*------ Simple counts (model with no offset)*

For a given cluster \((m)\), the model is

$$y | \lambda ~ Poisson(\lambda)$$

where \(\lambda\) follows a Gamma distribution of mean \(\mu\) and shape \(k\) (\(Var[\lambda] = \mu^2 / k\)). Noting \(G\) the gamma function, the marginal (negative binomial) distribution of \(m\) is

$$P(m) = {G(m+k) / (m! * G(k))} * (k / (k + \mu))^k * (\mu / (k + \mu))^m$$

Using the re-parameterization \(\Phi = 1 / k\), the marginal mean and variance are

$$E[m] = \mu$$

$$Var[m] = \mu + \Phi * \mu^2$$

The response in `aodml`

is \(y = m\). The mean is \(E[y] = \mu\), defined such as \(\mu = exp(X * b) = exp(\nu)\). The variance is \(Var[y] = \mu + \Phi * \mu^2\).

*------ Rates (model with offset)*

For a given cluster \((n, m)\), the model is

$$m | \lambda,n ~ Poisson(\lambda)$$

The marginal (negative binomial) distribution \(P(m|n)\) is the same as for the case with no offset (\(= P(m)\)). The response in `aodml`

is \(y = m\). The mean is \(E[y] = \mu\), defined such as \(\mu = exp(X * b + log(n)) = exp(\nu + log(n)) = exp(\eta)\), where \(log(n)\) is the offset. The variance is \(Var[y] = \mu + \Phi * \mu^2\).

**Other details**

Argument `phi.scale`

of function `aodml`

enables to estimate the over-dispersion parameter under different scales.

If `phi.scale = "identity"`

(Default), the function estimates \(\Phi\).

If `phi.scale = "log"`

, the function estimates \(log(\Phi)\).

If `phi.scale = "inverse"`

, the function estimates \(1/\Phi\).

The full parameter vector returned by `aodml`

, say `param`

, is equal to \((b, \Phi)\). This vector is estimated by maximizing the log-likelihood of the marginal model using function `optim`

. The estimated variances-covariances matrix of `param`

is calculated by the inverse of the observed hessian matrix returned by `optim`

, and is referred to as `varparam`

.

##### Value

An object of class `aodml`

, printed and summarized by various functions. Function `deviance.aodml`

returns the value `-2 * (logL - logL_max)`

. The “deviance” used in function `AIC.aodml`

to calculate AIC and AICc is `-2 * logL`

.

##### References

Crowder, M.J., 1978. *Beta-binomial anova for proportions*. Appl. Statist. 27, 34-37.
Griffiths, D.A., 1973. *Maximum likelihood estimation for the beta-binomial distribution and an application
to the household distribution of the total number of cases of disease*. Biometrics 29, 637-648.
Lawless, J.F., 1987. *Negative binomial and mixed Poisson regression*. The Canadian Journal of Statistics, 15(3): 209-225.
McCullagh, P., Nelder, J. A., 1989, 2nd ed. *Generalized linear models*. New York, USA: Chapman and Hall.
Prentice, R.L., 1986. *Binary regression using an extended beta-binomial distribution, with discussion of
correlation induced by covariate measurement errors*. J.A.S.A. 81, 321-327.
Williams, D.A., 1975. *The analysis of binary responses from toxicological experiments involving
reproduction and teratogenicity*. Biometrics 31, 949-952.

##### See Also

##### Examples

```
# NOT RUN {
#------ Beta-binomial model
data(orob2)
fm1 <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb")
# summaries
fm1
summary(fm1)
coef(fm1)
vcov(fm1)
logLik(fm1)
deviance(fm1)
AIC(fm1)
gof(fm1)
# predictions
cbind(
fitted(fm1),
fitted(fm1, what = "nu"),
fitted(fm1, what = "eta"),
fitted(fm1, what = "phi")
)
predict(fm1, type = "response", se.fit = TRUE)
newdat <- data.frame(seed = c("O73", "O75"))
predict(fm1, type = "response", se.fit = TRUE, newdata = newdat)
# model with heterogeneity in phi
fm <- aodml(cbind(m, n - m) ~ seed, data = orob2,
family = "bb", phi.formula = ~ seed)
summary(fm)
AIC(fm1, fm)
# various phi scales
fm <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb")
fm$phi
fm$phi.scale
fm <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb",
phi.scale = "log")
fm$phi
fm$phi.scale
fm <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb",
phi.scale = "inverse")
fm$phi
fm$phi.scale
### Models with coefficient(s) set as constant
# model with 1 phi coefficient, set as constant "0.02"
fm <- aodml(formula = cbind(m, n - m) ~ seed * root, data = orob2,
family = "bb", fixpar = list(5, 0.02))
fm$param
fm$varparam
# model with 2 phi coefficients, with the first set as constant ~ "0"
fm <- aodml(formula = cbind(m, n - m) ~ seed * root, data = orob2,
family = "bb", phi.formula = ~ seed, fixpar = list(5, 1e-15))
fm$param
fm$varparam
# model with 2 phi coefficients, with the first set as constant ~ "0",
# and the mu intercept (1rst coef of vector b) set as as constant "-0.5"
fm <- aodml(formula = cbind(m, n - m) ~ seed * root, data = orob2,
family = "bb", phi.formula = ~ seed,
fixpar = list(c(1, 5), c(-0.5, 1e-15)))
fm$param
fm$varparam
### Model tests
# LR tests - non-constant phi
fm0 <- aodml(cbind(m, n - m) ~ 1, data = orob2, family = "bb")
fm2 <- aodml(cbind(m, n - m) ~ seed + root, data = orob2, family = "bb")
fm3 <- aodml(cbind(m, n - m) ~ seed * root, data = orob2, family = "bb")
anova(fm0, fm1, fm2, fm3)
# LR tests - constant phi
# phi is assumed to be estimated from fm3
fm2.bis <- aodml(cbind(m, n - m) ~ seed + root, data = orob2,
family = "bb", fixpar = list(4, fm3$phi))
LRstat <- 2 * (logLik(fm3) - logLik(fm2.bis))
pchisq(LRstat, df = 1, lower.tail = FALSE)
# Wald test of the seed factor in fm1
wald.test(b = coef(fm3), varb = vcov(fm3), Terms = 4)
#------ Negative binomial model
### Modelling counts
data(salmonella)
fm1 <- aodml(m ~ log(dose + 10) + dose, data = salmonella, family = "nb")
## fm1 <- aodml(m ~ log(dose + 10) + dose, data = salmonella, family = "nb",
## method = "Nelder-Mead")
# summaries
fm1
summary(fm1)
coef(fm1)
vcov(fm1)
logLik(fm1)
deviance(fm1)
AIC(fm1)
gof(fm1)
# predictions
cbind(
fitted(fm1),
fitted(fm1, what = "nu"),
fitted(fm1, what = "eta"),
fitted(fm1, what = "phi")
)
predict(fm1, type = "response", se.fit = TRUE)
newdat <- data.frame(dose = c(20, 40))
predict(fm1, type = "response", se.fit = TRUE, newdata = newdat)
# various phi scales
fm <- aodml(m ~ log(dose + 10) + dose, data = salmonella, family = "nb")
fm$phi
fm$phi.scale
fm <- aodml(m ~ log(dose + 10) + dose, data = salmonella,
family = "nb", phi.scale = "log")
fm$phi
fm$phi.scale
fm <- aodml(m ~ log(dose + 10) + dose, data = salmonella,
family = "nb", phi.scale = "inverse")
fm$phi
fm$phi.scale
# LR and Wald tests of the "log(dose + 10) + dose" factors
fm0 <- aodml(m ~ 1, data = salmonella, family = "nb")
anova(fm0, fm1)
fm0.bis <- aodml(m ~ 1, data = salmonella, family = "nb",
fixpar = list(2, fm1$phi))
LRstat <- 2 * (logLik(fm1) - logLik(fm0.bis))
pchisq(LRstat, df = 2, lower.tail = FALSE)
wald.test(b = coef(fm1), varb = vcov(fm1), Terms = 2:3)
### Modelling a rate
data(dja)
# rate "m / trisk"
fm <- aodml(formula = m ~ group + offset(log(trisk)),
data = dja, family = "nb")
summary(fm)
fm <- aodml(formula = m ~ group + offset(log(trisk)),
phi.formula = ~ group, data = dja, family = "nb",
phi.scale = "log")
summary(fm)
# model with 1 phi coefficient, set as constant "0.8"
fm <- aodml(formula = m ~ group + offset(log(trisk)), data = dja,
family = "nb", phi.formula = ~1, fixpar = list(3, 0.8))
fm$param
fm$varparam
# model with 2 phi coefficients, with the first set as constant ~ "0" in the identity scale
fm <- aodml(formula = m ~ group + offset(log(trisk)), data = dja,
family = "nb", phi.formula = ~ group, phi.scale = "log",
fixpar = list(4, -15))
fm$param
fm$varparam
# model with 2 phi coefficients, with the first set as constant "0" in the identity scale,
# and the mu intercept coefficient (1rst coef of vector b) set as as constant "-0.5"
fm <- aodml(formula = m ~ group + offset(log(trisk)), data = dja,
family = "nb", phi.formula = ~ group, phi.scale = "log",
fixpar = list(c(1, 4), c(-0.5, -15)))
fm$param
fm$varparam
# }
```

*Documentation reproduced from package aods3, version 0.4-1, License: GPL (>= 2)*