aodml

0th

Percentile

ML Estimation of Generalized Linear Models for Overdispersed Count Data

The function fits a beta-binomial (BB) or a negative binomial (NB) generalized linear model from clustered data. For the BB model, data have the form {$(n_1, m_1), (n_2, m_2), ..., (n_N, m_N)$}, where $n_i$ is the size of cluster $i$, $m_i$ the number of successes, and $N$ the number of clusters. The response is the proportion $y = m/n.$ For the NB model, data can be of two types. When modeling simple counts, data have the form {$m_1, m_2, ..., m_N$}, where $m_i$ is the number of occurences of the event under study. When modeling rates (e.g. hazard rates), data have the same form as for the BB model, where $n_i$ is the denominator of the rate for cluster $i$ (considered as an offset, i.e. a constant known value) and $m_i$ the number of occurences of the event. For both types of data, the response is the count $y = m$.

Keywords
regression
Usage
aodml(formula,
  data,
  family = c("bb", "nb"),
  link = c("logit", "cloglog", "probit"), 
	phi.formula = ~ 1,
  phi.scale = c("identity", "log", "inverse"),
  phi.start = NULL,
  fixpar = list(),
  hessian = TRUE,
  method = c("BFGS", "Nelder-Mead", "CG", "SANN"),
  control = list(maxit = 3000, trace = 0), ...)
  
  ## S3 method for class 'aodml':
print(x, ...)
  ## S3 method for class 'aodml':
summary(object, ...)
  ## S3 method for class 'aodml':
anova(object, ...)
  ## S3 method for class 'anova.aodml':
print(x, digits, ...)
  ## S3 method for class 'aodml':
fitted(object, ..., what = c("mu", "nu", "eta", "phi"))
  ## S3 method for class 'aodml':
residuals(object, ..., type = c("deviance", "pearson", "response"))
  ## S3 method for class 'aodml':
coef(object, ...)
  ## S3 method for class 'aodml':
df.residual(object, ...)
  ## S3 method for class 'aodml':
logLik(object, ...)
  ## S3 method for class 'aodml':
deviance(object, ...)
  ## S3 method for class 'aodml':
AIC(object, ..., k = 2)
  ## S3 method for class 'aodml':
vcov(object, ...)
  ## S3 method for class 'aodml':
predict(object, ..., type = c("link", "response"), se.fit = FALSE, newdata = NULL)
Arguments
formula
A formula for the mean $\mu$, defining the parameter vector $b$ (see details). For the BB model, the left-hand side of the formula must be of the form cbind(m, n - m) ~ ... where the fitted proportion is m/n. For t
data
A data frame containing the response (m and, optionnally, n) and the explanatory variable(s).
family
Define the model which is fitted: bb for the BB model and nb for the NB model.
link
For the BB model only. Define the link function $g$ for the mean $\mu$: cloglog, logit (default) or probit. For the NB model, link is automatically set to log.
phi.formula
A right-hand side formula to model optional heterogeneity for the over-dispersion parameter $\Phi$ (see details). Only one single factor is allowed. Default to formula(~ 1) (i.e. no heterogeneity).
phi.scale
Scale on which $\Phi$ is estimated (see details): identity (default), log or inverse.
phi.start
Optional starting values for $\Phi$. Default to 0.1.
fixpar
An optional list of 2 vectors of same length ($>=1$) to set some parameters as constant in the model. The first vector indicates which parameters are set as constant (i.e., not optimized) in the global parameter vector $(b, \Phi)$. The second vect
hessian
A logical (default to TRUE). If FALSE, the hessian and the variances-covariances matrices of the parameters are not calculated.
method
Define the method used for the optimization (see optim).
control
A list to control the optimization parameters. See optim. By default, the maximum number of iterations is set to 3000, and trace is set to 0 to avoid spurious warnings.
object
An object of class aodml
x
An object of class aodml
digits
Number of digits to print in print.summary.aodml and print.anova.aodml. Default to max(3, getOption("digits") - 3)
...
Further arguments passed to optim (e.g. argument method if using function aodml), or further objects of class aodml (function anova.aodml), or further argum
what
For function fitted, a character string indicating the type of fitted values to be returned: legal values are mu for the fitted response; nu for the fitted linear predictor without offset (link scale);
type
For function residuals, a character string indicating the type of residuals to be computed; legal values are deviance for the deviance's residuals, pearson for the Pearson's residuals, and response
k
Numeric scalar for the penalty parameter used to compute the information criterion. The default value ($k = 2$) is the regular AIC = -2 * logLik + 2 * $p$, where $p$ is the number of model coefficients. NB: for AICc, $k$ is set to 2, and AICc = AIC + 2 *
se.fit
Logical scalar indicating whether standard errors should be computed for the predicted values. Default to FALSE.
newdata
A data.frame containing the explanatory variables - and possibly the offset - for the values of which predictions are to be made.
Details

Beta-binomial model (BB) For a given cluster $(n, m)$, the model is $$m | \lambda,n ~ Binomial(n, \lambda)$$ where $\lambda$ follows a Beta distribution $Beta(a_1, a_2)$. Noting $B$ the beta function, the marginal (beta-binomial) distribution of $m$ is $$P(m | n) = C(n, m) * B(a_1 + m, a_2 + n - m) / B(a_1, a_2)$$ Using the re-parameterization $\mu = a_1 / (a_1 + a_2)$ and $\Phi = 1 / (a_1 + a_2 + 1)$, the marginal mean and variance are $$E[m] = n * \mu$$ $$Var[m] = n * \mu * (1 - \mu) * (1 + (n - 1) * \Phi)$$ The response in aodml is $y = m/n$. The mean is $E[y] = \mu$, defined such as $\mu = g^{-1}(X * b) = g^{-1}(\nu)$, where $g$ is the link function, $X$ is a design-matrix, $b$ a vector of fixed effects and $\nu = X * b$ is the corresponding linear predictor. The variance is $Var[y] = (1 / n) * \mu * (1 - \mu) * (1 + (n - 1) * \Phi)$. Negative binomial model (NB) ------ Simple counts (model with no offset) For a given cluster $(m)$, the model is $$y | \lambda ~ Poisson(\lambda)$$ where $\lambda$ follows a Gamma distribution of mean $\mu$ and shape $k$ ($Var[\lambda] = \mu^2 / k$). Noting $G$ the gamma function, the marginal (negative binomial) distribution of $m$ is $$P(m) = {G(m+k) / (m! * G(k))} * (k / (k + \mu))^k * (\mu / (k + \mu))^m$$ Using the re-parameterization $\Phi = 1 / k$, the marginal mean and variance are $$E[m] = \mu$$ $$Var[m] = \mu + \Phi * \mu^2$$ The response in aodml is $y = m$. The mean is $E[y] = \mu$, defined such as $\mu = exp(X * b) = exp(\nu)$. The variance is $Var[y] = \mu + \Phi * \mu^2$. ------ Rates (model with offset) For a given cluster $(n, m)$, the model is $$m | \lambda,n ~ Poisson(\lambda)$$ The marginal (negative binomial) distribution $P(m|n)$ is the same as for the case with no offset ($= P(m)$). The response in aodml is $y = m$. The mean is $E[y] = \mu$, defined such as $\mu = exp(X * b + log(n)) = exp(\nu + log(n)) = exp(\eta)$, where $log(n)$ is the offset. The variance is $Var[y] = \mu + \Phi * \mu^2$. Other details Argument phi.scale of function aodml enables to estimate the over-dispersion parameter under different scales. If phi.scale = "identity" (Default), the function estimates $\Phi$. If phi.scale = "log", the function estimates $log(\Phi)$. If phi.scale = "inverse", the function estimates $1/\Phi$. The full parameter vector returned by aodml, say param, is equal to $(b, \Phi)$. This vector is estimated by maximizing the log-likelihood of the marginal model using function optim. The estimated variances-covariances matrix of param is calculated by the inverse of the observed hessian matrix returned by optim, and is referred to as varparam.

Value

  • An object of class aodml, printed and summarized by various functions. Function deviance.aodml returns the value -2 * (logL - logL_max). The deviance used in function AIC.aodml to calculate AIC and AICc is -2 * logL.

encoding

latin1

References

Crowder, M.J., 1978. Beta-binomial anova for proportions. Appl. Statist. 27, 34-37. Griffiths, D.A., 1973. Maximum likelihood estimation for the beta-binomial distribution and an application to the household distribution of the total number of cases of disease. Biometrics 29, 637-648. Lawless, J.F., 1987. Negative binomial and mixed Poisson regression. The Canadian Journal of Statistics, 15(3): 209-225. McCullagh, P., Nelder, J. A., 1989, 2nd ed. Generalized linear models. New York, USA: Chapman and Hall. Prentice, R.L., 1986. Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors. J.A.S.A. 81, 321-327. Williams, D.A., 1975. The analysis of binary responses from toxicological experiments involving reproduction and teratogenicity. Biometrics 31, 949-952.

See Also

glm and optim

Aliases
  • aodml
  • print.aodml
  • summary.aodml
  • print.summary.aodml
  • anova.aodml
  • print.anova.aodml
  • fitted.aodml
  • residuals.aodml
  • coef.aodml
  • logLik.aodml
  • deviance.aodml
  • df.residual.aodml
  • AIC.aodml
  • vcov.aodml
  • predict.aodml
Examples
#------ Beta-binomial model

data(orob2)
fm1 <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb")

# summaries
fm1
summary(fm1)
coef(fm1)
vcov(fm1)
logLik(fm1)
deviance(fm1)
AIC(fm1)
gof(fm1)

# predictions
cbind(
  fitted(fm1),
  fitted(fm1, what = "nu"),
  fitted(fm1, what = "eta"),
  fitted(fm1, what = "phi")
)
predict(fm1, type = "response", se.fit = TRUE)
newdat <- data.frame(seed = c("O73", "O75"))
predict(fm1, type = "response", se.fit = TRUE, newdata = newdat)

# model with heterogeneity in phi
fm <- aodml(cbind(m, n - m) ~ seed, data = orob2,
  family = "bb", phi.formula = ~ seed)
summary(fm)
AIC(fm1, fm)

# various phi scales
fm <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb")
fm$phi
fm$phi.scale
fm <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb",
            phi.scale = "log")
fm$phi
fm$phi.scale
fm <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb",
            phi.scale = "inverse")
fm$phi
fm$phi.scale

### Models with coefficient(s) set as constant

# model with 1 phi coefficient, set as constant "0.02"
fm <- aodml(formula = cbind(m, n - m) ~ seed * root, data = orob2,
  family = "bb", fixpar = list(5, 0.02))
fm$param
fm$varparam

# model with 2 phi coefficients, with the first set as constant ~ "0"
fm <- aodml(formula = cbind(m, n - m) ~ seed * root, data = orob2,
  family = "bb", phi.formula = ~ seed, fixpar = list(5, 1e-15))
fm$param
fm$varparam

# model with 2 phi coefficients, with the first set as constant ~ "0",
# and the mu intercept (1rst coef of vector b) set as as constant "-0.5"
fm <- aodml(formula = cbind(m, n - m) ~ seed * root, data = orob2,
            family = "bb", phi.formula = ~ seed,
            fixpar = list(c(1, 5), c(-0.5, 1e-15)))
fm$param
fm$varparam
  
### Model tests

# LR tests - non-constant phi
fm0 <- aodml(cbind(m, n - m) ~ 1, data = orob2, family = "bb")
fm2 <- aodml(cbind(m, n - m) ~ seed + root, data = orob2, family = "bb")
fm3 <- aodml(cbind(m, n - m) ~ seed * root, data = orob2, family = "bb")
anova(fm0, fm1, fm2, fm3)

# LR tests - constant phi
# phi is assumed to be estimated from fm3
fm2.bis <- aodml(cbind(m, n - m) ~ seed  + root, data = orob2,
                 family = "bb", fixpar = list(4, fm3$phi))
LRstat <- 2 * (logLik(fm3) - logLik(fm2.bis))  
pchisq(LRstat, df = 1, lower.tail = FALSE)  
  
# Wald test of the seed factor in fm1
wald.test(b = coef(fm3), varb = vcov(fm3), Terms = 4)

#------ Negative binomial model

### Modelling counts

data(salmonella)
fm1 <- aodml(m ~ log(dose + 10) + dose, data = salmonella, family = "nb")
## fm1 <- aodml(m ~ log(dose + 10) + dose, data = salmonella, family = "nb",
##              method = "Nelder-Mead")

# summaries
fm1
summary(fm1)
coef(fm1)
vcov(fm1)
logLik(fm1)
deviance(fm1)
AIC(fm1)
gof(fm1)

# predictions
cbind(
  fitted(fm1),
  fitted(fm1, what = "nu"),
  fitted(fm1, what = "eta"),
  fitted(fm1, what = "phi")
)
predict(fm1, type = "response", se.fit = TRUE)
newdat <- data.frame(dose = c(20, 40))
predict(fm1, type = "response", se.fit = TRUE, newdata = newdat)

# various phi scales
fm <- aodml(m ~ log(dose + 10) + dose, data = salmonella, family = "nb")
fm$phi
fm$phi.scale
fm <- aodml(m ~ log(dose + 10) + dose, data = salmonella,
            family = "nb", phi.scale = "log")
fm$phi
fm$phi.scale
fm <- aodml(m ~ log(dose + 10) + dose, data = salmonella,
            family = "nb", phi.scale = "inverse")
fm$phi
fm$phi.scale
  
# LR and Wald tests of the "log(dose + 10) + dose" factors
fm0 <- aodml(m ~ 1, data = salmonella, family = "nb")
anova(fm0, fm1)
fm0.bis <- aodml(m ~ 1, data = salmonella, family = "nb",
                 fixpar = list(2, fm1$phi))
LRstat <- 2 * (logLik(fm1) - logLik(fm0.bis))  
pchisq(LRstat, df = 2, lower.tail = FALSE)  
wald.test(b = coef(fm1), varb = vcov(fm1), Terms = 2:3)

### Modelling a rate
  
data(dja)
# rate "m / trisk"
fm <- aodml(formula = m ~ group + offset(log(trisk)),
            data = dja, family = "nb")
summary(fm)

fm <- aodml(formula = m ~ group + offset(log(trisk)),
            phi.formula = ~ group, data = dja, family = "nb",
            phi.scale = "log")
summary(fm)
  
# model with 1 phi coefficient, set as constant "0.8"
fm <- aodml(formula = m ~ group + offset(log(trisk)), data = dja,
            family = "nb", phi.formula = ~1, fixpar = list(3, 0.8))
fm$param
fm$varparam

# model with 2 phi coefficients, with the first set as constant ~ "0" in the identity scale
fm <- aodml(formula = m ~ group + offset(log(trisk)), data = dja,
            family = "nb", phi.formula = ~ group, phi.scale = "log",
            fixpar = list(4, -15))
fm$param
fm$varparam

# model with 2 phi coefficients, with the first set as constant "0" in the identity scale,
# and the mu intercept coefficient (1rst coef of vector b) set as as constant "-0.5"
fm <- aodml(formula = m ~ group + offset(log(trisk)), data = dja,
  family = "nb", phi.formula = ~ group, phi.scale = "log",
  fixpar = list(c(1, 4), c(-0.5, -15)))
fm$param
fm$varparam
Documentation reproduced from package aods3, version 0.4-1, License: GPL (>= 2)

Community examples

Looks like there are no examples yet.