betabin: Beta-Binomial Model for Proportions

Description

Fits a beta-binomial generalized linear model accounting for overdispersion in clustered binomial data $(n, y)$.

Usage

betabin(formula, random, data, link = c("logit", "cloglog"), phi.ini = NULL,
          warnings = FALSE, na.action = na.omit, fixpar = list(),
          hessian = TRUE, control = list(maxit = 2000), …)

Arguments

formula

A formula for the fixed effects b. The left-hand side of the formula must be of the form cbind(y, n - y) where the modelled probability is y/n.

random

A right-hand formula for the overdispersion parameter(s) $\phi$.

link

The link function for the mean $p$: “logit” or “cloglog”.

data

A data frame containing the response (n and y) and explanatory variable(s).

phi.ini

Initial values for the overdispersion parameter(s) $\phi$. Default to 0.1.

warnings

Logical to control the printing of warnings occurring during log-likelihood maximization. Default to FALSE (no printing).

na.action

A function name: which action should be taken in the case of missing value(s).

fixpar

A list with 2 components (scalars or vectors) of the same size, indicating which parameters are fixed (i.e., not optimized) in the global parameter vector $(b, \phi)$ and the corresponding fixed values. For example, fixpar = list(c(4, 5), c(0, 0)) means that 4th and 5th parameters of the model are set to 0.

hessian

A logical. When set to FALSE, the hessian and the variances-covariances matrices of the parameters are not computed.

control

A list to control the optimization parameters. See optim. By default, set the maximum number of iterations to 2000.

…

Further arguments passed to optim.

Value

An object of formal class “glimML”: see glimML-class for details.

Details

For a given cluster $(n, y)$, the model is: $$y~|~\lambda \sim Binomial(n,~\lambda)$$ with $\lambda$ following a Beta distribution $Beta(a1,~a2)$. If $B$ denotes the beta function, then: $$P(\lambda) = \frac{\lambda^{a1~-~1} * (1~-~\lambda)^{a2 - 1}}{B(a1,~a2)}$$ $$E[\lambda] = \frac{a1}{a1 + a2}$$ $$Var[\lambda] = \frac{a1 * a2}{(a1 + a2 + 1) * (a1 + a2)^2}$$ The marginal beta-binomial distribution is: $$P(y) = \frac{C(n,~y) * B(a1 + y, a2 + n - y)}{B(a1,~a2)}$$ The function uses the parameterization $p = \frac{a1}{a1 + a2} = h(X b) = h(\eta)$ and $\phi = \frac{1}{a1 + a2 + 1}$, where $h$ is the inverse of the link function (logit or complementary log-log), $X$ is a design-matrix, $b$ is a vector of fixed effects, $\eta = X b$ is the linear predictor and $\phi$ is the overdispersion parameter (i.e., the intracluster correlation coefficient, which is here restricted to be positive). The marginal mean and variance are: $$E[y] = n * p$$ $$Var[y] = n * p * (1 - p) * [1 + (n - 1) * \phi]$$ The parameters $b$ and $\phi$ are estimated by maximizing the log-likelihood of the marginal model (using the function optim). Several explanatory variables are allowed in $b$, only one in $\phi$.

References

Crowder, M.J., 1978. Beta-binomial anova for proportions. Appl. Statist. 27, 34-37. Griffiths, D.A., 1973. Maximum likelihood estimation for the beta-binomial distribution and an application to the household distribution of the total number of cases of disease. Biometrics 29, 637-648. Prentice, R.L., 1986. Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors. J.A.S.A. 81, 321-327. Williams, D.A., 1975. The analysis of binary responses from toxicological experiments involving reproduction and teratogenicity. Biometrics 31, 949-952.

Examples

Run this code

# NOT RUN {
  data(orob2)
  fm1 <- betabin(cbind(y, n - y) ~ seed, ~ 1, data = orob2)
  fm2 <- betabin(cbind(y, n - y) ~ seed + root, ~ 1, data = orob2)
  fm3 <- betabin(cbind(y, n - y) ~ seed * root, ~ 1, data = orob2)
  # show the model
  fm1; fm2; fm3
  # AIC
  AIC(fm1, fm2, fm3)
  summary(AIC(fm1, fm2, fm3), which = "AICc")
  # Wald test for root effect
  wald.test(b = coef(fm3), Sigma = vcov(fm3), Terms = 3:4)
  # likelihood ratio test for root effect
  anova(fm1, fm3)
  # model predictions
  New <- expand.grid(seed = levels(orob2$seed),
                     root = levels(orob2$root))
  data.frame(New, predict(fm3, New, se = TRUE, type = "response"))
  # Djallonke sheep data
  data(dja)
  betabin(cbind(y, n - y) ~ group, ~ 1, dja)
  # heterogeneous phi
  betabin(cbind(y, n - y) ~ group, ~ group, dja,
          control = list(maxit = 1000))
  # phi fixed to zero in group TREAT
   betabin(cbind(y, n - y) ~ group, ~ group, dja,
    fixpar = list(4, 0))
  # glim without overdispersion
  summary(glm(cbind(y, n - y) ~ group,
    family = binomial, data = dja))
  # phi fixed to zero in both groups
  betabin(cbind(y, n - y) ~ group, ~ group, dja,
    fixpar = list(c(3, 4), c(0, 0))) 
  
# }

Run the code above in your browser using DataLab