quasibin: Quasi-Likelihood Model for Proportions

Description

The function fits the generalized linear model “II” proposed by Williams (1982) accounting for overdispersion in clustered binomial data $(n, y)$.

Usage

quasibin(formula, data, link = c("logit", "cloglog"), phi = NULL, tol =  0.001)

Arguments

formula

A formula for the fixed effects. The left-hand side of the formula must be of the form cbind(y, n - y) where the modelled probability is y/n.

link

The link function for the mean $p$: “logit” or “cloglog”.

data

A data frame containing the response (n and y) and explanatory variable(s).

phi

When phi is NULL (the default), the overdispersion parameter $\phi$ is estimated from the data. Otherwise, its value is considered as fixed.

tol

A positive scalar (default to 0.001). The algorithm stops at iteration $r + 1$ when the condition $\chi{^2}[r+1] - \chi{^2}[r] <= tol$ is met by the $\chi^2$ statistics .

Value

An object of formal class “glimQL”: see glimQL-class for details.

Details

For a given cluster $(n, y)$, the model is: $$y~|~\lambda \sim Binomial(n,~\lambda)$$ with $\lambda$ a random variable of mean $E[\lambda] = p$ and variance $Var[\lambda] = \phi * p * (1 - p)$. The marginal mean and variance are: $$E[y] = p$$ $$Var[y] = p * (1 - p) * [1 + (n - 1) * \phi]$$ The overdispersion parameter $\phi$ corresponds to the intra-cluster correlation coefficient, which is here restricted to be positive. The function uses the function glm and the parameterization: $p = h(X b) = h(\eta)$, where $h$ is the inverse of a given link function, $X$ is a design-matrix, $b$ is a vector of fixed effects and $\eta = X b$ is the linear predictor. The estimate of $b$ maximizes the quasi log-likelihood of the marginal model. The parameter $\phi$ is estimated with the moment method or can be set to a constant (a regular glim is fitted when $\phi$ is set to zero). The literature recommends to estimate $\phi$ from the saturated model. Several explanatory variables are allowed in $b$. None is allowed in $\phi$.

References

Moore, D.F., 1987, Modelling the extraneous variance in the presence of extra-binomial variation. Appl. Statist. 36, 8-14. Williams, D.A., 1982, Extra-binomial variation in logistic linear models. Appl. Statist. 31, 144-148.

Examples

Run this code

# NOT RUN {
  data(orob2) 
  fm1 <- glm(cbind(y, n - y) ~ seed * root,
             family = binomial, data = orob2)
  fm2 <- quasibin(cbind(y, n - y) ~ seed * root,
                  data = orob2, phi = 0)
  fm3 <- quasibin(cbind(y, n - y) ~ seed * root,
                  data = orob2)
  rbind(fm1 = coef(fm1), fm2 = coef(fm2), fm3 = coef(fm3))
  # show the model
  fm3
  # dispersion parameter and goodness-of-fit statistic
  c(phi = fm3@phi, 
    X2 = sum(residuals(fm3, type = "pearson")^2))
  # model predictions
  predfm1 <- predict(fm1, type = "response", se = TRUE)
  predfm3 <- predict(fm3, type = "response", se = TRUE)
  New <- expand.grid(seed = levels(orob2$seed),
                     root = levels(orob2$root))
  predict(fm3, New, se = TRUE, type = "response")
  data.frame(orob2, p1 = predfm1$fit, 
                    se.p1 = predfm1$se.fit,
                    p3 = predfm3$fit,
                    se.p3 = predfm3$se.fit)
  fm4 <- quasibin(cbind(y, n - y) ~ seed + root,
                  data = orob2, phi = fm3@phi)
  # Pearson's chi-squared goodness-of-fit statistic
  # compare with fm3's X2
  sum(residuals(fm4, type = "pearson")^2)
  
# }

Run the code above in your browser using DataLab