Learn R Programming

VGAM (version 1.1-14)

betabinomialff: Beta-binomial Distribution Family Function

Description

Fits a beta-binomial distribution by maximum likelihood estimation. The two parameters here are the shape parameters of the underlying beta distribution.

Usage

betabinomialff(lshape1 = "loglink", lshape2 = "loglink",
   ishape1 = 1, ishape2 = NULL, imethod = 1, ishrinkage = 0.95,
   nsimEIM = NULL, zero = NULL)

Arguments

Value

An object of class "vglmff"

(see vglmff-class). The object is used by modelling functions such as vglm.

Suppose fit is a fitted beta-binomial model. Then

fit@y (better: depvar(fit)) contains the sample proportions \(y\), fitted(fit) returns estimates of

\(E(Y)\), and weights(fit, type = "prior") returns the number of trials \(N\).

Details

There are several parameterizations of the beta-binomial distribution. This family function directly models the two shape parameters of the associated beta distribution rather than the probability of success (however, see Note below). The model can be written \(T|P=p \sim Binomial(N,p)\) where \(P\) has a beta distribution with shape parameters \(\alpha\) and \(\beta\). Here, \(N\) is the number of trials (e.g., litter size), \(T=NY\) is the number of successes, and \(p\) is the probability of a success (e.g., a malformation). That is, \(Y\) is the proportion of successes. Like binomialff, the fitted values are the estimated probability of success (i.e., \(E[Y]\) and not \(E[T]\)) and the prior weights \(N\) are attached separately on the object in a slot.

The probability function is $$P(T=t) = {N \choose t} \frac{B(\alpha+t, \beta+N-t)} {B(\alpha, \beta)}$$ where \(t=0,1,\ldots,N\), and \(B\) is the beta function with shape parameters \(\alpha\) and \(\beta\). Recall \(Y = T/N\) is the real response being modelled.

The default model is \(\eta_1 = \log(\alpha)\) and \(\eta_2 = \log(\beta)\) because both parameters are positive. The mean (of \(Y\)) is \(p=\mu=\alpha/(\alpha+\beta)\) and the variance (of \(Y\)) is \(\mu(1-\mu)(1+(N-1)\rho)/N\). Here, the correlation \(\rho\) is given by \(1/(1 + \alpha + \beta)\) and is the correlation between the \(N\) individuals within a litter. A litter effect is typically reflected by a positive value of \(\rho\). It is known as the over-dispersion parameter.

This family function uses Fisher scoring. The two diagonal elements of the second-order expected derivatives with respect to \(\alpha\) and \(\beta\) are computed numerically, which may fail for large \(\alpha\), \(\beta\), \(N\) or else take a long time.

References

Moore, D. F. and Tsiatis, A. (1991). Robust estimation of the variance in moment methods for extra-binomial and extra-Poisson variation. Biometrics, 47, 383--401.

Prentice, R. L. (1986). Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors. Journal of the American Statistical Association, 81, 321--327.

See Also

extbetabinomial, betabinomial, Betabinom, binomialff, betaff, dirmultinomial, lirat, simulate.vlm.

Examples

Run this code
# Example 1
N <- 10; s1 <- exp(1); s2 <- exp(2)
y <- rbetabinom.ab(n = 100, size = N, shape1 = s1, shape2 = s2)
fit <- vglm(cbind(y, N-y) ~ 1, betabinomialff, trace = TRUE)
coef(fit, matrix = TRUE)
Coef(fit)
head(fit@misc$rho)  # The correlation parameter
head(cbind(depvar(fit), weights(fit, type = "prior")))


# Example 2
fit <- vglm(cbind(R, N-R) ~ 1, betabinomialff, data = lirat,
            trace = TRUE, subset = N > 1)
coef(fit, matrix = TRUE)
Coef(fit)
fit@misc$rho  # The correlation parameter
t(fitted(fit))
t(depvar(fit))
t(weights(fit, type = "prior"))
# A "loglink" link for the 2 shape params is a logistic regression:
all.equal(c(fitted(fit)),
          as.vector(logitlink(predict(fit)[, 1] -
                          predict(fit)[, 2], inverse = TRUE)))


# Example 3, which is more complicated
lirat <- transform(lirat, fgrp = factor(grp))
summary(lirat)  # Only 5 litters in group 3
fit2 <- vglm(cbind(R, N-R) ~ fgrp + hb, betabinomialff(zero = 2),
           data = lirat, trace = TRUE, subset = N > 1)
coef(fit2, matrix = TRUE)
coef(fit2, matrix = TRUE)[, 1] -
coef(fit2, matrix = TRUE)[, 2]  # logitlink(p)
if (FALSE)  with(lirat, plot(hb[N > 1], fit2@misc$rho,
   xlab = "Hemoglobin", ylab = "Estimated rho",
   pch = as.character(grp[N > 1]), col = grp[N > 1])) 
if (FALSE)   # cf. Figure 3 of Moore and Tsiatis (1991)
with(lirat, plot(hb, R / N, pch = as.character(grp), col = grp,
   xlab = "Hemoglobin level", ylab = "Proportion Dead", las = 1,
   main = "Fitted values (lines)"))

smalldf <- with(lirat, lirat[N > 1, ])
for (gp in 1:4) {
  xx <- with(smalldf, hb[grp == gp])
  yy <- with(smalldf, fitted(fit2)[grp == gp])
  ooo <- order(xx)
  lines(xx[ooo], yy[ooo], col = gp, lwd = 2)
} 

Run the code above in your browser using DataLab