VGAM (version 1.0-4)

# betabinomial: Beta-binomial Distribution Family Function

## Description

Fits a beta-binomial distribution by maximum likelihood estimation. The two parameters here are the mean and correlation coefficient.

## Usage

betabinomial(lmu = "logit", lrho = "logit", irho = NULL, imethod = 1,
ishrinkage = 0.95, nsimEIM = NULL, zero = "rho")

## Arguments

lmu, lrho

Link functions applied to the two parameters. See Links for more choices. The defaults ensure the parameters remain in $$(0,1)$$, however, see the warning below.

irho

Optional initial value for the correlation parameter. If given, it must be in $$(0,1)$$, and is recyled to the necessary length. Assign this argument a value if a convergence failure occurs. Having irho = NULL means an initial value is obtained internally, though this can give unsatisfactory results.

imethod

An integer with value 1 or 2 or …, which specifies the initialization method for $$\mu$$. If failure to converge occurs try the another value and/or else specify a value for irho.

zero

Specifyies which linear/additive predictor is to be modelled as an intercept only. If assigned, the single value can be either 1 or 2. The default is to have a single correlation parameter. To model both parameters as functions of the covariates assign zero = NULL. See CommonVGAMffArguments for more information.

ishrinkage, nsimEIM

See CommonVGAMffArguments for more information. The argument ishrinkage is used only if imethod = 2. Using the argument nsimEIM may offer large advantages for large values of $$N$$ and/or large data sets.

## Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm.

Suppose fit is a fitted beta-binomial model. Then fit@y contains the sample proportions $$y$$, fitted(fit) returns estimates of $$E(Y)$$, and weights(fit, type="prior") returns the number of trials $$N$$.

## Warning

If the estimated rho parameter is close to 0 then it pays to try lrho = "rhobit". One day this may become the default link function.

This family function is prone to numerical difficulties due to the expected information matrices not being positive-definite or ill-conditioned over some regions of the parameter space. If problems occur try setting irho to some numerical value, nsimEIM = 100, say, or else use etastart argument of vglm, etc.

## Details

There are several parameterizations of the beta-binomial distribution. This family function directly models the mean and correlation parameter, i.e., the probability of success. The model can be written $$T|P=p \sim Binomial(N,p)$$ where $$P$$ has a beta distribution with shape parameters $$\alpha$$ and $$\beta$$. Here, $$N$$ is the number of trials (e.g., litter size), $$T=NY$$ is the number of successes, and $$p$$ is the probability of a success (e.g., a malformation). That is, $$Y$$ is the proportion of successes. Like binomialff, the fitted values are the estimated probability of success (i.e., $$E[Y]$$ and not $$E[T]$$) and the prior weights $$N$$ are attached separately on the object in a slot.

The probability function is $$P(T=t) = {N \choose t} \frac{B(\alpha+t, \beta+N-t)} {B(\alpha, \beta)}$$ where $$t=0,1,\ldots,N$$, and $$B$$ is the beta function with shape parameters $$\alpha$$ and $$\beta$$. Recall $$Y = T/N$$ is the real response being modelled.

The default model is $$\eta_1 = logit(\mu)$$ and $$\eta_2 = logit(\rho)$$ because both parameters lie between 0 and 1. The mean (of $$Y$$) is $$p = \mu = \alpha / (\alpha + \beta)$$ and the variance (of $$Y$$) is $$\mu(1-\mu)(1+(N-1)\rho)/N$$. Here, the correlation $$\rho$$ is given by $$1/(1 + \alpha + \beta)$$ and is the correlation between the $$N$$ individuals within a litter. A litter effect is typically reflected by a positive value of $$\rho$$. It is known as the over-dispersion parameter.

This family function uses Fisher scoring. Elements of the second-order expected derivatives with respect to $$\alpha$$ and $$\beta$$ are computed numerically, which may fail for large $$\alpha$$, $$\beta$$, $$N$$ or else take a long time.

## References

Moore, D. F. and Tsiatis, A. (1991) Robust estimation of the variance in moment methods for extra-binomial and extra-Poisson variation. Biometrics, 47, 383--401.

Prentice, R. L. (1986) Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors. Journal of the American Statistical Association, 81, 321--327.

betabinomialff, Betabinom, binomialff, betaff, dirmultinomial, lirat, simulate.vlm.

## Examples

Run this code
# NOT RUN {
# Example 1
bdata <- data.frame(N = 10, mu = 0.5, rho = 0.8)
bdata <- transform(bdata,
y = rbetabinom(n = 100, size = N, prob = mu, rho = rho))
fit <- vglm(cbind(y, N-y) ~ 1, betabinomial, data = bdata, trace = TRUE)
coef(fit, matrix = TRUE)
Coef(fit)

# Example 2
fit <- vglm(cbind(R, N-R) ~ 1, betabinomial, lirat,
trace = TRUE, subset = N > 1)
coef(fit, matrix = TRUE)
Coef(fit)
t(fitted(fit))
t(depvar(fit))
t(weights(fit, type = "prior"))

# Example 3, which is more complicated
lirat <- transform(lirat, fgrp = factor(grp))
summary(lirat)  # Only 5 litters in group 3
fit2 <- vglm(cbind(R, N-R) ~ fgrp + hb, betabinomial(zero = 2),
data = lirat, trace = TRUE, subset = N > 1)
coef(fit2, matrix = TRUE)
# }
# NOT RUN {
with(lirat, plot(hb[N > 1], fit2@misc\$rho,
xlab = "Hemoglobin", ylab = "Estimated rho",
pch = as.character(grp[N > 1]), col = grp[N > 1]))
# }
# NOT RUN {
# cf. Figure 3 of Moore and Tsiatis (1991)
with(lirat, plot(hb, R / N, pch = as.character(grp), col = grp, las = 1,
xlab = "Hemoglobin level", ylab = "Proportion Dead",
main = "Fitted values (lines)"))
smalldf <- with(lirat, lirat[N > 1, ])
for (gp in 1:4) {
xx <- with(smalldf, hb[grp == gp])
yy <- with(smalldf, fitted(fit2)[grp == gp])
ooo <- order(xx)
lines(xx[ooo], yy[ooo], col = gp)
}
# }


Run the code above in your browser using DataCamp Workspace