VGAM (version 1.0-4)

# binomialff: Binomial Family Function

## Description

Family function for fitting generalized linear models to binomial responses, where the dispersion parameter may be known or unknown.

## Usage

```binomialff(link = "logit", dispersion = 1, multiple.responses = FALSE,
onedpar = !multiple.responses, parallel = FALSE,
zero = NULL, bred = FALSE, earg.link = FALSE)```

## Arguments

Link function; see `Links` and `CommonVGAMffArguments` for more information.

dispersion

Dispersion parameter. By default, maximum likelihood is used to estimate the model because it is known. However, the user can specify `dispersion = 0` to have it estimated, or else specify a known positive value (or values if `multiple.responses` is `TRUE`).

multiple.responses

Multivariate response? If `TRUE`, then the response is interpreted as \(M\) independent binary responses, where \(M\) is the number of columns of the response matrix. In this case, the response matrix should have \(Q\) columns consisting of counts (successes), and the `weights` argument should have \(Q\) columns consisting of the number of trials (successes plus failures).

If `FALSE` and the response is a (2-column) matrix, then the number of successes is given in the first column, and the second column is the number of failures.

onedpar

One dispersion parameter? If `multiple.responses`, then a separate dispersion parameter will be computed for each response (column), by default. Setting `onedpar = TRUE` will pool them so that there is only one dispersion parameter to be estimated.

parallel

A logical or formula. Used only if `multiple.responses` is `TRUE`. This argument allows for the parallelism assumption whereby the regression coefficients for a variable is constrained to be equal over the \(M\) linear/additive predictors. If `parallel = TRUE` then the constraint is not applied to the intercepts.

zero

An integer-valued vector specifying which linear/additive predictors are modelled as intercepts only. The values must be from the set {1,2,…,\(M\)}, where \(M\) is the number of columns of the matrix response. See `CommonVGAMffArguments` for more information.

Details at `CommonVGAMffArguments`.

bred

Details at `CommonVGAMffArguments`. Setting `bred = TRUE` should work for multiple responses (`multiple.responses = TRUE`) and all VGAM link functions; it has been tested for `logit` only (and it gives similar results to brglm but not identical), and further testing is required. One result from fitting bias reduced binary regression is that finite regression coefficients occur when the data is separable (see example below). Currently `hdeff.vglm` does not work when `bred = TRUE`.

## Value

An object of class `"vglmff"` (see `vglmff-class`). The object is used by modelling functions such as `vglm`, `vgam`, `rrvglm`, `cqo`, and `cao`.

## Warning

With a multivariate response, assigning a known dispersion parameter for each response is not handled well yet. Currently, only a single known dispersion parameter is handled well.

See the above note regarding `bred`.

The maximum likelihood estimate will not exist if the data is completely separable or quasi-completely separable. See Chapter 10 of Altman et al. (2004) for more details, and safeBinaryRegression. Yet to do: add a `sepcheck = TRUE`, say, argument to detect this problem and give an appropriate warning.

## Details

This function is largely to mimic `binomial`, however there are some differences.

If the dispersion parameter is unknown, then the resulting estimate is not fully a maximum likelihood estimate (see pp.124--8 of McCullagh and Nelder, 1989).

A dispersion parameter that is less/greater than unity corresponds to under-/over-dispersion relative to the binomial model. Over-dispersion is more common in practice.

Setting `multiple.responses = TRUE` is necessary when fitting a Quadratic RR-VGLM (see `cqo`) because the response is a matrix of \(M\) columns (e.g., one column per species). Then there will be \(M\) dispersion parameters (one per column of the response matrix).

When used with `cqo` and `cao`, it may be preferable to use the `cloglog` link.

## References

McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models, 2nd ed. London: Chapman & Hall.

Altman, M. and Gill, J. and McDonald, M. P. (2004) Numerical Issues in Statistical Computing for the Social Scientist, Hoboken, NJ, USA: Wiley-Interscience.

Ridout, M. S. (1990) Non-convergence of Fisher's method of scoring---a simple example. GLIM Newsletter, 20(6).

`quasibinomialff`, `Links`, `rrvglm`, `cqo`, `cao`, `betabinomial`, `posbinomial`, `zibinomial`, `double.expbinomial`, `seq2binomial`, `amlbinomial`, `simplex`, `binomial`, `simulate.vlm`, `hdeff.vglm`, safeBinaryRegression.

## Examples

Run this code
```# NOT RUN {
quasibinomialff()

shunua <- hunua[sort.list(with(hunua, altitude)), ]  # Sort by altitude
fit <- vglm(agaaus ~ poly(altitude, 2), binomialff(link = cloglog),
data = shunua)
# }
# NOT RUN {
plot(agaaus ~ jitter(altitude), shunua, ylab = "P(Agaaus = 1)",
main = "Presence/absence of Agathis australis", col = "blue", las = 1)
with(shunua, lines(altitude, fitted(fit), col = "orange", lwd = 2))
# }
# NOT RUN {

# Fit two species simultaneously
fit2 <- vgam(cbind(agaaus, kniexc) ~ s(altitude),
binomialff(multiple.responses = TRUE), data = shunua)
# }
# NOT RUN {
with(shunua, matplot(altitude, fitted(fit2), type = "l",
main = "Two species response curves", las = 1))
# }
# NOT RUN {

# Shows that Fisher scoring can sometime fail. See Ridout (1990).
ridout <- data.frame(v = c(1000, 100, 10), r = c(4, 3, 3), n = c(5, 5, 5))
(ridout <- transform(ridout, logv = log(v)))
# The iterations oscillates between two local solutions:
glm.fail <- glm(r / n ~ offset(logv) + 1, weight = n,
binomial(link = 'cloglog'), ridout, trace = TRUE)
coef(glm.fail)
# vglm()'s half-stepping ensures the MLE of -5.4007 is obtained:
vglm.ok <- vglm(cbind(r, n-r) ~ offset(logv) + 1,
binomialff(link = cloglog), ridout, trace = TRUE)
coef(vglm.ok)

# Separable data
set.seed(123)
threshold <- 0
bdata <- data.frame(x2 = sort(rnorm(nn <- 100)))
bdata <- transform(bdata, y1 = ifelse(x2 < threshold, 0, 1))
fit <- vglm(y1 ~ x2, binomialff(bred = TRUE),
data = bdata, criter = "coef", trace = TRUE)
coef(fit, matrix = TRUE)  # Finite!!
summary(fit)
# }
# NOT RUN {
plot(depvar(fit) ~ x2, data = bdata, col = "blue", las = 1)
lines(fitted(fit) ~ x2, data = bdata, col = "orange")
abline(v = threshold, col = "gray", lty = "dashed")
# }
```

Run the code above in your browser using DataCamp Workspace