binomialff: Binomial Family Function

Description

Family function for fitting generalized linear models to binomial responses

Usage

binomialff(link = "logitlink", multiple.responses = FALSE,
       parallel = FALSE, zero = NULL, bred = FALSE, earg.link = FALSE)

Value

An object of class "vglmff"

(see vglmff-class). The object is used by modelling functions such as

vglm,

vgam,

rrvglm,

cqo, and cao.

Arguments

link: Link function; see Links and CommonVGAMffArguments for more information.

multiple.responses: Multivariate response? If TRUE, then the response is interpreted as \(M\) independent binary responses, where \(M\) is the number of columns of the response matrix. In this case, the response matrix should have \(Q\) columns consisting of counts (successes), and the weights argument should have \(Q\) columns consisting of the number of trials (successes plus failures).

parallel: A logical or formula. Used only if multiple.responses is TRUE. This argument allows for the parallelism assumption whereby the regression coefficients for a variable is constrained to be equal over the \(M\) linear/additive predictors. If parallel = TRUE then the constraint is not applied to the intercepts.
zero: An integer-valued vector specifying which linear/additive predictors are modelled as intercepts only. The values must be from the set {1,2,...,\(M\)}, where \(M\) is the number of columns of the matrix response. See CommonVGAMffArguments for more information.
earg.link: Details at CommonVGAMffArguments.
bred: Details at CommonVGAMffArguments. Setting bred = TRUE should work for multiple responses (multiple.responses = TRUE) and all VGAM link functions; it has been tested for logitlink only (and it gives similar results to brglm but not identical), and further testing is required. One result from fitting bias reduced binary regression is that finite regression coefficients occur when the data is separable (see example below). Currently hdeff.vglm does not work when bred = TRUE.

Author

Thomas W. Yee

Warning

See the above note regarding bred.

The maximum likelihood estimate will not exist if the data is completely separable or quasi-completely separable. See Chapter 10 of Altman et al. (2004) for more details, and safeBinaryRegression and hdeff.vglm. Yet to do: add a sepcheck = TRUE, say, argument to further detect this problem and give an appropriate warning.

Details

This function is largely to mimic binomial, however there are some differences.

When used with cqo and cao, it may be preferable to use the clogloglink link.

References

McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. London: Chapman & Hall.

Altman, M. and Gill, J. and McDonald, M. P. (2004). Numerical Issues in Statistical Computing for the Social Scientist, Hoboken, NJ, USA: Wiley-Interscience.

Ridout, M. S. (1990). Non-convergence of Fisher's method of scoring---a simple example. GLIM Newsletter, 20(6).

Examples

Run this code

shunua <- hunua[sort.list(with(hunua, altitude)), ]  # Sort by altitude
fit <- vglm(agaaus ~ poly(altitude, 2), binomialff(link = clogloglink),
            data = shunua)
if (FALSE) {
plot(agaaus ~ jitter(altitude), shunua, ylab = "Pr(Agaaus = 1)",
     main = "Presence/absence of Agathis australis", col = 4, las = 1)
with(shunua, lines(altitude, fitted(fit), col = "orange", lwd = 2)) }

# Fit two species simultaneously
fit2 <- vgam(cbind(agaaus, kniexc) ~ s(altitude),
             binomialff(multiple.responses = TRUE), data = shunua)
if (FALSE) {
with(shunua, matplot(altitude, fitted(fit2), type = "l",
     main = "Two species response curves", las = 1)) }

# Shows that Fisher scoring can sometime fail. See Ridout (1990).
ridout <- data.frame(v = c(1000, 100, 10), r = c(4, 3, 3), n = rep(5, 3))
(ridout <- transform(ridout, logv = log(v)))
# The iterations oscillates between two local solutions:
glm.fail <- glm(r / n ~ offset(logv) + 1, weight = n,
               binomial(link = 'cloglog'), ridout, trace = TRUE)
coef(glm.fail)
# vglm()'s half-stepping ensures the MLE of -5.4007 is obtained:
vglm.ok <- vglm(cbind(r, n-r) ~ offset(logv) + 1,
               binomialff(link = clogloglink), ridout, trace = TRUE)
coef(vglm.ok)
wsdm(vglm.ok)

# Separable data
set.seed(123)
threshold <- 0
bdata <- data.frame(x2 = sort(rnorm(nn <- 100)))
bdata <- transform(bdata, y1 = ifelse(x2 < threshold, 0, 1))
fit <- vglm(y1 ~ x2, binomialff(bred = TRUE),
            data = bdata, criter = "coef", trace = TRUE)
coef(fit, matrix = TRUE)  # Finite!!
summary(fit, wsdm = TRUE, hdiff = 0.0001)

if (FALSE)  plot(depvar(fit) ~ x2, data = bdata, col = "blue", las = 1)
lines(fitted(fit) ~ x2, data = bdata, col = "orange")
abline(v = threshold, col = "gray", lty = "dashed")

Run the code above in your browser using DataLab