logbin: Log Binomial Regression

Description

logbin fits relative risk (log-link) binomial regression models using a stable combinatorial EM algorithm.

Usage

logbin(formula, mono = NULL, data, subset, na.action, start = NULL,
        offset, control = list(...), model = TRUE, warn = TRUE, ...)

Arguments

formula

an object of class "formula" (or one that can be coerced into that class): a symbolic description of the model to be fitted. The details of model specification are given under 'Details'. Note that

mono

a vector indicating which terms in formula should be restricted to have a monotonically non-decreasing relationship with the outcome. May be specified as names or indices of the terms.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are

subset

an optional vector specifying a subset of observations to be used in the fitting process.

na.action

a function which indicates what should happen with the data contain NAs. The default is set be the na.action setting of options, and is

start

starting values for the parameters in the linear predictor.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a non-positive numeric vector of length equal to the number of cases. One or more

control

a list of parameters for controlling the fitting process, passed to logbin.control.

model

a logical value indicating whether model frame should be included as a component of the returned value.

warn

a logical indicating if warnings should be provided for non-convergence or boundary values.

...

arguments to be used to form the default control argument if it is not supplied directly.

Value

logbin returns an object of class "logbin", which inherits from classes "glm" and "lm". The function summary.logbin can be used to obtain or print a summary of the results. The generic accessor functions coefficients, fitted.values and residuals can be used to extract various useful features of the value returned by logbin. Note that effects will not work. An object of class "logbin" is a list containing the same components as an object of class "glm" (see the "Value" section of glm), but without contrasts, qr, R or effects components. It also includes:
loglikthe maximized log-likelihood.
aic.ca small-sample corrected version of Akaike's An Information Criterion (Hurvich, Simonoff and Tsai (1998)). This is used by logbin.smooth to choose the optimal number of knots for smooth terms.
xminmaxthe minimum and maximum observed values for each of the continuous covariates, to help define the covariate space of the model.
As well as:
np.coefficientsestimated coefficients associated with the non-positive parameterization corresponding to the MLE.
nn.xnon-negative model matrix associated with np.coefficients.

concept

Binomial regression
CEM algorithm

Details

logbin fits a generalized linear model (GLM) with a binomial error distribution and log-link function. Predictors are assumed to be continuous, unless they are of class factor, or are character or logical (in which case they are converted to factors). Specifying a predictor as monotonic using the mono argument means that for continuous terms, the associated coefficient will be restricted to be non-negative, and for categorical terms, the coefficients will be non-decreasing in the order of the factor levels. This allows semi-parametric monotonic regression functions, in the form of unsmoothed step-functions. For smooth regression functions see logbin.smooth. As well as allowing monotonicity constraints, the function is useful when a standard GLM routine, such as glm, fails to converge with a log-link binomial model. If glm does achieve successful convergence, and logbin converges to an interior point, then the two results will be identical. However, as illustrated in one of the examples below, glm may still experience convergence problems even when logbin converges to an interior point. Note that if logbin converges to a boundary point, then it may differ slightly from glm even if glm successfully converges, because of differences in the definition of the parameter space. logbin produces valid fitted values for covariate values within the Cartesian product of the observed range of covariate values, whereas glm produces valid fitted values just for the observed covariate combinations (assuming it successfully converges). This issue is only relevant when logbin converges to a boundary point. The computational method is a combinatorial EM algorithm (Marschner (2014)) which accommodates the parameter contraints in the model and is more stable than iteratively reweighted least squares. A collection of restricted parameter spaces is defined which covers the full parameter space, and the EM algorithm is applied within each restricted parameter space in order to find a collection of restricted maxima of the log-likelihood function, from which can be obtained the global maximum over the full parameter space. See Marschner and Gillett (2012) for further details.

References

Hurvich, C.M., J.S. Simonoff and C.-L. Tsai (1998): "Smoothing parameter selection in non-parametric regression using an improved Akaike information criterion," Journal of the Royal Statistical Society: Series B, 60, 271--293. Marschner, I.C. and A.C. Gillett (2012): "Relative risk regression: reliable and flexible methods for log-binomial models," Biostatistics, 13, 179--192. Marschner, I.C. (2014): "Combinatorial EM algorithms," Statistics and Computing, 24, 921--940.

Examples

Run this code

require(glm2)
data(heart)
#======================================================
#  Model with periodic non-convergence when glm is used
#======================================================

start.p <- sum(heart$Deaths) / sum(heart$Patients)

fit.glm <- glm(cbind(Deaths, Patients-Deaths) ~ factor(AgeGroup) + factor(Severity)
  + factor(Delay) + factor(Region), family = binomial(log), 
  start = c(log(start.p), -rep(1e-4, 8)), data = heart, trace = TRUE, maxit = 100)

fit.logbin <- logbin(formula(fit.glm), data = heart, trace = 1)
## (Note that convergence may be sped up by specifying mono = c(1,2))

summary(fit.logbin)

#=============================
#  Model with interaction term
#=============================

heart$AgeSev <- 10 * heart$AgeGroup + heart$Severity

fit.logbin.int <- logbin(cbind(Deaths, Patients-Deaths) ~ factor(AgeSev)
  + factor(Delay) + factor(Region), data = heart, trace = 1, maxit = 100000)
  
summary(fit.logbin.int)
vcov(fit.logbin.int)
confint(fit.logbin.int)
summary(predict(fit.logbin.int, type = "response"))
anova(fit.logbin, fit.logbin.int, test = "Chisq")

Run the code above in your browser using DataLab