logbin.smooth: Smooth Log-Binomial Regression

Description

logbin.smooth fits log-link binomial regression models using a stable combinatorial EM (CEM) algorithm. It extends logbin by allowing smooth semi-parametric terms.

Usage

logbin.smooth(formula, mono = NULL, data, subset, na.action, offset, 
              control = list(...), model = TRUE, model.logbin = FALSE, 
              method = c("cem", "em"), accelerate = c("em", "squarem", "pem", "qn"),
              control.accelerate = list(), ...)

Arguments

formula

an object of class "formula" (or one that can be coerced into that class): a symbolic description of the model to be fitted. The details of model specification are given under "Details". The model must contain an intercept and at least one semi-parametric term, included by using the B or Iso functions. Second-order and higher terms (such as interactions) are not currently supported (see logbin).

mono

a vector indicating which terms in formula should be restricted to have a monotonically non-decreasing relationship with the outcome. May be specified as names or indices of the terms.

Iso() terms are always monotonic.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which logbin.smooth is called.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The "factory-fresh" default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a non-positive numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See model.offset.

control

a list of parameters for controlling the fitting process, passed to logbin.control.

model

a logical value indicating whether the model frame should be included as a component of the returned value.

model.logbin

a logical value indicating whether the fitted logbin object should be included as a component of the returned value.

method

a character string that determines which EM-type algorithm to use to find the MLE: "cem" for the combinatorial EM algorithm, which cycles through a sequence of constrained parameter spaces, or "em" for a single EM algorithm based on an overparameterised model.

Unlike logbin, methods "glm" and "ab" are not available because they do not support the necessary monotonicity constraints.
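
As a minimal sketch of switching between the two algorithms, the following reuses the toy data from the Examples section and refits the same model with method = "em". The object names fit_cem and fit_em are illustrative only, and the sketch assumes the logbin package is installed.

```r
library(logbin)

# Toy data from the Examples section: y successes out of 10 trials at each x
x <- c(0.3, 0.2, 0.0, 0.1, 0.2, 0.1, 0.7, 0.2, 1.0, 0.9)
y <- c(5, 4, 6, 4, 7, 3, 6, 5, 9, 8)

# Default combinatorial EM fit
fit_cem <- logbin.smooth(cbind(y, 10 - y) ~ B(x, knot.range = 0:2),
                         mono = 1, method = "cem")

# Same model, fitted with the single overparameterised EM algorithm
fit_em <- update(fit_cem, method = "em")

# Both target the same MLE, so the fitted probabilities should be close
p_cem <- predict(fit_cem, type = "response")
p_em <- predict(fit_em, type = "response")
range(p_cem - p_em)
```

Any remaining difference between the two sets of fitted probabilities reflects the convergence tolerance rather than a different model.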

accelerate

a character string that determines the acceleration algorithm to be used, (partially) matching one of "em" (no acceleration -- the default), "squarem", "pem" or "qn". See turboem for further details. Note that "decme" is not permitted.

control.accelerate

a list of control parameters for the acceleration algorithm. See turboem for details of the parameters that apply to each algorithm. If not specified, the defaults are used.

…

arguments to be used to form the default control argument if it is not supplied directly.

Value

An object of class "logbin.smooth", which contains the same objects as class "logbin" (the same as "glm"), as well as:

model.logbin

if model.logbin is TRUE, the logbin object for the fully parametric model corresponding to the fitted model.

xminmax.smooth

the minimum and maximum observed values for each of the smooth terms in the model, to help define the covariate space.

full.formula

the component from interpret.logbin.smooth(formula) that contains the formula term with any additional arguments to the B function removed.

knots

a named list containing the knot vectors for each of the smooth terms in the model.
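
The extra components listed above can be inspected directly on the fitted object. The sketch below assumes the logbin package is installed and reuses the toy data from the Examples section; the name fit is illustrative. It sets model.logbin = TRUE so that the underlying parametric fit is retained.

```r
library(logbin)

# Toy data from the Examples section
x <- c(0.3, 0.2, 0.0, 0.1, 0.2, 0.1, 0.7, 0.2, 1.0, 0.9)
y <- c(5, 4, 6, 4, 7, 3, 6, 5, 9, 8)

# Keep the fully parametric logbin fit alongside the smooth model
fit <- logbin.smooth(cbind(y, 10 - y) ~ B(x, knot.range = 0:2),
                     mono = 1, model.logbin = TRUE)

fit$knots               # knot vectors, one list entry per smooth term
fit$xminmax.smooth      # observed range of each smooth covariate
fit$full.formula        # formula with extra arguments to B() removed
class(fit$model.logbin) # the retained parametric fit
```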

Details

logbin.smooth performs the same fitting process as logbin, providing a stable maximum likelihood estimation procedure for log-link binomial GLMs, with the added flexibility of allowing semi-parametric B and Iso terms. If no semi-parametric terms are specified in the right-hand side of the formula, logbin.smooth stops with an error; logbin should be used instead.

The method partitions the parameter space associated with the semi-parametric part of the model into a sequence of constrained parameter spaces, and defines a fully parametric logbin model for each. The model with the highest log-likelihood is the MLE for the semi-parametric model (see Donoghoe and Marschner, 2015).
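
The principle of maximising within each constrained space and keeping the best fit can be illustrated with a toy example in base R. This is a sketch under simplifying assumptions, not the package's algorithm: it uses a single linear covariate rather than a smooth term, only two constrained spaces, and optim with box constraints as a stand-in for the EM iterations. The names loglik, designs, fits and best are hypothetical.

```r
# Toy data from the Examples section: y successes out of n = 10 trials
x <- c(0.3, 0.2, 0.0, 0.1, 0.2, 0.1, 0.7, 0.2, 1.0, 0.9)
y <- c(5, 4, 6, 4, 7, 3, 6, 5, 9, 8)
n <- rep(10, length(y))

# Binomial log-likelihood under a log link: log(p) = theta0 + theta1 * z
loglik <- function(theta, z) {
  p <- pmin(exp(theta[1] + theta[2] * z), 1 - 1e-10)  # keep p below 1
  sum(dbinom(y, n, p, log = TRUE))
}

# Each space uses a non-negative covariate z with both parameters <= 0,
# which guarantees that no fitted probability exceeds 1
designs <- list(
  decreasing = x - min(x),  # theta1 <= 0 makes p fall as x rises
  increasing = max(x) - x   # theta1 <= 0 makes p rise as x rises
)

# Maximise the likelihood separately within each constrained space
fits <- lapply(designs, function(z) {
  optim(c(-1, -0.5), function(theta) -loglik(theta, z),
        method = "L-BFGS-B", upper = c(0, 0))
})

# The overall MLE is the constrained fit with the highest log-likelihood
logliks <- sapply(fits, function(f) -f$value)
best <- names(which.max(logliks))
logliks
best
```

Because the two spaces together cover every valid slope, the better of the two constrained maxima is the maximum over the whole parameter space of this toy model.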

References

Donoghoe, M. W. and I. C. Marschner (2015). Flexible regression models for rate differences, risk differences and relative risks. International Journal of Biostatistics 11(1): 91--108.

Donoghoe, M. W. and I. C. Marschner (2018). logbin: An R package for relative risk regression using the log-binomial model. Journal of Statistical Software 86(9): 1--22.

Marschner, I. C. (2014). Combinatorial EM algorithms. Statistics and Computing 24(6): 921--940.

Examples


library(logbin)

## Simple example: y successes out of 10 trials at each value of x
x <- c(0.3, 0.2, 0.0, 0.1, 0.2, 0.1, 0.7, 0.2, 1.0, 0.9)
y <- c(5, 4, 6, 4, 7, 3, 6, 5, 9, 8)
## Smooth term constrained to be non-decreasing (mono = 1)
system.time(m1 <- logbin.smooth(cbind(y, 10-y) ~ B(x, knot.range = 0:2), mono = 1, trace = 1))
## Compare with accelerated version
system.time(m1.acc <- update(m1, accelerate = "squarem"))
## Isotonic relationship
m2 <- logbin.smooth(cbind(y, 10-y) ~ Iso(x))

## Plot the fitted models
plot(m1)
plot(m2)

## Summarise the fitted probabilities
summary(predict(m1, type = "response"))
summary(predict(m2, type = "response"))
