addreg.smooth: Smooth Additive Regression for Discrete Data

Description

addreg.smooth fits additive (identity-link) Poisson, negative binomial and binomial regression models using a stable EM algorithm. It provides additional flexibility over addreg by allowing for semi-parametric terms.

Usage

addreg.smooth(formula, mono = NULL, family, data, standard, subset, 
              na.action, offset, control = list(...), model = TRUE, 
              model.addreg = FALSE, method = c("cem", "em"), 
              accelerate = c("em", "squarem", "pem", "qn"),
              control.method = list(), ...)

Arguments

formula

an object of class "formula" (or one that can be coerced into that class): a symbolic description of the model to be fitted. The details of model specification are given under "Details". The model must contain an intercept and at least one semi-parametric term, included by using the B or Iso functions. Note that terms of second order or higher (such as interactions) are not currently supported (see addreg).

mono

a vector indicating which terms in formula should be restricted to have a monotonically non-decreasing relationship with the outcome. May be specified as names or indices of the terms.

Iso() terms are always monotonic.

family

a description of the error distribution to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function (see family for details of family functions), but here it is restricted to the poisson, negbin1 or binomial family with identity link.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which addreg.smooth is called.

standard

a numeric vector of length equal to the number of cases, where each element is a positive constant that (multiplicatively) standardises the fitted value of the corresponding element of the response vector. Ignored for binomial family (the two-column specification of response should be used instead).

subset

an optional vector specifying a subset of observations to be used in the fitting process.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The `factory-fresh' default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a non-negative numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See model.offset.

Ignored for binomial family; not yet implemented for negative binomial models.

control

list of parameters for controlling the fitting process, passed to addreg.control.

model

a logical value indicating whether the model frame (and, for binomial models, the equivalent Poisson model) should be included as a component of the returned value.

model.addreg

a logical value indicating whether the fitted addreg object should be included as a component of the returned value.

method

a character string that determines which EM-type algorithm to use to find the MLE: "cem" for the combinatorial EM algorithm, which cycles through a sequence of constrained parameter spaces, or "em" for a single EM algorithm based on an overparameterised model.

accelerate

a character string that determines the acceleration algorithm to be used; it may (partially) match one of "em" (no acceleration, the default), "squarem", "pem" or "qn". See turboem for further details. Note that "decme" is not permitted.

control.method

a list of control parameters for the acceleration algorithm, which are passed to the control.method argument of turboem.

If any items are not specified, the defaults are used.

...

arguments to be used to form the default control argument if it is not supplied directly.
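To make the standard argument concrete for a Poisson model: with the identity link the linear predictor is on the rate scale, and standard multiplies it to give the expected count. The base R sketch below evaluates mu_i = standard_i * eta_i for a made-up rate model with person-years as the standard. All numbers are hypothetical, and the sketch neither calls addreg.smooth nor reproduces its fitting.

```r
# Hypothetical rate model: three groups observed over different person-years.
standard <- c(100, 200, 150)  # made-up person-years (the 'standard' vector)
x <- c(0, 1, 1)               # made-up binary covariate
beta <- c(0.01, 0.02)         # made-up intercept and rate difference

# Identity link: the linear predictor is itself the rate.
eta <- beta[1] + beta[2] * x

# The standard multiplicatively scales the rate into an expected count.
mu <- standard * eta

# Poisson log-likelihood of some made-up observed counts under these means.
y <- c(2, 5, 4)
ll <- sum(dpois(y, mu, log = TRUE))
mu
ll
```

Because beta is on the rate scale, beta[2] reads directly as a rate difference per unit of standard.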

Value

An object of class "addreg.smooth", which contains the same objects as class "addreg" (the same as "glm" objects, without contrasts, qr, R or effects components), as well as:

model.addreg

if model.addreg is TRUE, the addreg object for the fully parametric model corresponding to the fitted model.

xminmax.smooth

the minimum and maximum observed values for each of the smooth terms in the model, to help define the covariate space.

full.formula

the component from interpret.addreg.smooth(formula) that contains the formula term with any additional arguments to the B function removed.

knots

a named list containing the knot vectors for each of the smooth terms in the model.
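A sketch of requesting and inspecting the extra components listed above, reusing the data from the Examples section below. It assumes the addreg package is installed; the printed values depend on the fit and are not reproduced here.

```r
library(addreg)

# Data from the Examples section.
dat <- data.frame(x1 = c(3.2,3.3,3.4,7.9,3.8,0.7,2.0,5.4,8.4,3.0,1.8,5.6,5.5,9.0,8.2),
  x2 = c(1,0,0,1,0,1,0,0,0,0,1,0,1,1,0),
  n = c(6,7,5,9,10,7,9,6,6,7,7,8,6,8,10),
  y = c(2,1,2,6,3,1,2,2,4,4,1,2,5,7,7))

# model.addreg = TRUE keeps the fully parametric addreg fit in the result.
m1 <- addreg.smooth(cbind(y, n - y) ~ B(x1, knot.range = 1:3) + factor(x2),
  mono = 1, data = dat, family = binomial, model.addreg = TRUE)

m1$knots            # named list of knot vectors, one per smooth term
m1$xminmax.smooth   # observed minimum and maximum of each smooth covariate
m1$full.formula     # formula with the extra arguments to B() removed
class(m1$model.addreg)
```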

Details

addreg.smooth performs the same fitting process as addreg, providing a stable maximum likelihood estimation procedure for identity-link Poisson, negative binomial or binomial models, with the added flexibility of allowing semi-parametric B and Iso terms. Note that addreg.smooth will stop with an error if no semi-parametric terms are specified on the right-hand side of the formula; addreg should be used instead.

The method partitions the parameter space associated with the semi-parametric part of the model into a sequence of constrained parameter spaces, and defines a fully parametric addreg model for each. The model with the highest log-likelihood is the MLE for the semi-parametric model (see Donoghoe and Marschner, 2015).
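The combinatorial idea can be seen in a deliberately tiny setting. The base R sketch below is a toy, not addreg's implementation: an identity-link Poisson model mu_i = a + b * x_i with a single binary covariate and made-up counts. The parameter space is split into two constrained pieces, b >= 0 and b <= 0; each is rewritten so that all coefficients are non-negative, fitted with the textbook EM update for Poisson means that are non-negative linear combinations of non-negative covariates, and the fit with the highest log-likelihood is kept. The helper em_nnpois is hypothetical.

```r
# Toy illustration only (hypothetical helper, not part of addreg).
# EM for y_i ~ Poisson(sum_k X[i, k] * beta[k]) with X >= 0 and beta >= 0:
# the E-step splits each count among the additive components in proportion
# to X[i, k] * beta[k]; the M-step re-estimates each beta[k] from its share.
em_nnpois <- function(y, X, beta = rep(1, ncol(X)), maxit = 10000, tol = 1e-10) {
  for (iter in seq_len(maxit)) {
    mu <- drop(X %*% beta)
    beta_new <- beta * colSums(X * (y / mu)) / colSums(X)
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  mu <- drop(X %*% beta)
  list(beta = beta, mu = mu, loglik = sum(dpois(y, mu, log = TRUE)))
}

y <- c(2, 3, 1, 5, 6, 7)   # made-up counts
x <- c(0, 0, 0, 1, 1, 1)   # binary covariate; model is mu = a + b * x

# Constrained space 1 (b >= 0): mu = a * 1 + b * x, with a, b >= 0.
fit1 <- em_nnpois(y, cbind(1, x))
# Constrained space 2 (b <= 0): mu = (a + b) * 1 + (-b) * (1 - x),
# so both coefficients are again non-negative.
fit2 <- em_nnpois(y, cbind(1, 1 - x))

# Keep the constrained fit with the highest log-likelihood.
fits <- list(fit1, fit2)
best <- fits[[which.max(sapply(fits, function(f) f$loglik))]]
best$mu
```

For these counts the first space wins, with fitted means equal to the group means 2 and 6 (a = 2, b = 4), while the second space ends on its boundary b = 0. In addreg.smooth the constrained spaces come from the semi-parametric part of the model, so this only conveys the shape of the idea.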

Acceleration of the EM algorithm can be achieved through the methods of the turboEM package, specified through the accelerate argument. However, note that these methods do not have the guaranteed convergence of the standard EM algorithm, particularly when the MLE is on the boundary of its (possibly constrained) parameter space.
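A sketch of choosing the algorithm and its acceleration through the method and accelerate arguments, reusing the data from the Examples section below. It assumes the addreg package and its turboEM dependency are installed; control.method is left at its defaults. Given the convergence caveat above, checking the converged component of an accelerated fit is a cheap sanity check.

```r
library(addreg)

# Data from the Examples section.
dat <- data.frame(x1 = c(3.2,3.3,3.4,7.9,3.8,0.7,2.0,5.4,8.4,3.0,1.8,5.6,5.5,9.0,8.2),
  x2 = c(1,0,0,1,0,1,0,0,0,0,1,0,1,1,0),
  n = c(6,7,5,9,10,7,9,6,6,7,7,8,6,8,10),
  y = c(2,1,2,6,3,1,2,2,4,4,1,2,5,7,7))

# Single EM algorithm on the overparameterised model, no acceleration.
m_em <- addreg.smooth(cbind(y, n - y) ~ B(x1, knot.range = 1:3) + factor(x2),
  mono = 1, data = dat, family = binomial, method = "em")

# Same model with SQUAREM acceleration via turboEM.
m_sq <- addreg.smooth(cbind(y, n - y) ~ B(x1, knot.range = 1:3) + factor(x2),
  mono = 1, data = dat, family = binomial, method = "em", accelerate = "squarem")

# Accelerated schemes lack the guaranteed convergence of plain EM.
c(em = m_em$converged, squarem = m_sq$converged)
```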

References

Donoghoe, M. W. and I. C. Marschner (2015). Flexible regression models for rate differences, risk differences and relative risks. International Journal of Biostatistics 11(1): 91--108.

Marschner, I. C. (2014). Combinatorial EM algorithms. Statistics and Computing 24(6): 921--940.

Examples


library(addreg)

## Simple example
dat <- data.frame(x1 = c(3.2,3.3,3.4,7.9,3.8,0.7,2.0,5.4,8.4,3.0,1.8,5.6,5.5,9.0,8.2),
  x2 = c(1,0,0,1,0,1,0,0,0,0,1,0,1,1,0),
  n = c(6,7,5,9,10,7,9,6,6,7,7,8,6,8,10),
  y = c(2,1,2,6,3,1,2,2,4,4,1,2,5,7,7))

## Monotone (mono = 1) B-spline term in x1, plus a binary factor x2
m1 <- addreg.smooth(cbind(y, n - y) ~ B(x1, knot.range = 1:3) + factor(x2), mono = 1,
  data = dat, family = binomial, trace = 1)

## Fitted curves for each level of x2, overlaid with the observed proportions
plot(m1, at = data.frame(x2 = 0:1))
points(dat$x1, dat$y / dat$n, col = rainbow(2)[dat$x2 + 1], pch = 20)
