logbin
fits relative risk (log-link) binomial
regression models.
logbin(formula, mono = NULL, data, subset, na.action, start = NULL,
offset, control = list(...), model = TRUE,
method = c("cem", "em", "glm", "glm2", "ab"),
accelerate = c("em", "squarem", "pem", "qn"),
control.method = list(), warn = TRUE, ...)
an object of class "formula"
(or one that can be coerced into that class): a symbolic
description of the model to be fitted. The details of
model specification are given under "Details". Note that
the model must contain an intercept, and 2nd-order terms
(such as interactions) or above are currently not supported
by the "cem"
and "em"
methods --- see "Note".
a vector indicating which terms in
formula
should be restricted to have a
monotonically non-decreasing relationship with the
outcome. May be specified as names or indices of the
terms.
method = "glm"
and "glm2"
cannot impose
monotonicity constraints, and they are not currently
supported for method = "ab"
.
an optional data frame, list or environment
(or object coercible by as.data.frame
to a
data frame) containing the variables in the model. If not
found in data
, the variables are taken from
environment(formula)
, typically the environment
from which logbin
is called.
an optional vector specifying a subset of observations to be used in the fitting process.
a function which indicates what should happen when the data
contain NA
s. The default is set be the na.action
setting of options
, and is na.fail
if that is unset. The `factory-fresh' default is na.omit
.
Another possible value is NULL
, no action. Value
na.exclude
can be useful.
starting values for the parameters in the linear predictor.
this can be used to specify an a
priori known component to be included in the linear
predictor during fitting. This should be NULL
or a
non-positive numeric vector of length equal to the number of cases.
One or more offset
terms can be included in
the formula instead or as well, and if more than one is
specified their sum is used. See
model.offset
.
a list of parameters for controlling the
fitting process, passed to
logbin.control
.
With method = "cem"
, epsilon
should be smaller than bound.tol
.
a logical value indicating whether the model frame should be included as a component of the returned value.
a character string that determines which algorithm to use
to find the MLE. The main purpose of logbin
is the
implementation of stable EM-type algorithms: "cem"
for
the combinatorial EM algorithm, which cycles through a sequence
of constrained parameter spaces, or "em"
for a single EM
algorithm based on an overparameterised model.
"ab"
implements an adaptive barrier method, using the
constrOptim
function.
"glm"
or "glm2"
may be used
to compare the results from the usual IWLS algorithms on the same model.
for the "cem"
and "em"
methods, a character string that determines the acceleration
algorithm to be used, (partially) matching one of "em"
(no acceleration --- the default),
"squarem"
, "pem"
or "qn"
. See turboem
for further details. Note that "decme"
is not permitted.
a list of control parameters for the fitting algorithm.
This is passed to the control.method
argument of turboem
if method = "cem"
or "em"
.
If method = "ab"
, this is passed to the control
argument of
constrOptim
(and hence to optim
--- see this documentation
for full details). Note that the trace
and maxit
elements are
ignored and the equivalent items from the supplied logbin.control
argument
are used instead. May also contain element method
(default "BFGS"
),
which is passed to the method
argument of constrOptim
.
If any items are not specified, the defaults are used.
a logical indicating whether or not warnings should be provided for non-convergence or boundary values.
arguments to be used to form the default
control
argument if it is not supplied directly.
logbin
returns an object of class "logbin"
,
which inherits from classes "glm"
and "lm"
.
The function summary.logbin
can be used
to obtain or print a summary of the results.
The generic accessor functions coefficients
,
fitted.values
and residuals
can be used to
extract various useful features of the value returned by
logbin
. Note that effects
will not work.
An object of class "logbin"
is a list containing the
same components as an object of class "glm"
(see the
"Value" section of glm
). It also includes:
the maximised log-likelihood.
a small-sample corrected
version of Akaike's An Information Criterion
(Hurvich, Simonoff and Tsai, 1998). This is used by
logbin.smooth
to choose the optimal number of
knots for smooth terms.
the minimum and maximum observed values for each of the continuous covariates, to help define the covariate space of the model.
As well as:
estimated coefficients associated with the non-positive parameterisation corresponding to the MLE.
non-negative model matrix associated with
np.coefficients
.
(if control$coeftrace = TRUE
), a matrix or
list of matrices containing the coefficient estimates after each
EM iteration.
logbin
fits a generalised linear model (GLM) with a
binomial error distribution and log link
function. Predictors are assumed to be continuous, unless
they are of class factor
, or are character or
logical (in which case they are converted to
factor
s). Specifying a predictor as monotonic using
the mono
argument means that for continuous terms,
the associated coefficient will be restricted to be
non-negative, and for categorical terms, the coefficients
will be non-decreasing in the order of the factor
levels
. This allows semi-parametric monotonic regression
functions, in the form of unsmoothed step-functions. For
smooth regression functions see logbin.smooth
.
As well as allowing monotonicity constraints, the function
is useful when a standard GLM routine, such as
glm
, fails to converge with a log-link
binomial model. For convenience in comparing convergence on
the same model, logbin
can be used
as a wrapper function to glm
and glm2
through the method
argument.
If glm
does achieve successful convergence,
and logbin
converges to an interior point, then the two
results will be identical. However, as illustrated in one of
the examples below, glm
may still experience convergence
problems even when logbin
converges to an interior point.
Note that if logbin
converges to a boundary point, then it
may differ slightly from glm
even if glm
successfully
converges, because of differences in the definition of the parameter
space. logbin
produces valid fitted values for covariate
values within the Cartesian product of the observed range of covariate
values, whereas glm
produces valid fitted values just
for the observed covariate combinations (assuming it successfully
converges). This issue is only relevant when logbin
converges to a boundary point. The adaptive barrier approach defines
the parameter space in the same way as glm
, so the
same comments apply when comparing its results to those from
method = "cem"
or "em"
.
The main computational method is an EM-type algorithm which accommodates
the parameter contraints in the model and is more stable than iteratively
reweighted least squares. This is done in one of two ways,
depending on the choice of the method
argument.
method = "cem"
implements a CEM algorithm (Marschner, 2014),
in which a collection of restricted parameter spaces is defined
that covers the full parameter space, and an EM algorithm is applied within each
restricted parameter space in order to find a collection of
restricted maxima of the log-likelihood function, from
which can be obtained the global maximum over the full
parameter space. See Marschner and Gillett (2012) for further
details.
method = "em"
implements a single EM algorithm
on an overparameterised model, and the MLE of this model
is transformed back to the original parameter space.
Acceleration of the EM algorithm in either case can be
achieved through the methods of the turboem
package, specified through the accelerate
argument. However,
note that these methods do not have the guaranteed convergence of
the standard EM algorithm, particularly when the MLE is on the
boundary of its (possibly constrained) parameter space.
Alternatively, an adaptive barrier method can be used by specifying
method = "ab"
, which maximises the likelihood subject to
constraints on the fitted values.
Hurvich, C. M., J. S. Simonoff and C.-L. Tsai (1998). Smoothing parameter selection in non-parametric regression using an improved Akaike information criterion. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 60(2): 271--293.
Donoghoe, M. W. and I. C. Marschner (2018). logbin: An R package for relative risk regression using the log-binomial model. Journal of Statistical Software 86(9): 1--22.
Marschner, I. C. (2014). Combinatorial EM algorithms. Statistics and Computing 24(6): 921--940.
Marschner, I. C. and A. C. Gillett (2012). Relative risk regression: reliable and flexible methods for log-binomial models. Biostatistics 13(1): 179--192.
logbin.smooth
for semi-parametric models
turboem
for acceleration methods
constrOptim
for the adaptive barrier approach.
# NOT RUN {
require(glm2)
data(heart)
# }
# NOT RUN {
#======================================================
# Model with periodic non-convergence when glm is used
#======================================================
start.p <- sum(heart$Deaths) / sum(heart$Patients)
fit.glm <- glm(cbind(Deaths, Patients-Deaths) ~ factor(AgeGroup) + factor(Severity) +
factor(Delay) + factor(Region), family = binomial(log),
start = c(log(start.p), -rep(1e-4, 8)), data = heart,
trace = TRUE, maxit = 100)
fit.logbin <- logbin(formula(fit.glm), data = heart, trace = 1)
summary(fit.logbin)
# Speed up convergence by using single EM algorithm
fit.logbin.em <- update(fit.logbin, method = "em")
# Speed up convergence by using acceleration methods
fit.logbin.acc <- update(fit.logbin, accelerate = "squarem")
fit.logbin.em.acc <- update(fit.logbin.em, accelerate = "squarem")
# }
# NOT RUN {
#=============================
# Model with interaction term
#=============================
heart$AgeSev <- 10 * heart$AgeGroup + heart$Severity
fit.logbin.int <- logbin(cbind(Deaths, Patients-Deaths) ~ factor(AgeSev) +
factor(Delay) + factor(Region), data = heart, trace = 1, maxit = 100000)
summary(fit.logbin.int)
vcov(fit.logbin.int)
confint(fit.logbin.int)
summary(predict(fit.logbin.int, type = "response"))
# }
# NOT RUN {
anova(fit.logbin, fit.logbin.int, test = "Chisq")
# }
Run the code above in your browser using DataLab