surveillance (version 1.12.1)

twinSIR: Fit an Additive-Multiplicative Intensity Model for SIR Data

Description

twinSIR is used to fit additive-multiplicative intensity models for epidemics as described in Höhle (2009). Estimation is driven by (penalized) maximum likelihood in the point process framework. Optimization (maximization) of the (penalized) likelihood function is performed by means of optim. The implementation is illustrated in Meyer et al. (2016, Section 4), see vignette("twinSIR").

Usage

twinSIR(formula, data, weights, subset,
        knots = NULL, nIntervals = 1, lambda.smooth = 0, penalty = 1,
        optim.args = list(), model = TRUE, keep.data = FALSE)

Arguments

formula
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the intensity model to be estimated. The details of model specification are given under Details.
data
an object inheriting from class "epidata".
weights
an optional vector of weights to be used in the fitting process. Should be NULL (the default, i.e. all observations have unit weight) or a numeric vector.
subset
an optional vector specifying a subset of observations to be used in the fitting process. The subset atRiskY == 1 is automatically chosen, because the likelihood only depends on those observations.
knots
numeric vector or NULL (the default). Specification of the knots, at which a step of the log-baseline is assumed. With the current implementation, these must be existing "stop" time points in subset(data, atRiskY == 1).
nIntervals
the number of intervals of constant log-baseline hazard. Defaults to 1, which means an overall constant log-baseline hazard will be fitted.
lambda.smooth
numeric, the smoothing parameter $\lambda$. By default it is 0, which leads to unpenalized likelihood inference. In case lambda.smooth = -1, the smoothing parameter is selected automatically based on a mixed model approach (see Details).
penalty
either a single number denoting the order of the difference used to penalize the log-baseline coefficients (defaults to 1), or a more specific penalty matrix $K$ for the log-baseline parameter sub-vector $\beta$. In the case of non-equidistant knots, a tailored penalty matrix should be supplied.
optim.args
a list with arguments passed to the optim function. Especially useful are par (the vector of initial parameter values), method (the optimization algorithm, defaulting to "L-BFGS-B"), control, and hessian.
model
logical indicating if the model frame, the weights, lambda.smooth, the penalty matrix $K$ and the list of used distance functions f (from attributes(data)) should be returned for further computations.
keep.data
logical indicating if the "epidata" object (data) should be part of the return value. This is only necessary for use of the simulate-method for "twinSIR" objects.

Value

twinSIR returns an object of class "twinSIR". An object of this class is a list containing the following components:
  • coefficients: a named vector of coefficients.
  • loglik: the maximum of the (penalized) log-likelihood function.
  • counts: the number of log-likelihood and score function evaluations.
  • converged: logical indicating convergence of the optimization algorithm.
  • fisherinfo.observed: if requested, the negative Hessian from optim.
  • fisherinfo: an estimate of the expected Fisher information matrix.
  • method: the optimization algorithm used.
  • intervals: a numeric vector (c(minTime, knots, maxTime)) representing the consecutive intervals of constant log-baseline.
  • nEvents: a numeric vector containing the number of infections in each of the above intervals.
  • model: if requested, the model information used. This is a list with components "survs" (data.frame with the id, start, stop and event columns), "X" (matrix of the epidemic variables), "Z" (matrix of the endemic variables), "weights" (the specified weights), "lambda.smooth" (the specified lambda.smooth), "K" (the penalty matrix used), and "f" and "w" (the functions to generate the used epidemic covariates). Be aware that the model only contains those rows with atRiskY == 1!
  • data: if requested, the supplied "epidata" data.
  • call: the matched call.
  • formula: the specified formula.
  • terms: the terms object used.

Details

A model is specified through the formula, which has the form ~ epidemicTerm1 + epidemicTerm2 + cox(endemicVar1) * cox(endemicVar2), i.e. the right hand side has the usual form as in lm, with some variables marked as endemic by the special function cox. The left hand side of the formula is empty and will be set internally to cbind(start, stop, event), which is similar to Surv(start, stop, event, type="counting").

Basically, the additive-multiplicative model for the infection intensity $\lambda_i(t)$ of individual $i$ is $$\lambda_i(t) = Y_i(t) * (e_i(t) + h_i(t))$$ where $Y_i(t)$ is the at-risk indicator (1 if individual $i$ is susceptible at time $t$, 0 otherwise), $e_i(t)$ is the additive epidemic component, a linear combination of the infection pressure exerted by currently infectious individuals, and $h_i(t)$ is the multiplicative endemic (Cox-like) component driven by the log-baseline $h_0(t)$ and the endemic covariates.

If a large number of knots (or nIntervals) is chosen, the corresponding log-baseline parameters can be rendered identifiable by the use of penalized likelihood inference. At present, it is the job of the user to choose an adequate value of the smoothing parameter lambda.smooth. Alternatively, a data-driven selection of the smoothing parameter based on a mixed model representation of an equivalent truncated power spline is offered (see the reference for further details). The following two steps are then iterated until convergence:
  1. Given a fixed smoothing parameter, the penalized likelihood is optimized for the regression components using an L-BFGS-B approach.
  2. Given fixed regression parameters, a Laplace approximation of the marginal likelihood for the smoothing parameter is numerically optimized.
Depending on the data, convergence might take a couple of iterations.
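The automatic selection described above is requested via lambda.smooth = -1. A sketch of such a call (the data set myEpi and its covariates B1, B2, z2 are hypothetical placeholders, mirroring the Examples below):

```r
## sketch only: assumes an "epidata" object 'myEpi' with epidemic
## covariates B1, B2 and an endemic covariate z2
fitAuto <- twinSIR(~ B1 + B2 + cox(z2), data = myEpi, nIntervals = 9,
                   lambda.smooth = -1)  # mixed-model-based selection of lambda
```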

Note also that it is unwise to include endemic covariates with huge values, as they affect the intensities on the exponential scale after being multiplied by the parameter vector $\beta$. With large covariate values, the optim method "L-BFGS-B" will likely terminate due to an infinite log-likelihood or score function in some iteration.

References

Höhle, M. (2009): Additive-multiplicative regression models for spatio-temporal epidemics. Biometrical Journal, 51(6):961-978.

Meyer, S., Held, L. and Höhle, M. (2016): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software. In press. Preprint available at http://arxiv.org/abs/1411.0416

See Also

as.epidata for the necessary data input structure, plot.twinSIR for plotting the path of the infection intensity, profile.twinSIR for profile likelihood estimation, and simulate.twinSIR for the simulation of epidemics following the fitted model.

Furthermore, the standard extraction methods vcov, logLik, AIC and extractAIC are implemented for objects of class "twinSIR".
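On a fitted model such as fit1 from the Examples below, these extraction methods work as usual (a sketch, assuming the fit succeeded):

```r
## standard extraction methods for a fitted "twinSIR" object ('fit1' as in Examples)
vcov(fit1)    # estimated covariance matrix of the parameter estimates
logLik(fit1)  # maximized (penalized) log-likelihood
AIC(fit1)     # Akaike information criterion
```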

Examples

data("fooepidata")
summary(fooepidata)

# fit an overall constant baseline hazard rate
fit1 <- twinSIR(~ B1 + B2 + cox(z2), data = fooepidata)
fit1
summary(fit1)

# fit1 is what is used as data("foofit") in other examples
data("foofit")
stopifnot(all.equal(fit1, foofit))

# fit a piecewise constant baseline hazard rate with 3 intervals using 
# _un_penalized ML and estimated coefs from fit1 as starting values 
fit2 <- twinSIR(~ B1 + B2 + cox(z2), data = fooepidata, nIntervals = 3,
    optim.args = list(par=c(coef(fit1)[1:2],rep(coef(fit1)[3],3),coef(fit1)[4])))
fit2
summary(fit2)

# fit a piecewise constant baseline hazard rate with 9 intervals
# using _penalized_ ML and estimated coefs from fit1 as starting values 
fit3 <- twinSIR(~ B1 + B2 + cox(z2), data = fooepidata, nIntervals = 9,
    lambda.smooth = 0.1, penalty = 1, optim.args = list(
        par=c(coef(fit1)[1:2], rep(coef(fit1)[3],9), coef(fit1)[4])))
fit3
summary(fit3)
# plot of the 9 log-baseline levels
plot(x=fit3$intervals, y=coef(fit3)[c(3,3:11)], type="S")

### -> for more sophisticated intensity plots, see 'plot.twinSIR'
plot(fit3)
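As noted under the keep.data argument, simulation from a fit requires the original "epidata" to be stored in the object; a minimal sketch, reusing the example data:

```r
## refit with keep.data = TRUE so that simulate() can access the "epidata"
fit1k <- twinSIR(~ B1 + B2 + cox(z2), data = fooepidata, keep.data = TRUE)
simEpi <- simulate(fit1k)  # one epidemic simulated from the fitted model
```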
