surveillance (version 1.12.1)

twinstim: Fit a Two-Component Spatio-Temporal Point Process Model

Description

A twinstim model as described in Meyer et al. (2012) is fitted to marked spatio-temporal point process data. This constitutes a regression approach for conditional intensity function modelling. The implementation is illustrated in Meyer et al. (2016, Section 3), see vignette("twinstim").

Usage

twinstim(endemic, epidemic, siaf, tiaf, qmatrix = data$qmatrix, data,
         subset, t0 = data$stgrid$start[1], T = tail(data$stgrid$stop,1),
         na.action = na.fail, start = NULL, partial = FALSE,
         epilink = "log", control.siaf = list(F = list(), Deriv = list()),
         optim.args = list(), finetune = FALSE,
         model = FALSE, cumCIF = FALSE, cumCIF.pb = interactive(),
         cores = 1, verbose = TRUE)

Arguments

endemic
right-hand side formula for the exponential (Cox-like multiplicative) endemic component. May contain offsets (to be marked by the special function offset). If omitted or ~0 there will be no endemic component in the m
epidemic
formula representing the epidemic model for the event-specific covariates (marks) determining infectivity. Offsets are not implemented here. If omitted or ~0 there will be no epidemic component in the model.
siaf
spatial interaction function. Possible specifications are:
  • NULLor missing, corresponding tosiaf.constant(), i.e. spatially homogeneous infectivity independent of the distance from the host
  • a list as ret
tiaf
temporal interaction function. Possible specifications are:
  • NULLor missing, corresponding totiaf.constant(), i.e. time-constant infectivity
  • a list as returned bytiaf
qmatrix
square indicator matrix (0/1 or FALSE/TRUE) for possible transmission between the event types. The matrix will be internally converted to logical. Defaults to the $Q$ matrix specified in data
data
an object of class "epidataCS".
subset
an optional vector evaluating to logical indicating a subset of data$events to keep. Missing values are taken as FALSE. The expression is evaluated in the context of the data$events@data data.frame<
t0, T
events having occurred during (-Inf;t0] are regarded as part of the prehistory $H_0$ of the process. Only events that occurred in the interval (t0; T] are considered in the likelihood. The time point t0 (T) must b
na.action
how to deal with missing values in data$events? Do not use na.pass. Missing values in the spatio-temporal grid data$stgrid are not accepted.
start
a named vector of initial values for (a subset of) the parameters. The names must conform to the conventions of twinstim to be assigned to the correct model terms. For instance, "h.(Intercept)" = endemic intercept,
partial
logical indicating if a partial likelihood similar to the approach by Diggle et al. (2010) should be used (default is FALSE). Note that the partial likelihood implementation is not well tested.
epilink
a character string determining the link function to be used for the epidemic linear predictor of event marks. By default, the log-link is used. The experimental alternative epilink = "identity" (for use by
control.siaf
a list with elements "F" and "Deriv", which are lists of extra arguments passed to the functions siaf$F and siaf$Deriv, respectively. These arguments mainly determine the accuracy of the numerica
optim.args
an argument list passed to optim, or NULL, in which case no optimization will be performed but the necessary functions will be returned in a list (similar to what is returned if
finetune
logical indicating if a second maximisation should be performed with robust Nelder-Mead optim using the resulting parameters from the first maximisation as starting point. This argument is only considered if partial = FALSE<
model
logical indicating if the model environment should be kept with the result, which is required for intensityplots and R0<
cumCIF
logical (default: FALSE) indicating whether to calculate the fitted cumulative ground intensity at event times. This is the residual process, see residuals.twinstim.
cumCIF.pb
logical indicating if a progress bar should be shown during the calculation of cumCIF. Defaults to do so in an interactive Rsession, and will be FALSE if cores != 1.
cores
number of processes to use in parallel operation. By default twinstim runs in single-CPU mode. Currently, only the multicore-type of parallel computing via forking is supported, which is not available on Windows, see
verbose
logical indicating if information should be printed during execution. Defaults to TRUE.

Value

  • Returns an S3 object of class "twinstim", which is a list with the following components:
  • coefficientsvector containing the MLE.
  • loglikvalue of the log-likelihood function at the MLE with a logical attribute "partial" indicating if the partial likelihood was used.
  • countsnumber of log-likelihood and score evaluations during optimization.
  • convergedeither TRUE (if the optimizer converged) or a character string containing a failure message.
  • fisherinfoexpected Fisher information evaluated at the MLE. Only non-NULL for full likelihood inference (partial = FALSE) and if spatial and temporal interaction functions are provided with their derivatives.
  • fisherinfo.observedobserved Fisher information matrix evaluated at the value of the MLE. Obtained as the negative Hessian. Only non-NULL if optim.args$method is not "nlminb" and if it was requested by setting hessian=TRUE in optim.args.
  • fittedfitted values of the conditional intensity function at the events.
  • fittedComponentstwo-column matrix with columns "h" and "e" containing the fitted values of the endemic and epidemic components, respectively. (Note that rowSums(fittedComponents) == fitted.)
  • taufitted cumulative ground intensities at the event times. Only non-NULL if cumCIF = TRUE. This is the residual process of the model, see residuals.twinstim.
  • R0estimated basic reproduction number for each event. This equals the spatio-temporal integral of the epidemic intensity over the observation domain (t0;T] x W for each event.
  • nparsvector describing the lengths of the 5 parameter subvectors: endemic intercept(s) $\beta_0(\kappa)$, endemic coefficients $\beta$, epidemic coefficients $\gamma$, parameters of the siaf kernel, and parameters of the tiaf kernel.
  • qmatrixthe qmatrix associated with the epidemic data as supplied in the model call.
  • bboxthe bounding box of data$W.
  • timeRangethe time range used for fitting: c(t0,T).
  • formulaa list containing the four main parts of the model specification: endemic, epidemic, siaf, and tiaf.
  • control.siafsee the Arguments section above.
  • optim.argsinput optimizer arguments used to determine the MLE.
  • functionsif model=TRUE this is a list with components ll, sc and fi, which are functions evaluating the log-likelihood, the score function and the expected Fisher information for a parameter vector $\theta$. The environment of these function is the model environment, which is thus retained in the workspace if model=TRUE. Otherwise, the functions component is NULL.
  • callthe matched call.
  • runtimethe proc.time-queried time taken to fit the model, i.e., a named numeric vector of length 5 of class "proc_time", with the number of cores set as additional attribute.
  • If model=TRUE, the model evaluation environment is assigned to this list and can thus be queried by calling environment() on the result.

encoding

latin1

Details

The function performs maximum likelihood inference for the additive-multiplicative spatio-temporal intensity model described in Meyer et al. (2012). It uses nlminb as the default optimizer and returns an object of class twinstim. Such objects have print, plot and summary methods. The output of the summary can be processed by the toLatex function. Furthermore, the usual model fit methods such as coef, vcov, logLik, residuals, and update are implemented. A specific add-on is the use of the functions R0 and simulate.

References

Diggle, P. J., Kaimi, I. & Abellana, R. (2010): Partial-likelihood analysis of spatio-temporal point-process data. Biometrics, 66, 347-354.

Martinussen, T. and Scheike, T. H. (2006): Dynamic Regression Models for Survival Data. Springer.

Meyer, S. (2010): Spatio-Temporal Infectious Disease Epidemiology based on Point Processes. Master's Thesis, Ludwig-Maximilians-Universit{ae}t M{ue}nchen. Available as http://epub.ub.uni-muenchen.de/11703/ Meyer, S., Elias, J. and H{oe}hle, M. (2012): A space-time conditional intensity model for invasive meningococcal disease occurrence. Biometrics, 68, 607-616. 10.1111/j.1541-0420.2011.01684.x

Meyer, S. and Held, L. (2014): Power-law models for infectious disease spread. The Annals of Applied Statistics, 8 (3), 1612-1639. 10.1214/14-AOAS743

Meyer, S., Held, L. and H{oe}hle, M. (2016): Spatio-temporal analysis of epidemic phenomena using the Rpackage surveillance. Journal of Statistical Software. In press. Preprint available at http://arxiv.org/abs/1411.0416

Rathbun, S. L. (1996): Asymptotic properties of the maximum likelihood estimator for spatio-temporal point processes. Journal of Statistical Planning and Inference, 51, 55-74.

See Also

vignette("twinstim"). There is a simulate.twinstim method, which simulates the point process based on the fitted twinstim.

A discrete-space alternative is offered by the twinSIR modelling framework.

Examples

Run this code
# Load invasive meningococcal disease data
data("imdepi")


### first, fit a simple endemic-only model

m_noepi <- twinstim(
    endemic = addSeason2formula(~ offset(log(popdensity)) + I(start/365-3.5),
                                S=1, period=365, timevar="start"),
    data = imdepi, subset = !is.na(agegrp)
)

## look at the model summary
summary(m_noepi)

## there is no evidence for a type-dependent endemic intercept (LR test)
m_noepi_type <- update(m_noepi, endemic = ~(1|type) + .)
pchisq(2*c(logLik(m_noepi_type)-logLik(m_noepi)), df=1, lower.tail=FALSE)


### add an epidemic component with just the intercept, i.e.
### assuming uniform dispersal in time and space up to a distance of
### eps.s = 200 km and eps.t = 30 days (see summary(imdepi))

m0 <- update(m_noepi, epidemic=~1, start=c("e.(Intercept)"=-18), model=TRUE)

## summarize the model fit
s <- summary(m0, correlation = TRUE, symbolic.cor = TRUE)
s
# output the table of coefficients as LaTeX code
toLatex(s, digits=2)
# or, to report rate ratios
xtable(s)

## the default confint-method can be used for Wald-CI's
confint(m0, level=0.95)

## same "untrimmed" R0 for every event (simple epidemic intercept model)
summary(R0(m0, trimmed=FALSE))

## plot the path of the fitted total intensity
plot(m0, "total intensity", tgrid=500)

## the remaining examples are more time-consuming
if (surveillance.options("allExamples"))
{
  ## NB: in contrast to using nlminb() optim's BFGS would miss the
  ##     likelihood maximum wrt the epidemic intercept if starting at -10
  m0_BFGS <- update(m_noepi, epidemic=~1, start=c("e.(Intercept)"=-10),
                    optim.args = list(method="BFGS"))
  format(cbind(nlminb=coef(m0), BFGS=coef(m0_BFGS)), digits=3, scientific=FALSE)
  m0_BFGS$fisherinfo   # singular Fisher information matrix here
  m0$fisherinfo
  logLik(m0_BFGS)
  logLik(m0)
  ## nlminb is more powerful since we make use of the analytical fisherinfo
  ## as estimated by the model during optimization, which optim cannot

  ## extract "residual process" integrating over space (takes some seconds)
  res <- residuals(m0)
  # if the model describes the true CIF well _in the temporal dimension_,
  # then this residual process should behave like a stationary Poisson
  # process with intensity 1
  plot(res, type="l"); abline(h=c(0, length(res)), lty=2)
  # -> use the function checkResidualProcess
  checkResidualProcess(m0)

  ## estimate an exponential temporal decay of infectivity
  m1_tiaf <- update(m0, tiaf=tiaf.exponential())
  plot(m1_tiaf, "tiaf", scaled=FALSE)

  ## estimate a step function for spatial interaction
  m1_fstep <- update(m0, siaf=c(5,10,25,50))
  plot(m1_fstep, "siaf", scaled=FALSE)
  rug(getSourceDists(imdepi, dimension="space"), ticksize=0.02)

  ## estimate a continuously decreasing spatial interaction function, where
  ## likelihood evaluation takes much longer than for (piecewise) constant
  ## spatial spread; here we use the kernel of an isotropic bivariate
  ## Gaussian with a standard deviation of exp(2.8) = 16.4 km for both
  ## finetypes (fixed to speed up the example)
  m1 <- update(m0,
      siaf = siaf.gaussian(1, F.adaptive = TRUE),
      start = c("e.siaf.1" = 2.8), # sigma = exp(2.8) = 16.4 km
      optim.args = list(fixed="e.siaf.1"),
      control.siaf = list(F=list(adapt=0.25)) # use adaptive eps=sigma/4
  )
  AIC(m_noepi, m0, m1)   # further improvement
  summary(m1, test.iaf=FALSE)   # nonsense to test H0: log(sigma) = 0
  plot(m1, "siaf", scaled=FALSE)

  ## add epidemic covariates
  m2 <- update(m1, epidemic = ~ 1 + type + agegrp)
  AIC(m1, m2)   # further improvement
  summary(m2)
  
  ## look at estimated R0 values by event type
  tapply(R0(m2), imdepi$events@data[names(R0(m2)), "type"], summary)
}

Run the code above in your browser using DataLab