surveillance (version 1.12.1)

hhh4: Fitting HHH Models with Random Effects and Neighbourhood Structure

Description

Fits an autoregressive Poisson or negative binomial model to a univariate or multivariate time series of counts. The characteristic feature of hhh4 models is the additive decomposition of the conditional mean into epidemic and endemic components (Held et al, 2005). Log-linear predictors of covariates and random intercepts are allowed in all components; see the Details below. A general introduction to the hhh4 modelling approach and its implementation is given in the vignette("hhh4"). Meyer et al (2016, Section 5, available as vignette("hhh4_spacetime")) describe hhh4 models for areal time series of infectious disease counts.
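
As a quick orientation, the following sketch (not run) illustrates the component-wise specification described above; 'mysts' is a hypothetical "sts" object with a binary adjacency matrix in its neighbourhood slot, and the Usage and Examples sections below give the full picture:

## minimal sketch of the component-wise model specification (hypothetical data)
fit <- hhh4(mysts, control = list(
    ar  = list(f = ~1),                                   # epidemic, autoregressive
    ne  = list(f = ~1, weights = neighbourhood(mysts)),   # epidemic, neighbour-driven
    end = list(f = addSeason2formula(~1, S = 1, period = 52)),  # endemic, seasonal
    family = "NegBin1"))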

Usage

hhh4(stsObj,
     control = list(
         ar = list(f = ~ -1, offset = 1, lag = 1),
         ne = list(f = ~ -1, offset = 1, lag = 1,
                   weights = neighbourhood(stsObj) == 1,
                   scale = NULL, normalize = FALSE),
         end = list(f = ~ 1, offset = 1),
         family = c("Poisson", "NegBin1", "NegBinM"),
         subset = 2:nrow(stsObj),
         optimizer = list(stop = list(tol=1e-5, niter=100),
                          regression = list(method="nlminb"),
                          variance = list(method="nlminb")),
         verbose = FALSE,
         start = list(fixed=NULL, random=NULL, sd.corr=NULL),
         data = list(t = stsObj@epoch - min(stsObj@epoch)),
         keep.terms = FALSE
     ),
     check.analyticals = FALSE)

Arguments

stsObj
object of class "sts" containing the (multivariate) count data time series.
control
a list containing the model specification and control arguments. Its components ar, ne, end, family, subset, optimizer, verbose, start, data, and keep.terms are listed with their defaults in the Usage section above; see vignette("hhh4") for a detailed description of the component specifications.
check.analyticals
logical (or a subset of c("numDeriv", "maxLik")), indicating if (and how) the implemented analytical score vector and Fisher information matrix should be checked against numerical derivatives at the parameter starting values, using the packages numDeriv and maxLik.
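
As an illustration, a hedged sketch reusing the salmonella objects created in the Examples section; the returned object may then contain the derivative comparisons instead of a regular model fit:

## request a numDeriv-based check of score vector and Fisher information
## at the starting values ('salmonella' and 'model1' as in the Examples)
chk <- hhh4(salmonella, control = model1, check.analyticals = "numDeriv")
str(chk, max.level = 1)  # inspect what is returned when checks are requested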

Value

hhh4 returns an object of class "hhh4", which is a list containing the following components:

  • coefficients: named vector with estimated (regression) parameters of the model
  • se: estimated standard errors (for regression parameters)
  • cov: covariance matrix (for regression parameters)
  • Sigma: estimated variance-covariance matrix of random effects
  • Sigma.orig: estimated variance parameters on the internal scale used for optimization
  • Sigma.cov: inverse of the marginal Fisher information (on the internal scale), i.e., the asymptotic covariance matrix of Sigma.orig
  • call: the matched call
  • dim: vector with the number of fixed and random effects in the model
  • loglikelihood: (penalized) log-likelihood evaluated at the MLE
  • margll: (approximate) log marginal likelihood if the model contains random effects
  • convergence: logical. Did the optimizer converge?
  • fitted.values: fitted mean values $\mu_{i,t}$
  • control: the control object of the fit
  • terms: the terms object used in the fit if keep.terms = TRUE, and NULL otherwise
  • stsObj: the supplied stsObj
  • lags: named integer vector of length two containing the lags used for the epidemic components "ar" and "ne", respectively. The corresponding lag is NA if the component was not included in the model.
  • nObs: number of observations used for fitting the model
  • nTime: number of time points used for fitting the model
  • nUnit: number of units (e.g., areas) used for fitting the model
  • runtime: the proc.time-queried time taken to fit the model, i.e., a named numeric vector of length 5 of class "proc_time"
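
For illustration, individual components can be accessed directly from the fitted object (here 'res' denotes the salmonella fit created in the Examples section):

res$coefficients          # estimated (regression) parameters
head(res$fitted.values)   # fitted mean values
res$convergence           # did the optimizer converge?
res$dim                   # number of fixed and random effects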

Details

An endemic-epidemic multivariate time-series model for infectious disease counts $Y_{it}$ from units $i=1,\dots,I$ during periods $t=1,\dots,T$ was proposed by Held et al (2005) and was later extended in a series of papers (Paul et al, 2008; Paul and Held, 2011; Held and Paul, 2012; Meyer and Held, 2014). In its most general formulation, this so-called hhh4 (or HHH or $H^3$ or triple-H) model assumes that, conditional on past observations, $Y_{it}$ has a Poisson or negative binomial distribution with mean $$\mu_{it} = \lambda_{it} y_{i,t-1} + \phi_{it} \sum_{j\neq i} w_{ji} y_{j,t-1} + e_{it} \nu_{it}.$$ In the case of a negative binomial model, the conditional variance is $\mu_{it}(1+\psi_i\mu_{it})$ with overdispersion parameters $\psi_i > 0$ (possibly shared across different units, e.g., $\psi_i\equiv\psi$). Univariate time series of counts $Y_t$ are supported as well, in which case hhh4 can be regarded as an extension of glm.nb to account for autoregression. The three unknown quantities of the mean $\mu_{it}$,
  • $\lambda_{it}$ in the autoregressive (ar) component,
  • $\phi_{it}$ in the neighbour-driven (ne) component, and
  • $\nu_{it}$ in the endemic (end) component,
are log-linear predictors incorporating time-/unit-specific covariates. They may also contain unit-specific random intercepts as proposed by Paul and Held (2011). The endemic mean is usually modelled proportional to a unit-specific offset $e_{it}$ (e.g., population numbers or fractions); it is possible to include such multiplicative offsets in the epidemic components as well. The $w_{ji}$ are transmission weights reflecting the flow of infections from unit $j$ to unit $i$. In spatial hhh4 applications, the units refer to geographical regions and the weights could be derived from movement network data. Alternatively, the weights can be estimated parametrically as a function of adjacency order (Meyer and Held, 2014).
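
To make this concrete, a hedged sketch of a multivariate specification combining all three predictors is given below; 'mysts' and maxlag = 5 are hypothetical, W_powerlaw assumes that neighbourhood(mysts) contains adjacency orders, and the endemic time trend uses the default covariate t from the data argument:

## illustrative control list with all three components
control <- list(
    ar  = list(f = ~ -1 + fe(1, unitSpecific = TRUE)),              # lambda_it
    ne  = list(f = ~1,                                              # phi_it
               weights = W_powerlaw(maxlag = 5, normalize = TRUE, log = TRUE)),
    end = list(f = addSeason2formula(~1 + t, S = 1, period = 52),   # nu_it
               offset = population(mysts)),                         # e_it
    family = "NegBinM")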

(Penalized) Likelihood inference for such hhh4 models has been established by Paul and Held (2011) with extensions for parametric neighbourhood weights by Meyer and Held (2014). Supplied with the analytical score function and Fisher information, the function hhh4 by default uses the quasi-Newton algorithm available through nlminb to maximize the log-likelihood. Convergence is usually fast even for a large number of parameters. If the model contains random effects, the penalized and marginal log-likelihoods are maximized alternately until convergence.
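
For example, the default optimizer settings shown in the Usage section could be adjusted as follows (illustrative values; switching the variance updates to "Nelder-Mead" mirrors the fluBYBW example below):

## optimizer settings to be passed as the 'optimizer' element of 'control'
optimizer <- list(
    stop       = list(tol = 1e-5, niter = 100),    # convergence tolerance and iteration limit
    regression = list(method = "nlminb"),          # updates of regression parameters
    variance   = list(method = "Nelder-Mead"))     # updates of variance parameters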

References

Held, L., Höhle, M. and Hofmann, M. (2005): A statistical framework for the analysis of multivariate infectious disease surveillance counts. Statistical Modelling, 5, 187--199.

Paul, M., Held, L. and Toschke, A. M. (2008): Multivariate modelling of infectious disease surveillance data. Statistics in Medicine, 27, 6250--6267.

Paul, M. and Held, L. (2011): Predictive assessment of a non-linear random effects model for multivariate time series of infectious disease counts. Statistics in Medicine, 30, 1118--1136.

Held, L. and Paul, M. (2012): Modeling seasonality in space-time infectious disease surveillance data. Biometrical Journal, 54, 824--843.

Meyer, S. and Held, L. (2014): Power-law models for infectious disease spread. The Annals of Applied Statistics, 8 (3), 1612--1639. doi:10.1214/14-AOAS743

Meyer, S., Held, L. and Höhle, M. (2016): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software. In press. Preprint available at http://arxiv.org/abs/1411.0416

See Also

See the special functions fe, ri and the examples below for how to specify unit-specific effects.

Further details on the modelling approach and illustrations of its implementation can be found in vignette("hhh4") and vignette("hhh4_spacetime").
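
For instance (illustrative formulae mirroring those used in the Examples below):

## fixed unit-specific intercepts
f.fixed  <- ~ -1 + fe(1, unitSpecific = TRUE)
## random intercepts, iid across units and correlated between components,
## combined with one sine/cosine pair of seasonality
f.random <- addSeason2formula(~ -1 + ri(type = "iid", corr = "all"),
                              S = 1, period = 52)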

Examples

###########################################################################
## Univariate examples (similar to those for the 'hhh4' predecessor 'algo.hhh')
###########################################################################

### weekly counts of salmonella agona cases, UK, 1990-1995

data("salmonella.agona")
## convert old "disProg" to new "sts" data class
salmonella <- disProg2sts(salmonella.agona)
salmonella
plot(salmonella)

## generate formula for an (endemic) time trend and seasonality
f.end <- addSeason2formula(f = ~1 + t, S = 1, period = 52)
f.end
## specify a simple autoregressive negative binomial model
model1 <- list(ar = list(f = ~1), end = list(f = f.end), family = "NegBin1")
## fit this model to the data
res <- hhh4(salmonella, model1)
## summarize the model fit
summary(res, idx2Exp=1, amplitudeShift=TRUE, maxEV=TRUE)
plot(res)
plot(res, type = "season", components = "end")


### weekly counts of meningococcal infections, Germany, 2001-2006

data("influMen")
fluMen <- disProg2sts(influMen)
meningo <- fluMen[, "meningococcus"]
meningo
plot(meningo)

## again a simple autoregressive NegBin model with endemic seasonality
meningoFit <- hhh4(stsObj = meningo, control = list(
    ar = list(f = ~1),
    end = list(f = addSeason2formula(f = ~1, S = 1, period = 52)),
    family = "NegBin1"
))

summary(meningoFit, idx2Exp=TRUE, amplitudeShift=TRUE, maxEV=TRUE)
plot(meningoFit)
plot(meningoFit, type = "season", components = "end")


###########################################################################
## Multivariate examples (similar to those for the 'hhh4' predecessor 'algo.hhh')
###########################################################################

### bivariate analysis of influenza and meningococcal infections
### (see Paul et al, 2008)

plot(fluMen, same.scale = FALSE)
     
## Fit a negative binomial model with
## - autoregressive component: disease-specific intercepts
## - neighbour-driven component: only transmission from flu to men
## - endemic component: S=3 and S=1 sine/cosine pairs for flu and men, respectively
## - disease-specific overdispersion

WfluMen <- neighbourhood(fluMen)
WfluMen["meningococcus","influenza"] <- 0
WfluMen
f.end_fluMen <- addSeason2formula(f = ~ -1 + fe(1, which = c(TRUE, TRUE)),
                                  S = c(3, 1), period = 52)
f.end_fluMen
fluMenFit <- hhh4(fluMen, control = list(
    ar = list(f = ~ -1 + fe(1, unitSpecific = TRUE)),
    ne = list(f = ~ 1, weights = WfluMen),
    end = list(f = f.end_fluMen),
    family = "NegBinM"))
summary(fluMenFit, idx2Exp=1:3)
plot(fluMenFit, type = "season", components = "end", unit = 1)
plot(fluMenFit, type = "season", components = "end", unit = 2)


### weekly counts of measles, Weser-Ems region of Lower Saxony, Germany
### (see Held et al, 2005, and vignette("hhh4_spacetime"), where a
### corrected version of these data is used for illustration of 'hhh4')

data("measles.weser")
measles <- disProg2sts(measles.weser)
measles
plot(measles)

## we could fit the same simple model as for the salmonella cases above
measlesFit <- hhh4(measles, model1)
summary(measlesFit, idx2Exp=TRUE, amplitudeShift=TRUE, maxEV=TRUE)

## but we should use region-specific intercepts and a population offset
## in the endemic component => update the model
f.end2 <- update.formula(f.end, ~ . + fe(1, unitSpecific = TRUE) - 1)
f.end2
measlesFit2 <- update(measlesFit,
    end = list(f = f.end2, offset = population(measles)))
summary(measlesFit2, idx2Exp=TRUE, amplitudeShift=TRUE, maxEV=TRUE)

## now also allow for transmission from adjacent districts
measlesFit3 <- update(measlesFit2,
    ne = list(f = ~1, weights = neighbourhood(measles)))
summary(measlesFit3, idx2Exp=TRUE, amplitudeShift=TRUE)
plot(measlesFit3, units = NULL, hide0s = TRUE)

## check that neighbourhood weights array yields same result
.neweights <- array(neighbourhood(measles),
                   dim = c(rep(ncol(measles),2),nrow(measles)),
                   dimnames = c(dimnames(neighbourhood(measles)), list(NULL)))
measlesFit3_tv <- update(measlesFit3, ne = list(weights = .neweights),
                         use.estimates = FALSE)
stopifnot(all.equal(measlesFit3, measlesFit3_tv, ignore = "control"))

## See vignette("hhh4_spacetime") for further analyses of the measles
## data from Weser-Ems, including vaccination coverage as a covariate,
## power-law weights, and random intercepts.


### last but not least, a more sophisticated (and time-consuming)
### analysis of weekly counts of influenza from 140 districts in
### Southern Germany (originally analysed by Paul and Held, 2011,
### and revisited by Held and Paul, 2012, and Meyer and Held, 2014)

data("fluBYBW")
plot(fluBYBW, type = observed ~ time)
plot(fluBYBW, type = observed ~ unit,
     ## mean yearly incidence per 100,000 inhabitants (8 years)
     population = fluBYBW@map$X31_12_01 / 100000 * 8)

## For the full set of models for data("fluBYBW") as analysed by
## Paul and Held (2011), including predictive model assessement
## using proper scoring rules, see the (computer-intensive)
## demo("fluBYBW") script:
demoscript <- system.file(file.path("demo", "fluBYBW.R"),
                          package = "surveillance")
demoscript
#file.show(demoscript)

## Here we fit the improved power-law model of Meyer and Held (2014)
## - autoregressive component: random intercepts + S = 1 sine/cosine pair
## - neighbour-driven component: random intercepts + S = 1 sine/cosine pair
##   + population gravity with normalized power-law weights
## - endemic component: random intercepts + trend + S = 3 sine/cosine pairs
## - random intercepts are iid but correlated between components
f.S1 <- addSeason2formula(
    ~-1 + ri(type="iid", corr="all"),
    S = 1, period = 52)
f.end.S3 <- addSeason2formula(
    ~-1 + ri(type="iid", corr="all") + I((t-208)/100),
    S = 3, period = 52)

## for power-law weights, we need adjacency orders, which can be
## computed from the binary adjacency indicator matrix
nbOrder1 <- neighbourhood(fluBYBW)
neighbourhood(fluBYBW) <- nbOrder(nbOrder1, 15)

## full model specification
fluModel <- list(
    ar = list(f = f.S1),
    ne = list(f = update.formula(f.S1, ~ . + log(pop)),
              weights = W_powerlaw(maxlag=max(neighbourhood(fluBYBW)),
                                   normalize = TRUE, log = TRUE)),
    end = list(f = f.end.S3, offset = population(fluBYBW)),
    family = "NegBin1", data = list(pop = population(fluBYBW)),
    optimizer = list(variance = list(method = "Nelder-Mead")),
    verbose = TRUE)

## CAVE: random effects considerably increase the runtime of model estimation
## (It is usually advantageous to first fit a model with simple intercepts
## to obtain reasonable start values for the other parameters.)
set.seed(1)  # because random intercepts are initialized randomly
fluFit <- hhh4(fluBYBW, fluModel)

summary(fluFit, idx2Exp=TRUE, amplitudeShift=TRUE)

plot(fluFit, type = "season")

plot(fluFit, type = "neweights", xlab = "adjacency order")

gridExtra::grid.arrange(
    grobs = lapply(c("ar", "ne", "end"), function (comp)
        plot(fluFit, type = "ri", component = comp, main = comp,
             at = seq(-2.6, 2.6, length.out = 15),
             col.regions = cm.colors(14))),
    nrow = 1, ncol = 3)

range(plot(fluFit, type = "maxEV"))
