hhh4_validation: Predictive Model Assessment for `hhh4` Models

Description

The function oneStepAhead computes successive one-step-ahead predictions for a (random effects) HHH model fitted by hhh4.

The function scores computes a number of (strictly) proper scoring rules based on such one-step-ahead predictions.

See Paul and Held (2011) for further details.

Usage

oneStepAhead(result, tp, type = c("rolling", "first", "final"),
             which.start = c("current", "final"),
             keep.estimates = FALSE, verbose = TRUE, cores = 1)
scores(object, which = c("logs", "rps", "dss", "ses"), units = NULL,
       sign = FALSE, individual = FALSE)

Arguments

result

fitted hhh4 model (class "hhh4").

numeric vector of length 1 or 2. One-step-ahead predictions are computed from time points tp[1], ..., tp[2] (yielding predictions for time points tp[1]+1, ...), where tp[2] defaults to t

type

The default "rolling" procedure sequentially refits the model up to each time point in tp and computes the one-step-ahead predictions for the respective next time point. The alternative types are no true

which.start

Which initial values should be used when successively refitting the model for subsets of the data (up to time point tp[1], up to tp[1]+1, ...) if type="rolling"? Default ("current") is to use

keep.estimates

logical indicating if parameter estimates and log-likelihoods from the successive fits should be returned.

verbose

non-negative integer (usually in the range 0:3) specifying the amount of tracing information to output. During hhh4 model updates, the following verbosity is used: 0 if cores > 1, otherwise <

cores

the number of cores to use when computing the predictions for the set of time points tp in parallel (with mclapply). Note that parallelization is not possible in the defaul

object

result of oneStepAhead.

which

character vector determining which scores to compute. The package surveillance implements the following proper scoring rules: logarithmic score ("logs"), ranked probability score ("rps"), Dawid-Sebastiani s

units

integer or character vector indexing the units for which the scores should be computed. By default (NULL) all units are considered.

sign

logical indicating if the function should also return sign(x-mu), i.e., the sign of the difference between the observed counts and corresponding predictions. This does not really make sense when averaging over multiple u

individual

logical indicating if the individual scores of the units should be returned. By default (FALSE), the individual scores are averaged over all units.

Value

oneStepAhead returns a list with the following components:
predone-step-ahead predictions in a matrix, where each row corresponds to one of the time points requested via the argument tp, and which has ncol(result$stsObj) unit-specific columns. The rownames indicate the predicted time points and the column names are identical to colnames(result$stsObj).
observedmatrix with observed counts at the predicted time points. It has the same dimensions and names as pred.
psiin case of a negative-binomial model, a matrix of the estimated overdispersion parameter(s) at each time point on the internal -log-scale (1 column if "NegBin1", ncol(observed) columns if "NegBinM"). For a "Poisson" model, this component is NULL.
allConvergedlogical indicating if all successive fits converged.
If keep.estimates=TRUE, there are the following additional elements:
coefficientsmatrix of estimated regression parameters from the successive fits.
Sigma.origmatrix of estimated variance parameters from the successive fits.
logliksmatrix with columns "loglikelihood" and "margll" with their obvious meanings.
The function scores computes the scoring rules specified in the argument which. If multiple units are selected and individual=TRUE, the result is an array of dimensions c(nrow(pred),length(units),5+sign) (up to surveillance 1.8-0, the first two dimensions were collapsed to give a matrix). Otherwise, the result is a matrix with nrow(pred) rows and 5+sign columns. If there is only one predicted time point, the first dimension is dropped in both cases. CAVE: The order of the rows (time points) is reversed!

encoding

latin1

References

Czado, C., Gneiting, T. and Held, L. (2009) Predictive model assessment for count data. Biometrics, 65, 1254--1261

Paul, M. and Held, L. (2011) Predictive assessment of a non-linear random effects model for multivariate time series of infectious disease counts. Statistics in Medicine, 30, 1118--1136

Examples

Run this code

### univariate salmonella agona data

data("salmonella.agona")
## convert to sts class
salmonella <- disProg2sts(salmonella.agona)

## generate formula for temporal and seasonal trends
f.end <- addSeason2formula(f = ~1 + t, S=1, period=52)
model1 <- list(ar = list(f=~1), end = list(f=f.end), family = "NegBin1")
## fit the model
result <- hhh4(salmonella, model1)

## do sequential one-step-ahead predictions for the last 5 weeks
pred <- oneStepAhead(result, nrow(salmonella)-5, type="rolling",
                     which.start="final", verbose=FALSE)

## test equivalence of parallelized version
if (.Platform$OS.type == "unix" && isTRUE(parallel::detectCores() > 1))
    stopifnot(identical(pred,
        oneStepAhead(result, nrow(salmonella)-5, type="rolling",
                     which.start="final", verbose=FALSE, cores=2)))

## oneStepAhead(..., type="final") just means fitted values
stopifnot(identical(
    unname(oneStepAhead(result, nrow(salmonella)-5,
                        type="final", verbose=FALSE)$pred),
    unname(tail(fitted(result), 5))))

## compute scores
(sc <- scores(pred))

## compute mean scores
colMeans(sc)

## plot a (non-randomized) PIT histogram for the predictions
with(pred, pit(x = observed, pdistr = "pnbinom", mu = pred, size = exp(psi)))


######################################################################
# Do one-step-ahead predictions for the models from the Paul & Held 
# (2011) paper for the influenza data from Bavaria and Baden-Wuerttemberg 
# (this takes some time!)
######################################################################

## see ?hhh4 for a specification of the models

## do 1-step ahead predictions for the last two years

tp <- nrow(fluBYBW)-2*52

val_A0 <- oneStepAhead(res_A0,tp=tp) 
val_B0 <- oneStepAhead(res_B0,tp=tp) 
val_C0 <- oneStepAhead(res_C0,tp=tp) 

val_A1 <- oneStepAhead(res_A1,tp=tp) 
val_B1 <- oneStepAhead(res_B1,tp=tp) 
val_C1 <- oneStepAhead(res_C1,tp=tp) 

val_A2 <- oneStepAhead(res_A2,tp=tp) 
val_B2 <- oneStepAhead(res_B2,tp=tp) 
val_C2 <- oneStepAhead(res_C2,tp=tp) 

val_D <- oneStepAhead(res_D,tp=tp) 


##################################
## compute scores
################################

#scores
vals <- ls(pattern="val_")
nam <- substring(vals,first=5,last=6)

whichScores <- c("logs", "rps", "ses")
scores_i <- list()
meanScores <- NULL
for(i in seq(along.with=vals)){
  sc <- scores(get(vals[i]), which=whichScores, individual=TRUE)
  scores_i[[i]] <- sc
  meanScores <- rbind(meanScores,colMeans(sc, dims=2))
}

names(scores_i) <- nam
rownames(meanScores) <- nam

##comparison with best model B2 

compareWithBest <- function(best, whichModels, nPermut=9999, seed=1234){
  set.seed(seed)
  pVals <- NULL
  for(score in seq_along(whichScores)){
    p <- c()
    for(model in whichModels){
      if(model==best) p <- c(p,NA)
      else p <- c(p,permutationTest(scores_i[[model]][,,score],scores_i[[best]][,,score],
     plot=TRUE,nPermutation=nPermut, verbose=TRUE)$pVal.permut)
    }  
    pVals <- cbind(pVals,p)
  }
  return(pVals)
}

pVals_flu <- compareWithBest(best=6, whichModels=1:10,seed=2059710987)
rownames(pVals_flu) <- nam

Run the code above in your browser using DataLab