estimate_infections: Estimate Infections, the Time-Varying Reproduction Number and the Rate of Growth

Description

This function uses a non-parametric approach to reconstruct cases by date of infection from reported cases. It can optionally then estimate the time-varying reproduction number and the rate of growth.

Usage

estimate_infections(
  reported_cases,
  family = "negbin",
  generation_time,
  delays,
  gp = list(basis_prop = 0.3, boundary_scale = 2),
  rt_prior = list(mean = 1, sd = 1),
  prior_smoothing_window = 7,
  horizon = 7,
  model,
  cores = 1,
  chains = 4,
  samples = 1000,
  warmup = 200,
  estimate_rt = TRUE,
  estimate_week_eff = TRUE,
  estimate_breakpoints = FALSE,
  burn_in = 0,
  stationary = FALSE,
  fixed = FALSE,
  fixed_future_rt = FALSE,
  adapt_delta = 0.99,
  max_treedepth = 15,
  return_fit = FALSE,
  verbose = TRUE,
  debug = FALSE
)

Arguments

reported_cases

A data frame of confirmed cases (confirm) by date (date). confirm must be integer and date must be in date format.

family

A character string indicating the reporting model to use. Defaults to negative binomial ("negbin") with poisson ("poisson") also supported.

generation_time

A list containing the mean, standard deviation of the mean (mean_sd), standard deviation (sd), standard deviation of the standard deviation and the maximum allowed value for the generation time (assuming a gamma distribution).

delays

A list of delays (i.e incubation period/reporting delay) between infection and report. Each list entry must also be a list containing the mean, standard deviation of the mean (mean_sd), standard deviation (sd), standard deviation of the standard deviation and the maximum allowed value for the that delay (assuming a lognormal distribution with all parameters excepting the max allowed value on the log scale).

List controlling the Gaussian process approximation. Must contain the basis_prop (number of basis functions based on scaling the time points) which defaults to 0.3 and must be between 0 and 1 (increasing this increases the accuracy of the approximation and the cost of additional compute. Must also contain the boundary_scale (multiplied by half the range of the input time series). Increasing this increases the accuracy of the approximation at the cost of additional compute. See here: https://arxiv.org/abs/2004.11408 for more information on setting these parameters. Can optionally also contain the lengthscale_mean and lengthscale_sd. If these are specified this will override the defaults of 0 and 2 (normal distributed truncated at zero).

rt_prior

A list contain the mean and standard deviation (sd) of the gamma distributed prior for Rt. By default this is assumed to be mean 1 with a standard deviation of 1.

prior_smoothing_window

Numeric defaults to 7. The number of days over which to take a rolling average for the prior based on reported cases.

horizon

Numeric, defaults to 7. Number of days into the future to forecast.

model

A compiled stan model. By default uses the internal package model.

cores

Numeric, defaults to 2. The number of cores to use when fitting the stan model.

chains

Numeric, defaults to 2. The number of MCMC chains to use.

samples

Numeric, defaults to 1000. Number of samples post warmup.

warmup

Numeric, defaults to 200. Number of iteration of warmup to use.

estimate_rt

Logical, defaults TRUE. Should Rt be estimated when imputing infections.

estimate_week_eff

Logical, defaults TRUE. Should weekly reporting effects be estimated.

estimate_breakpoints

Logical, defaults to FALSE. Should breakpoints in Rt be estimated. If true then reported_cases must contain a breakpoint variable that is 1 on the dates with breakpoints and otherwise 0. Breakpoints are fit jointly with a global non-parametric effect and so represent a conservative estimate of breakpoint changes.

burn_in

Numeric, defaults to 0. The number of initial estimates to discard. This argument may be used to reduce spurious findings when running estimate_infections on a partial timeseries (as the earliest estimates will not be informed by all cases that occurred only those supplied to estimate_infections). The combined delays used will inform the appropriate length of this burn in but 7 days is likely a sensible starting point.

stationary

Logical, defaults to FALSE. Should Rt be estimated with a global mean. When estimating Rt this should substantially improve run times but will revert to the global average for real time and forecasted estimates. This setting is most appropriate when estimating historic Rt or when combined with breakpoints.

fixed

Logical, defaults to FALSE. If TRUE then a Gaussian process is not used and Rt is assumed to be constant over time (apart from any manually included breakpoints). If estimate_rt is FALSE then this reduces the backcalculation to a simple mean shift. This option can be used to produce a null model estimate, to produce a single Rt estimate for a short timeseries or as part of a wider analysis on the impact of interventions.

fixed_future_rt

Logical, defaults to FALSE. IF TRUE then the estimated Rt from the last time point with data is used for all future time points without data.

adapt_delta

Numeric, defaults to 0.99. See ?rstan::sampling.

max_treedepth

Numeric, defaults to 15. See ?rstan::sampling.

return_fit

Logical, defaults to FALSE. Should the fitted stan model be returned.

verbose

Logical, defaults to TRUE. Should verbose progress messages be printed.

debug

Logical, defaults to FALSE. Enables debug model in which additional diagnostics are available

Examples

Run this code

# NOT RUN {
# Get example case counts
reported_cases <- EpiNow2::example_confirmed[1:50]

# Add a dummy breakpoint (used only when optionally estimating breakpoints)
reported_cases <- reported_cases[, breakpoint := data.table::fifelse(date == as.Date("2020-03-16"),
                                                                     1, 0)]
# Set up example generation time
generation_time <- list(mean = EpiNow2::covid_generation_times[1, ]$mean,
                        mean_sd = EpiNow2::covid_generation_times[1, ]$mean_sd,
                        sd = EpiNow2::covid_generation_times[1, ]$sd,
                        sd_sd = EpiNow2::covid_generation_times[1, ]$sd_sd,
                        max = 30)
# Set delays between infection and case report 
# (any number of delays can be specifed here)             
incubation_period <- list(mean = EpiNow2::covid_incubation_period[1, ]$mean,
                          mean_sd = EpiNow2::covid_incubation_period[1, ]$mean_sd,
                          sd = EpiNow2::covid_incubation_period[1, ]$sd,
                          sd_sd = EpiNow2::covid_incubation_period[1, ]$sd_sd,
                          max = 30)
                   
reporting_delay <- list(mean = log(5),
                        mean_sd = log(2),
                        sd = log(2),
                        sd_sd = log(1.5),
                        max = 30)
                        
# Run model with default settings
def <- estimate_infections(reported_cases, family = "negbin",
                           generation_time = generation_time,
                           delays = list(incubation_period, reporting_delay),
                           samples = 1000, warmup = 200, cores = ifelse(interactive(), 4, 1),
                           chains = 4, estimate_rt = TRUE, verbose = FALSE, return_fit = TRUE)


# Plot output
plots <- report_plots(summarised_estimates = def$summarised,
                      reported = reported_cases)
                     
plots$summary

# Run model with Rt fixed into the future
fixed_rt <- estimate_infections(reported_cases, family = "negbin",
                                generation_time = generation_time,
                                delays = list(incubation_period, reporting_delay),
                                samples = 1000, warmup = 200, cores = ifelse(interactive(), 4, 1),
                                chains = 4, estimate_rt = TRUE, fixed_future_rt = TRUE,
                                return_fit = TRUE)


# Plot output
plots <- report_plots(summarised_estimates = fixed_rt$summarised,
                      reported = reported_cases)
                      
plots$summary

# Run the model with default settings on a later snapshot of 
# data (use burn_in here to remove the first week of
# estimates that may be impacted by this most).
snapshot_cases <- EpiNow2::example_confirmed[80:130]
snapshot <- estimate_infections(snapshot_cases, family = "negbin",
                                generation_time = generation_time,
                                delays = list(incubation_period, reporting_delay),
                                samples = 1000, warmup = 400, cores = ifelse(interactive(), 4, 1), 
                                chains = 4, estimate_rt = TRUE, verbose = FALSE, return_fit = TRUE,
                                burn_in = 7)


# Plot output
plots <- report_plots(summarised_estimates = snapshot$summarised,
                      reported = snapshot_cases)

plots$summary    

## Run model with stationary Rt assumption (likely to provide biased real-time estimates)
stat <- estimate_infections(reported_cases, family = "negbin",
                            generation_time = generation_time,
                            delays = list(incubation_period, reporting_delay),
                            samples = 1000, warmup = 200, cores = ifelse(interactive(), 4, 1),
                            chains = 4, estimate_rt = TRUE, stationary = TRUE,
                            verbose = FALSE, return_fit = TRUE)


# Plot output
plots <- report_plots(summarised_estimates = stat$summarised,
                      reported = reported_cases)

plots$summary
       
# Run model with fixed Rt assumption 
fixed <- estimate_infections(reported_cases, family = "negbin",
                            generation_time = generation_time,
                            delays = list(incubation_period, reporting_delay),
                            samples = 1000, warmup = 200, cores = ifelse(interactive(), 4, 1),
                            chains = 4, estimate_rt = TRUE, fixed = TRUE, 
                            verbose = FALSE, return_fit = TRUE)


# Plot output
plots <- report_plots(summarised_estimates = fixed$summarised,
                      reported = reported_cases)
                      
                      
plots$summary


# Run model with no delays 
no_delay <- estimate_infections(reported_cases, family = "negbin",
                            generation_time = generation_time,
                            samples = 1000, warmup = 200, 
                            cores = ifelse(interactive(), 4, 1),
                            chains = 4, estimate_rt = TRUE,
                            verbose = FALSE, return_fit = TRUE)


# Plot output
plots <- report_plots(summarised_estimates = no_delay$summarised,
                      reported = reported_cases)
                      
                      
plots$summary         
              
# Run model with breakpoints                                                                      
bkp <- estimate_infections(reported_cases, family = "negbin",
                           generation_time = generation_time,
                           delays = list(incubation_period, reporting_delay),
                           samples = 1000, warmup = 200, cores = ifelse(interactive(), 4, 1), 
                           chains = 4, estimate_rt = TRUE, estimate_breakpoints = TRUE, 
                           verbose = FALSE, return_fit = TRUE)


# Plot output
plots <- report_plots(summarised_estimates = bkp$summarised,
                      reported = reported_cases)

plots$summary
             
# Run model with breakpoints but with constrained non-linear change over time 
# This formulation may increase the apparent effect of the breakpoint but needs to be tested using
# model fit criteria (i.e LFO).                                                                    
cbkp <- estimate_infections(reported_cases, family = "negbin",
                            generation_time = generation_time,
                            gp = list(basis_prop = 0.3, boundary_scale = 2, 
                                     lengthscale_mean = 20, lengthscale_sd = 1),
                            delays = list(incubation_period, reporting_delay),
                            samples = 1000, warmup = 200, cores = ifelse(interactive(), 4, 1),
                            chains = 4, estimate_rt = TRUE, estimate_breakpoints = TRUE,
                            verbose = FALSE, return_fit = TRUE)


# Plot output
plots <- report_plots(summarised_estimates = cbkp$summarised,
             reported = reported_cases)
             
plots$summary

# Pull out breakpoint summary
cbkp$summarised[variable == "breakpoints"]

# Run model with breakpoints but otherwise static Rt
# This formulation may increase the apparent effect of the breakpoint but needs to be tested using
# model fit criteria (i.e LFO).                                                                    
fbkp <- estimate_infections(reported_cases, family = "negbin",
                            generation_time = generation_time,
                            delays = list(incubation_period, reporting_delay),
                            samples = 1000, warmup = 200, cores = ifelse(interactive(), 4, 1),
                            chains = 4, estimate_breakpoints = TRUE, fixed = TRUE, 
                            verbose = FALSE, return_fit = TRUE)


# Plot output
plots <- report_plots(summarised_estimates = fbkp$summarised,
             reported = reported_cases)
             
plots$summary

# Pull out breakpoint summary
fbkp$summarised[variable == "breakpoints"]

# Run model without Rt estimation (just backcalculation)
backcalc <- estimate_infections(reported_cases, family = "negbin",
                                generation_time = generation_time,
                                delays = list(incubation_period, reporting_delay),
                                samples = 1000, warmup = 200, cores = ifelse(interactive(), 4, 1),
                                chains = 4, estimate_rt = FALSE, verbose = FALSE, return_fit = TRUE)

 
# plot just infections as report_plots does not support the backcalculation only model
plot_estimates(estimate = backcalc$summarised[variable == "infections"],
               reported = reported_cases, ylab = "Cases")
# }

Run the code above in your browser using DataLab