fit_incidence: Fit incidence curve to reported data

Description

This is a function that fits an incidence curve to a set of reported cases and delay distribution using an empirical Bayes estimation method, which fits parameters for a spline basis. All hyper parameter tuning and data processing are done within this function.

Usage

fit_incidence(
  reported,
  delay_dist,
  dof_grid = seq(6, 20, 2),
  dof_method = "aic",
  lam_grid = 10^(seq(-1, -8, length.out = 20)),
  lam_method = "val",
  percent_thresh = 2,
  regularization_order = 2,
  num_ar_steps = 10,
  num_ar_samps = 100,
  linear_tail = 14,
  front_pad_size = 10,
  extrapolation_prior_precision = 10,
  frac_train = 0.75,
  fisher_approx_cov = TRUE,
  end_pad_size = 50,
  num_samps_per_ar = 10,
  val_restarts = 2,
  seed = 1
)

Arguments

reported

An integer vector of reported cases.

delay_dist

A positive vector that sums to one, which describes the delay distribution.

dof_grid

An integer vector of degrees of freedom for the spline basis.

dof_method

Metric to choose "best" spline degrees of freedom: 'aic': Akaike information criterion, 'bic': Bayesian information criterion, 'val': validation likelihood.

lam_grid

A vector of regularization strengths to scan.

lam_method

metric to choose "best" regularization strength lambda: 'aic': Akaike information criterion, 'bic': Bayesian information criterion, 'val': validation likelihood.

percent_thresh

If using validation likelihood to select best, the largest (strongest) lambda that is within `percent_thresh` of the highest validation lambda will be selected. Default is 2. Must be greater than 0.

regularization_order

An integer (typically 0, 1, 2), indicating differencing order for L2 regularization of spline parameters. Default is 2 for second derivative penalty.

num_ar_steps

An integer number of AR steps after last observation.

num_ar_samps

An integer number of AR samples.

linear_tail

An integer number of days used to fit linear model on tail to be used as a mean for AR extrapolation.

front_pad_size

An integer for initial number of 0's before first observation.

extrapolation_prior_precision

A positive scalar for extrapolation slope shrinkage prior precision.

frac_train

A numeric between 0 and 1 for fraction of data used to train lambda validation.

fisher_approx_cov

A flag to use either the Fisher Information (TRUE) or the Hessian (FALSE) to approx posterior covariance over parameters.

end_pad_size

And integer number of steps the spline is defined beyond the final observation.

num_samps_per_ar

An integer for the number of Laplace samples per AR fit.

val_restarts

An integer for the number of times to refit hyperparameters if 'val' is used for either. Set to 1 for faster but more unstable fits.

seed

Seed for RNG.

Value

A list with the following entries:

Isamps -- sample of the incidence curve from a Laplace approximation per AR sample;
Ihat -- MAP incidence curve estimate;
Chat -- expected cases given MAP incidence curve estimate;
beta_hats -- matrix of beta's per AR sample;
best_dof -- best degrees of freedom from tuning;
best_lambda -- best regularization parameter from tuning; and
reported -- a copy of reported values used for fitting.

Examples

Run this code

# NOT RUN {
indiana_model <- fit_incidence(
                  reported = spanish_flu$Indiana, 
                  delay_dist = spanish_flu_delay_dist$proportion) 
# }

Run the code above in your browser using DataLab