This is a function that fits an incidence curve to a set of reported cases and delay distribution using an empirical Bayes estimation method, which fits parameters for a spline basis. All hyper parameter tuning and data processing are done within this function.
fit_incidence(
reported,
delay_dist,
dof_grid = seq(6, 20, 2),
dof_method = "aic",
lam_grid = 10^(seq(-1, -8, length.out = 20)),
lam_method = "val",
percent_thresh = 2,
regularization_order = 2,
num_ar_steps = 10,
num_ar_samps = 100,
linear_tail = 14,
front_pad_size = 10,
extrapolation_prior_precision = 10,
frac_train = 0.75,
fisher_approx_cov = TRUE,
end_pad_size = 50,
num_samps_per_ar = 10,
val_restarts = 2,
seed = 1
)
An integer vector of reported cases.
A positive vector that sums to one, which describes the delay distribution.
An integer vector of degrees of freedom for the spline basis.
Metric to choose "best" spline degrees of freedom: 'aic': Akaike information criterion, 'bic': Bayesian information criterion, 'val': validation likelihood.
A vector of regularization strengths to scan.
metric to choose "best" regularization strength lambda: 'aic': Akaike information criterion, 'bic': Bayesian information criterion, 'val': validation likelihood.
If using validation likelihood to select best, the largest (strongest) lambda that is within `percent_thresh` of the highest validation lambda will be selected. Default is 2. Must be greater than 0.
An integer (typically 0, 1, 2), indicating differencing order for L2 regularization of spline parameters. Default is 2 for second derivative penalty.
An integer number of AR steps after last observation.
An integer number of AR samples.
An integer number of days used to fit linear model on tail to be used as a mean for AR extrapolation.
An integer for initial number of 0's before first observation.
A positive scalar for extrapolation slope shrinkage prior precision.
A numeric between 0 and 1 for fraction of data used to train lambda validation.
A flag to use either the Fisher Information (TRUE) or the Hessian (FALSE) to approx posterior covariance over parameters.
And integer number of steps the spline is defined beyond the final observation.
An integer for the number of Laplace samples per AR fit.
An integer for the number of times to refit hyperparameters if 'val' is used for either. Set to 1 for faster but more unstable fits.
Seed for RNG.
A list with the following entries:
Isamps -- sample of the incidence curve from a Laplace approximation per AR sample;
Ihat -- MAP incidence curve estimate;
Chat -- expected cases given MAP incidence curve estimate;
beta_hats -- matrix of beta's per AR sample;
best_dof -- best degrees of freedom from tuning;
best_lambda -- best regularization parameter from tuning; and
reported -- a copy of reported values used for fitting.
# NOT RUN {
indiana_model <- fit_incidence(
reported = spanish_flu$Indiana,
delay_dist = spanish_flu_delay_dist$proportion)
# }
Run the code above in your browser using DataCamp Workspace