This function ingests a data.frame with the number of incident cases indexed by reference date and report date for one or more strata, which define the unit of a single nowcast (e.g. age groups or locations). It returns a data.frame containing nowcasts by reference date for each stratum; by default, each stratum is estimated independently. By default, the function estimates uncertainty from past retrospective nowcast errors and generates probabilistic nowcasts, which are samples from the predictive distribution of the estimated final case count at each reference date.
This function implements the full nowcasting workflow on multiple reporting triangles, estimating the delay and uncertainty parameters for all strata and, if specified, sharing estimates across strata. The workflow uses the following functions:
- estimate_delay(): estimates a delay PMF across strata if strata_sharing contains "delay".
- estimate_uncertainty_retro(): estimates uncertainty parameters across strata if strata_sharing contains "uncertainty".
- as_reporting_triangle(): generates a reporting triangle object from a data.frame.
- baselinenowcast.reporting_triangle(): generates point or probabilistic nowcasts, depending on output_type, for each stratum.
See the documentation for the arguments of this function, which can be used to set the model specifications (such as the number of reference times for delay and uncertainty estimation, the observation model, etc.). The function expects that each stratum in the data.frame has the same maximum delay. If sharing estimates across all strata, the shared estimates will be made using the set of reference and report dates common to all strata.
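The requirement that every stratum has the same maximum delay can be checked before calling the function. The base R sketch below is one way to do so; the toy data, the location column, and the object names are made up for illustration, and delays are assumed to be measured in days.

```r
# Hypothetical toy data: two strata, each with delays of 0 to 2 days.
dat <- data.frame(
  reference_date = as.Date("2024-01-01") + c(0, 0, 0, 0, 0, 0),
  report_date = as.Date("2024-01-01") + c(0, 1, 2, 0, 1, 2),
  count = c(6, 3, 1, 9, 4, 2),
  location = rep(c("A", "B"), each = 3)
)

# Delay in days between the primary event and its report.
delay <- as.integer(dat$report_date - dat$reference_date)

# Largest observed delay within each stratum.
max_delay_by_stratum <- tapply(delay, dat$location, max)

# TRUE when all strata share one maximum delay.
same_max_delay <- length(unique(max_delay_by_stratum)) == 1
same_max_delay
```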
# S3 method for data.frame
baselinenowcast(
  data,
  scale_factor = 3,
  prop_delay = 0.5,
  output_type = c("samples", "point"),
  draws = 1000,
  uncertainty_model = fit_by_horizon,
  uncertainty_sampler = sample_nb,
  max_delay = NULL,
  delays_unit = "days",
  strata_cols = NULL,
  strata_sharing = "none",
  preprocess = preprocess_negative_values,
  ...
)
Data.frame of class baselinenowcast_df
data: Data.frame in a long tidy format with counts by reference date
and report date for one or more strata. Must contain the following
columns:
- reference_date: Column of type Date containing the dates
of the primary event occurrence.
- report_date: Column of type Date containing the dates of
report of the primary event.
- count: Column of numeric or integer indicating the new confirmed
counts pertaining to that reference and report date.
Additional columns that set the unit of a single nowcast
can be included. The user can specify these columns with the
strata_cols argument; otherwise it will be assumed that the data
contains only a single stratum.
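As a minimal sketch of the expected input layout, the base R snippet below builds a small long-format data.frame with the three required columns plus one stratum column. The age_group values, the counts, and the object names are made up for illustration.

```r
# Grid of reference-date offsets, delays (in days), and two hypothetical strata.
toy <- expand.grid(
  ref_offset = 0:2,
  delay = 0:2,
  age_group = c("00-04", "05-14"),
  stringsAsFactors = FALSE
)
toy$reference_date <- as.Date("2024-01-01") + toy$ref_offset
toy$report_date <- toy$reference_date + toy$delay

# Keep only reports that could have been observed by the last reference date,
# which gives the data its triangular shape.
toy <- toy[toy$report_date <= max(toy$reference_date), ]

# Arbitrary counts that decline with delay.
toy$count <- 10L - 3L * toy$delay

toy <- toy[, c("reference_date", "report_date", "count", "age_group")]

# Basic checks on the required columns and their types.
required_cols <- c("reference_date", "report_date", "count")
stopifnot(all(required_cols %in% names(toy)))
stopifnot(inherits(toy$reference_date, "Date"))
stopifnot(inherits(toy$report_date, "Date"))
stopifnot(is.numeric(toy$count))
```

A data.frame shaped like this could then be passed as data, with strata_cols = "age_group".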
scale_factor: Numeric value indicating the multiplicative factor on
the maximum delay to be used for estimation of delay and uncertainty.
Default is 3.
prop_delay: Numeric value <1 indicating what proportion of all
reference times in the reporting triangle is to be used for delay
estimation. Default is 0.5.
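To make the roles of scale_factor and prop_delay concrete, the arithmetic below sketches one plausible way the reference times could be split, using the defaults and the max_delay = 25 from the example on this page. The rounding and the rule that the remainder goes to uncertainty estimation are assumptions for illustration; the package's own allocation may differ, particularly in edge cases.

```r
max_delay <- 25
scale_factor <- 3
prop_delay <- 0.5

# Total number of reference times used for estimation.
n_total <- scale_factor * max_delay

# Assumed split: a share of prop_delay for delay estimation (rounded down),
# with the remainder used for uncertainty estimation.
n_delay <- floor(prop_delay * n_total)
n_uncertainty <- n_total - n_delay

c(total = n_total, delay = n_delay, uncertainty = n_uncertainty)
```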
output_type: Character string indicating whether the output should be
samples ("samples") from the estimate with full uncertainty or
the point estimate ("point"). Default is "samples". If
"point" estimates are specified, the minimum number of reference times
needed is the number needed for delay estimation; if
"samples" are specified, at least 2 additional reference times are
required for uncertainty estimation.
draws: Integer indicating the number of probabilistic draws to include
if output_type is "samples". Default is 1000.
uncertainty_model: Function that ingests a matrix of observations and a
matrix of predictions and returns a vector that can be used to
apply uncertainty under the same error model. Default is
fit_by_horizon, which takes the arguments obs (the matrix of observations) and
pred (the matrix of predictions) and, by default, fits each column (horizon)
to a negative binomial observation model. The user can
specify a different fitting model by replacing the
fit_model argument in fit_by_horizon.
uncertainty_sampler: Function that ingests a vector or matrix of
predictions and a vector of uncertainty parameters and generates draws
from the observation model. Default is sample_nb, which expects
the arguments pred for the vector of predictions and uncertainty parameters
for the corresponding vector of uncertainty parameters, and draws from a
negative binomial for each element of the vector.
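A minimal sketch of what such a sampler might look like in base R is shown below. The function name sample_nb_sketch, its argument names, and the parameterisation (mean pred, size dispersion) are assumptions for illustration; this is not the package's sample_nb implementation.

```r
# Hypothetical sampler: one negative binomial draw per prediction.
sample_nb_sketch <- function(pred, dispersion) {
  stopifnot(length(pred) == length(dispersion))
  # rnbinom() is parameterised here by its mean (mu) and size.
  stats::rnbinom(n = length(pred), size = dispersion, mu = pred)
}

set.seed(1)
draw <- sample_nb_sketch(pred = c(10, 4, 1), dispersion = c(5, 5, 5))
draw
```

A custom function of this general shape could stand in for the default when a different observation model is wanted, provided it accepts whatever arguments the package passes to uncertainty_sampler.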
max_delay: Maximum delay (in units of delays_unit) to include in the
nowcast. If NULL (default), all delays in the data are used. If specified,
only observations with delay <= max_delay are included.
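The filtering rule can be sketched in base R as follows, assuming delays_unit = "days" so that the delay is the number of days between the reference date and the report date. The data and object names are made up for illustration.

```r
# Hypothetical observations with delays of 0, 1, 5, 0, and 2 days.
obs <- data.frame(
  reference_date = as.Date("2024-01-01") + c(0, 0, 0, 1, 1),
  report_date = as.Date("2024-01-01") + c(0, 1, 5, 1, 3),
  count = c(8, 4, 1, 7, 2)
)
max_delay <- 2

# Delay in days for each row.
obs$delay <- as.integer(obs$report_date - obs$reference_date)

# Keep only rows whose delay does not exceed max_delay.
obs_kept <- obs[obs$delay <= max_delay, ]
obs_kept
```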
delays_unit: Character string specifying the temporal granularity of
the delays. Options are "days", "weeks", "months", "years".
Default is "days".
strata_cols: Vector of character strings indicating the names of the
columns in data that determine how to stratify the data for nowcasting.
The unique combinations of the entries in the strata_cols denote the
unit of a single nowcast. Within a stratum, each combination of
reference date and report date must appear only once. Default is NULL,
which assumes that the data.frame being passed in represents a single
stratum (only one nowcast will be produced). All columns that are not
part of the strata_cols will be removed.
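The uniqueness requirement can be checked ahead of time. The base R sketch below flags repeated combinations of stratum, reference date, and report date; the toy data and object names are made up for illustration.

```r
# Hypothetical data in which the last row repeats the row before it
# (same age_group, reference_date, and report_date).
dat <- data.frame(
  reference_date = as.Date("2024-01-01") + c(0, 0, 1, 0, 0),
  report_date = as.Date("2024-01-01") + c(0, 1, 1, 0, 0),
  count = c(5, 2, 6, 3, 1),
  age_group = c("00-04", "00-04", "00-04", "05-14", "05-14")
)
strata_cols <- "age_group"

# Columns that must jointly identify a row.
key_cols <- c(strata_cols, "reference_date", "report_date")

# TRUE for every row that repeats an earlier key combination.
is_dup <- duplicated(dat[, key_cols])
dat[is_dup, ]
```

Any rows returned here would need to be aggregated or removed before the data meet the requirement described above.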
strata_sharing: Vector of character strings indicating whether, and which,
estimates should be shared across strata for the different nowcasting steps. Options are
"none" for no sharing (each stratum is fully independent),
"delay" for delay sharing and "uncertainty" for uncertainty sharing.
Both "delay" and "uncertainty" can be passed at the same time.
preprocess: Function to apply to the reporting triangle before
estimation, or NULL to skip preprocessing. Default is
preprocess_negative_values(), which handles negative values by
redistributing them to earlier delays. Set to NULL if you want to preserve
negative values. Custom preprocess functions must accept a validate
parameter (defaults to TRUE) to enable validation optimisation in internal
function chains.
...: Additional arguments passed to estimate_uncertainty()
and sample_nowcast().
Main nowcasting interface functions
assert_baselinenowcast_df(),
baselinenowcast(),
baselinenowcast.reporting_triangle(),
baselinenowcast_df-class,
new_baselinenowcast_df()
# Filter data to exclude most recent report dates and limit to 75
# reference dates
max_ref_date <- max(germany_covid19_hosp$reference_date)
min_ref_date <- max_ref_date - 74
covid_data_to_nowcast <- germany_covid19_hosp[
  germany_covid19_hosp$report_date < max_ref_date &
    germany_covid19_hosp$reference_date >= min_ref_date,
]
nowcasts_df <- baselinenowcast(
  covid_data_to_nowcast,
  max_delay = 25,
  strata_cols = c("age_group", "location"),
  draws = 100
)
nowcasts_df