
baselinenowcast (version 0.2.0)

baselinenowcast.data.frame: Create a dataframe of nowcast results from a dataframe of cases indexed by reference date and report date

Description

This function ingests a data.frame with the number of incident cases indexed by reference date and report date for one or multiple strata, which define the unit of a single nowcast (e.g. age groups or locations). It returns a data.frame containing nowcasts by reference date for each stratum; by default, strata are estimated independently. By default, the function estimates uncertainty from past retrospective nowcast errors and generates probabilistic nowcasts, which are samples from the predictive distribution of the estimated final case count at each reference date.

This function implements the full nowcasting workflow on multiple reporting triangles, generating estimates of the delay and uncertainty parameters for all strata using estimates from across strata if specified.

  1. estimate_delay() - Estimates a delay PMF across strata if strata_sharing contains "delay"

  2. estimate_uncertainty_retro() - Estimates uncertainty parameters across strata if strata_sharing contains "uncertainty"

  3. as_reporting_triangle() - Generates a reporting triangle object from a data.frame

  4. baselinenowcast.reporting_triangle() - Generates point or probabilistic nowcasts, depending on output_type, for each stratum.

See the documentation for the arguments of this function, which can be used to set the model specification (for example, the number of reference times used for delay and uncertainty estimation, or the observation model). The function expects every stratum in the data.frame to have the same maximum delay. If estimates are shared across strata, the shared estimates are computed from the set of reference and report dates common to all strata.

Usage

# S3 method for data.frame
baselinenowcast(
  data,
  scale_factor = 3,
  prop_delay = 0.5,
  output_type = c("samples", "point"),
  draws = 1000,
  uncertainty_model = fit_by_horizon,
  uncertainty_sampler = sample_nb,
  max_delay = NULL,
  delays_unit = "days",
  strata_cols = NULL,
  strata_sharing = "none",
  preprocess = preprocess_negative_values,
  ...
)

Value

Data.frame of class baselinenowcast_df

Arguments

data

Data.frame in a long tidy format with counts by reference date and report date for one or more strata. Must contain the following columns:

  • reference_date: Column of type Date containing the dates of the primary event occurrence.

  • report_date: Column of type Date containing the dates of report of the primary event.

  • count: Column of numeric or integer indicating the new confirmed counts pertaining to that reference and report date.

Additional columns that define the unit of a single nowcast can be included. The user can specify these columns with the strata_cols argument; otherwise it is assumed that the data contains a single stratum.
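As a minimal sketch of the input shape described above, the following base R code builds a small long-format data.frame with the three required columns plus one stratum column. The values and the age_group column are made up for illustration and do not come from the package.

```r
# Toy data: two reference dates, reports arriving 0 to 2 days later,
# for two hypothetical age groups (12 rows in total).
ref_dates <- as.Date("2024-01-01") + 0:1
toy <- data.frame(
  reference_date = rep(rep(ref_dates, times = 3), times = 2),
  delay          = rep(rep(0:2, each = 2), times = 2),
  age_group      = rep(c("00-59", "60+"), each = 6)
)

# The report date is the reference date plus the reporting delay in days.
toy$report_date <- toy$reference_date + toy$delay

# Incident counts for each reference date / report date pair (made up).
toy$count <- c(5, 4, 3, 2, 1, 1, 8, 6, 4, 3, 2, 1)

# Keep only the columns described above.
toy <- toy[, c("reference_date", "report_date", "count", "age_group")]
str(toy)
```

A data.frame of this shape could then be passed as data, with strata_cols = "age_group" naming the stratum column.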

scale_factor

Numeric value indicating the multiplicative factor on the maximum delay to be used for estimation of delay and uncertainty. Default is 3.

prop_delay

Numeric value <1 indicating what proportion of all reference times in the reporting triangle should be used for delay estimation. Default is 0.5.

output_type

Character string indicating whether the output should be samples ("samples") from the estimate with full uncertainty or the point estimate ("point"). Default is "samples". If "point" estimates are specified, the minimum number of reference times needed is the number needed for delay estimation. If "samples" are specified, at least 2 additional reference times are required for uncertainty estimation.

draws

Integer indicating the number of probabilistic draws to include if output_type is "samples". Default is 1000.

uncertainty_model

Function that ingests a matrix of observations and a matrix of predictions and returns a vector that can be used to apply uncertainty using the same error model. Default is fit_by_horizon, which takes obs (the matrix of observations) and pred (the matrix of predictions) and, by default, fits each column (horizon) to a negative binomial observation model. The user can specify a different fitting model by replacing the fit_model argument in fit_by_horizon.

uncertainty_sampler

Function that ingests a vector or matrix of predictions and a vector of uncertainty parameters and generates draws from the observation model. Default is sample_nb, which expects pred (the vector of predictions) together with the corresponding vector of uncertainty parameters, and draws from a negative binomial for each element of the vector.
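To illustrate what drawing from a negative binomial for each element of a prediction vector can look like, here is a sketch in base R. The function draw_nb and its dispersion argument are hypothetical names used only for this illustration. It is not the package's sample_nb, and the mean/size parametrisation is an assumption.

```r
# Hypothetical stand-in for a sampler (not part of baselinenowcast).
# Assumes the uncertainty parameter is a negative binomial dispersion
# ("size") and that each prediction is the mean of its distribution.
draw_nb <- function(pred, dispersion) {
  stats::rnbinom(n = length(pred), mu = pred, size = dispersion)
}

set.seed(1)
pred <- c(12, 30, 45)        # point predictions (made up)
dispersion <- c(50, 20, 5)   # one dispersion value per prediction (made up)

# One draw per element of pred.
one_draw <- draw_nb(pred, dispersion)
length(one_draw)
```

Smaller dispersion values give wider distributions around the same mean, so the third element above is the most variable.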

max_delay

Maximum delay (in units of delays_unit) to include in the nowcast. If NULL (default), all delays in the data are used. If specified, only observations with delay <= max_delay are included.
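The filtering rule above can be sketched in base R as follows, assuming delays_unit = "days". The data are made up, and the package's internal implementation may differ. This only shows which observations satisfy delay <= max_delay.

```r
# Toy observations (made up): the third row is reported 5 days after
# its reference date.
obs <- data.frame(
  reference_date = as.Date("2024-01-01") + c(0, 0, 0, 1),
  report_date    = as.Date("2024-01-01") + c(0, 2, 5, 1),
  count          = c(10, 4, 1, 6)
)

max_delay <- 3

# Delay in days between the reference date and the report date.
obs$delay <- as.integer(obs$report_date - obs$reference_date)

# Keep only observations whose delay does not exceed max_delay.
obs_kept <- obs[obs$delay <= max_delay, ]
obs_kept$delay  # 0 2 0
```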

delays_unit

Character string specifying the temporal granularity of the delays. Options are "days", "weeks", "months", "years". Default is "days".

strata_cols

Vector of character strings indicating the names of the columns in data that determine how to stratify the data for nowcasting. The unique combinations of the entries in the strata_cols denote the unit of a single nowcast. Within a stratum, each combination of reference date and report date may appear only once. Default is NULL, which assumes that the data.frame being passed in represents a single stratum (only one nowcast will be produced). All columns that are not part of the strata_cols will be removed.
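The uniqueness requirement can be checked before calling the function. The sketch below uses a hypothetical helper, has_unique_keys, which is not part of baselinenowcast. It uses only base R and made-up data.

```r
# Hypothetical helper (not part of baselinenowcast): returns TRUE when no
# combination of strata, reference_date and report_date appears twice.
has_unique_keys <- function(data, strata_cols = NULL) {
  key_cols <- c(strata_cols, "reference_date", "report_date")
  !any(duplicated(data[, key_cols, drop = FALSE]))
}

toy <- data.frame(
  reference_date = as.Date(c("2024-01-01", "2024-01-01", "2024-01-01")),
  report_date    = as.Date(c("2024-01-01", "2024-01-02", "2024-01-01")),
  count          = c(5, 2, 7),
  age_group      = c("00-59", "00-59", "60+")
)

has_unique_keys(toy, strata_cols = "age_group")  # TRUE
has_unique_keys(toy)                             # FALSE
```

The second call returns FALSE because, without age_group as a stratum column, rows 1 and 3 share the same reference date and report date.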

strata_sharing

Vector of character strings indicating whether, and which, estimates should be shared across strata for the different nowcasting steps. Options are "none" for no sharing (each stratum is estimated fully independently), "delay" for delay sharing and "uncertainty" for uncertainty sharing. Both "delay" and "uncertainty" can be passed at the same time.
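As a sketch of how sharing might be requested, the call below reuses the data preparation from the Examples section and passes both sharing options. It relies only on arguments listed under Usage and on the germany_covid19_hosp dataset used in the Examples. It has not been run here, so treat it as an assumption about typical use rather than a verified example.

```r
library(baselinenowcast)

# Same filtering as in the Examples: drop the most recent report dates
# and limit to 75 reference dates.
max_ref_date <- max(germany_covid19_hosp$reference_date)
min_ref_date <- max_ref_date - 74
covid_data <- germany_covid19_hosp[
  germany_covid19_hosp$report_date < max_ref_date &
    germany_covid19_hosp$reference_date >= min_ref_date,
]

# Share both the delay estimate and the uncertainty parameters
# across all age_group/location strata.
shared_nowcasts <- baselinenowcast(covid_data,
  max_delay = 25,
  strata_cols = c("age_group", "location"),
  strata_sharing = c("delay", "uncertainty"),
  draws = 100
)
```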

preprocess

Function to apply to the reporting triangle before estimation, or NULL to skip preprocessing. Default is preprocess_negative_values(), which handles negative values by redistributing them to earlier delays. Set to NULL if you want to preserve negative values. Custom preprocess functions must accept a validate parameter (defaults to TRUE) to enable validation optimisation in internal function chains.
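A custom preprocess function might look like the sketch below. The function clip_negatives is hypothetical, and it is demonstrated on a plain numeric matrix rather than on the package's reporting triangle class, whose subsetting behaviour may differ. The only requirement taken from the text above is that the function accepts a validate parameter.

```r
# Hypothetical custom preprocess function (not part of baselinenowcast).
# It sets negative entries to zero instead of redistributing them, which
# changes row totals, unlike the redistribution described for the default.
clip_negatives <- function(reporting_triangle, validate = TRUE) {
  # validate is accepted to match the required signature; this sketch
  # performs no validation of its own.
  is_negative <- !is.na(reporting_triangle) & reporting_triangle < 0
  reporting_triangle[is_negative] <- 0
  reporting_triangle
}

# Demonstration on a plain matrix with one negative and one missing entry.
m <- matrix(c(5, -1, 3, NA), nrow = 2)
clipped <- clip_negatives(m)
clipped
```

Missing entries are left untouched, since NA values typically mark counts that have not yet been observed.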

...

Additional arguments passed to estimate_uncertainty() and sample_nowcast().

See Also

Main nowcasting interface functions: assert_baselinenowcast_df(), baselinenowcast(), baselinenowcast.reporting_triangle(), baselinenowcast_df-class, new_baselinenowcast_df()

Examples

# Filter data to exclude most recent report dates and limit to 75
# reference dates
max_ref_date <- max(germany_covid19_hosp$reference_date)
min_ref_date <- max_ref_date - 74
covid_data_to_nowcast <- germany_covid19_hosp[
  germany_covid19_hosp$report_date < max_ref_date &
    germany_covid19_hosp$reference_date >= min_ref_date,
]
nowcasts_df <- baselinenowcast(covid_data_to_nowcast,
  max_delay = 25,
  strata_cols = c("age_group", "location"),
  draws = 100
)
nowcasts_df
