
epidm (version 1.0.6)

proxy_episode_dates: Clean and Impute HES/SUS Episode Start and End Dates

Description

[Stable]

A utility for cleaning and imputing missing or inconsistent episode end dates in HES/SUS–style inpatient data. The function identifies missing, invalid, or overlapping spell dates within patient/provider groups and applies deterministic rules to correct them. It also assigns a flag (proxy_missing) indicating whether a value was modified and why.
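To make the three kinds of problem concrete, here is a minimal base R sketch that only detects them on a toy table. It is not the package's implementation, and it does not reproduce the correction rules or the proxy_missing codes. The data, the flag columns (end_missing, end_before_start, overlaps_next), and the choice to treat "overlap" as an end date strictly after the next start date in the same group are all assumptions made for illustration.

```r
# Hypothetical toy data: two patients, five spells
spells <- data.frame(
  id = c(1, 1, 1, 2, 2),
  spell_start = as.Date(c(
    "2020-07-03", "2020-07-10", "2020-08-05",
    "2020-11-01", "2020-11-13"
  )),
  spell_end = as.Date(c(
    "2020-07-11", "2020-07-22", "2020-07-30",
    "2020-11-11", NA
  ))
)

# Order spells within each patient so "next spell" is well defined
spells <- spells[order(spells$id, spells$spell_start), ]

# Missing end date
spells$end_missing <- is.na(spells$spell_end)

# Invalid: end date earlier than start date
spells$end_before_start <- !is.na(spells$spell_end) &
  spells$spell_end < spells$spell_start

# Start date of the following spell within the same patient (NA for the last)
next_start <- ave(
  as.numeric(spells$spell_start),
  spells$id,
  FUN = function(s) c(s[-1], NA)
)

# Overlap (assumed definition): this spell ends after the next one starts
spells$overlaps_next <- !is.na(spells$spell_end) &
  !is.na(next_start) &
  as.numeric(spells$spell_end) > next_start

spells
```

Each row can carry more than one problem, which is one reason a single coded flag such as proxy_missing is convenient for recording which rule was eventually applied.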

Usage

proxy_episode_dates(
  x,
  group_vars,
  spell_start_date,
  spell_end_date,
  discharge_destination,
  .dropTmp = TRUE,
  .forceCopy = FALSE
)

Value

A data.table containing:

  • Cleaned spell start and end dates.

  • A flag variable (proxy_missing) indicating whether a date was modified and the rule applied (0–4).

Arguments

x

A data.frame or data.table. A data.frame is converted to a data.table.

group_vars

Character vector of grouping variables (e.g., patient ID, provider). At least one identifier must be supplied.

spell_start_date

Name of the column containing the episode or spell start date.

spell_end_date

Name of the column containing the episode or spell end date.

discharge_destination

Name of the column containing the CDS discharge destination code.

.dropTmp

Logical (default TRUE). If TRUE, temporary processing columns are removed before returning the result.

.forceCopy

Logical (default FALSE). If FALSE, the input is converted to a data.table and modified by reference. If TRUE, the input must already be a data.table; the function then works on an explicit copy, so the original object is not modified.
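The difference between modifying by reference and working on a copy is general data.table behaviour. The sketch below illustrates it with data.table's own `:=` and `copy()`. It says nothing about the function's internals, and the table and the flag column are hypothetical.

```r
library(data.table)

# Hypothetical two-row table
dt <- data.table(
  id = c(1, 2),
  spell_end = as.Date(c("2020-07-11", NA))
)

# `:=` modifies `dt` in place; no copy is made
dt[, flag := is.na(spell_end)]

# copy() creates an independent object, so later changes do not reach `dt`
dt_copy <- copy(dt)
dt_copy[, flag := NULL]

"flag" %in% names(dt)       # TRUE: the original still has the column
"flag" %in% names(dt_copy)  # FALSE: only the copy lost it
```

If the input must stay unchanged, working on a copy is the safer choice; the price is the extra memory for a second table.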

Examples


library(epidm)

# Test data: four patients, each seen at a single provider
proxy_test <- data.frame(
  id = c(
    rep(3051, 4),
    rep(7835, 3),
    rep(9891, 3),
    rep(1236, 3)
  ),
  provider = c(
    rep("QKJ", 4),
    rep("JSD", 3),
    rep("YJG", 3),
    rep("LJG", 3)
  ),
  spell_start = as.Date(c(
    "2020-07-03", "2020-07-14", "2020-07-23", "2020-08-05",
    "2020-11-01", "2020-11-13", "2020-12-01",
    "2020-03-28", "2020-04-06", "2020-04-09",
    "2020-10-06", "2020-11-05", "2020-12-25"
  )),
  spell_end = as.Date(c(
    "2020-07-11", "2020-07-22", "2020-07-30", "2020-07-30",
    "2020-11-11", NA, "2020-12-03",
    "2020-03-28", NA, "2020-04-09",
    "2020-10-06", "2020-11-05", NA
  )),
  disdest = c(
    19, 19, 51, 19,
    19, 19, 19,
    51, 98, 19,
    19, 19, 98
  )
)


# The trailing [] forces the returned data.table to print
proxy_episode_dates(
  x = proxy_test,
  group_vars = c('id', 'provider'),
  spell_start_date = 'spell_start',
  spell_end_date = 'spell_end',
  discharge_destination = 'disdest'
)[]
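As a follow-up, the sketch below keeps the result in an object, tabulates the proxy_missing flag, and uses .forceCopy = TRUE so that, per the argument description, the input table is left as it was. The three-row table is a hypothetical subset of the data above; the flag counts are not shown because they depend on the package's rules.

```r
library(epidm)
library(data.table)

# Hypothetical input: one patient with a missing end date on the second spell
spells <- data.table(
  id = c(7835, 7835, 7835),
  provider = "JSD",
  spell_start = as.Date(c("2020-11-01", "2020-11-13", "2020-12-01")),
  spell_end = as.Date(c("2020-11-11", NA, "2020-12-03")),
  disdest = c(19, 19, 19)
)

# .forceCopy = TRUE requires a data.table input and works on a copy of it
cleaned <- proxy_episode_dates(
  x = spells,
  group_vars = c("id", "provider"),
  spell_start_date = "spell_start",
  spell_end_date = "spell_end",
  discharge_destination = "disdest",
  .forceCopy = TRUE
)

# How many rows fall under each flag value
table(cleaned$proxy_missing, useNA = "ifany")

# The documented copy semantics imply the input still has its missing value
anyNA(spells$spell_end)
```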
