episodes: Track episodes for case definitions and record deduplication.

Description

Link events into a chronological sequence of episodes.

Usage

episodes(
  date,
  case_length = Inf,
  episode_type = "fixed",
  recurrence_length = case_length,
  episode_unit = "days",
  episodes_max = Inf,
  rolls_max = Inf,
  overlap_methods_c = "overlap",
  overlap_methods_r = overlap_methods_c,
  sn = NULL,
  strata = NULL,
  skip_if_b4_lengths = FALSE,
  data_source = NULL,
  data_links = "ANY",
  custom_sort = NULL,
  skip_order = Inf,
  recurrence_from_last = TRUE,
  case_for_recurrence = FALSE,
  from_last = FALSE,
  group_stats = FALSE,
  display = "none"
)
fixed_episodes(
  date,
  case_length = Inf,
  episode_unit = "days",
  to_s4 = TRUE,
  overlap_methods_c = "overlap",
  deduplicate = FALSE,
  display = "progress",
  bi_direction = FALSE,
  recurrence_length = case_length,
  overlap_methods_r = overlap_methods_c,
  include_index_period = TRUE,
  ...,
  overlap_methods = "overlap",
  overlap_method = "overlap",
  x
)
rolling_episodes(
  date,
  case_length = Inf,
  recurrence_length = case_length,
  episode_unit = "days",
  to_s4 = TRUE,
  overlap_methods_c = "overlap",
  overlap_methods_r = overlap_methods_c,
  deduplicate = FALSE,
  display = "progress",
  bi_direction = FALSE,
  include_index_period = TRUE,
  ...,
  overlap_methods = "overlap",
  overlap_method = "overlap",
  x
)
episode_group(df, ..., episode_type = "fixed")

Arguments

date

Event date (date, datetime or numeric) or period (number_line).

case_length

Cut-off point (numeric) or period (number_line), distinguishing one "case" from another. This is the case window.

episode_type

"fixed" or "rolling".

recurrence_length

Cut-off point or period distinguishing a "recurrent" event from its index "case". This is the recurrence window. By default, it's the same as case_length.

episode_unit

Time units for case_length and recurrence_length. Options are "seconds", "minutes", "hours", "days", "weeks", "months" or "years". See diyar::episode_unit.

episodes_max

The maximum number of episodes permitted within each stratum.

rolls_max

Maximum number of times the index "case" can recur. Only used if episode_type is "rolling".

overlap_methods_c

Methods of overlap considered when tracking duplicates of "case" events. See overlaps.

overlap_methods_r

Methods of overlap considered when tracking duplicates of "recurrent" events. See overlaps.

sn

Unique numerical record identifier. Useful for creating familiar episode identifiers.

strata

Subsets of records. Episodes are tracked separately within each subset. links() is useful for creating these.

skip_if_b4_lengths

If TRUE, events before the cut-off points or periods are skipped.

data_source

Unique data source identifier. Useful when the dataset has data from multiple sources.

data_links

A set of data_sources required in each episode. A stratum without records from these data sources will be skipped, and episodes without these will be unlinked. See Details.

custom_sort

Preferential order for selecting index ("case") events. Required for tracking episodes in a non-chronological sequence.

skip_order

"nth" level of custom_sort. Episodes with index events beyond this level of preference are skipped.

recurrence_from_last

If TRUE (default), the reference event for a recurrence window will be the last event from the previous window. If FALSE, it will be the first event. Only used if episode_type is "rolling".

case_for_recurrence

If TRUE, both "case" and "recurrent" events will have a case window. If FALSE (default), only case events will have a case window. Only used if episode_type is "rolling".

from_last

Chronological sequence of episode tracking: ascending (FALSE) or descending (TRUE).

group_stats

If TRUE, episode-specific information such as episode start and end points is returned. See Value.

display

The messages printed on screen. Options are "none" (default), "progress" for a progress update, or "stats" for a more detailed breakdown of the tracking process.

to_s4

Data type of returned object. epid (TRUE) or data.frame (FALSE).

deduplicate

If TRUE, "duplicate" events are excluded from the output.

bi_direction

If TRUE, "duplicate" events before and after the index event are tracked.

include_index_period

If TRUE, overlaps with the index event or period are linked even if they are outside the cut-off period.

...

Arguments passed to episodes().

overlap_methods

Deprecated. Please use overlap_methods_c or overlap_methods_r. Methods of overlap considered when tracking duplicate events. See overlaps.

overlap_method

Deprecated. Please use overlap_methods_c or overlap_methods_r. Methods of overlap considered when tracking events. All events are checked by the same set of overlap_method.

x

Deprecated. Record date or period. Please use date.

df

data.frame. One or more datasets appended together. See Details.

Value

An epid object, or a data.frame if to_s4 is FALSE.

sn - unique record identifier as provided (or generated)
epid | .Data - unique episode identifier
wind_id - unique window identifier
wind_nm - type of window i.e. "Case" or "Recurrence"
case_nm - record type with regard to case assignment
dist_from_wind - duration of each event from its window's reference event
dist_from_epid - duration of each event from its episode's reference event
epid_dataset - data sources in each episode
epid_interval - episode start and end dates. A number_line object.
epid_length - the difference between episode start and end dates (difftime). Where possible, it is in the same unit as episode_unit; otherwise, a difference in days is returned
epid_total - number of records in each episode
iteration - iteration of the process when each event was tracked to its episode.
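Several of these fields are per-episode summaries. The R sketch below derives rough analogues of the epid_interval endpoints, epid_length, epid_total and dist_from_epid from numeric day offsets and an episode identifier. episode_stats() is a hypothetical helper, not how diyar computes these fields, and it assumes the earliest event is each episode's reference event (chronological tracking with from_last = FALSE).

```r
# Hypothetical helper (not part of diyar): per-episode summaries, assuming the
# earliest event in each episode is its reference event.
episode_stats <- function(days, epid) {
  start <- ave(days, epid, FUN = min)  # earliest event in each episode
  end   <- ave(days, epid, FUN = max)  # latest event in each episode
  data.frame(
    epid           = epid,
    epid_start     = start,
    epid_end       = end,
    epid_length    = end - start,      # same unit as `days`
    epid_total     = ave(rep(1L, length(epid)), epid, FUN = length),
    dist_from_epid = days - start      # distance from the (assumed) reference event
  )
}

days <- c(0, 10, 20, 28, 45, 50)
epid <- c(1, 1, 1, 1, 2, 2)
episode_stats(days, epid)
```

Here the first episode spans 28 days with 4 records, and the second spans 5 days with 2 records.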

Details

Episodes are tracked from index events in the chronological sequence determined by from_last. You can use custom_sort for a non-chronological sequence; ties in custom_sort are then broken by chronological order.
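One way to picture this ordering rule in base R is to sort by custom_sort first and by date second, reversing only the date key when from_last = TRUE. tracking_order() is a hypothetical helper written for illustration; it is not part of diyar.

```r
# Hypothetical helper (not part of diyar): the order in which records would be
# considered as index "case" events.
tracking_order <- function(date, custom_sort = NULL, from_last = FALSE) {
  # xtfrm() turns Date/POSIXct values into sortable numbers; negate for descending
  chron <- if (from_last) -xtfrm(date) else xtfrm(date)
  if (is.null(custom_sort)) {
    order(chron)
  } else {
    # custom_sort takes precedence; ties are broken chronologically
    order(custom_sort, chron)
  }
}

date <- as.Date(c("2020-01-05", "2020-01-01", "2020-01-03"))
tracking_order(date)                            # 2 3 1
tracking_order(date, from_last = TRUE)          # 1 3 2
tracking_order(date, custom_sort = c(1, 2, 1))  # 3 1 2
```

Negating the date key, rather than passing decreasing = TRUE to order(), keeps custom_sort in ascending order of preference while only the chronological tie-breaker is reversed.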

A "fixed" episode has a fixed maximum duration determined by case_length. A "rolling" episode can continue to recur, so its maximum duration is variable; it persists for as many recurrences as rolls_max allows.
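To make the fixed/rolling distinction concrete, the R sketch below groups numeric day offsets from a single stratum in chronological order. It is a simplified model, not diyar's implementation: sketch_fixed() and sketch_rolling() are hypothetical helpers, rolls_max is treated as unlimited, and the last event of each window anchors the next recurrence window (as with recurrence_from_last = TRUE).

```r
# Simplified illustration only: hypothetical helpers, not diyar internals.
# `days` are numeric day offsets from one stratum, tracked chronologically.

sketch_fixed <- function(days, case_length) {
  stopifnot(case_length >= 0)
  ord <- order(days)
  d <- days[ord]
  epid <- integer(length(d))            # 0 = not yet assigned
  current <- 0L
  while (any(epid == 0L)) {
    current <- current + 1L
    index_day <- d[match(0L, epid)]     # earliest unassigned event is the index "case"
    # Every unassigned event within case_length of the index joins this episode
    epid[epid == 0L & d - index_day <= case_length] <- current
  }
  out <- integer(length(days))
  out[ord] <- epid                      # restore the input order
  out
}

sketch_rolling <- function(days, case_length, recurrence_length) {
  stopifnot(case_length >= 0, recurrence_length >= 0)
  ord <- order(days)
  d <- days[ord]
  epid <- integer(length(d))
  current <- 0L
  while (any(epid == 0L)) {
    current <- current + 1L
    ref <- d[match(0L, epid)]           # index "case"
    len <- case_length                  # the first window is the case window
    repeat {
      in_window <- which(epid == 0L & d - ref <= len)
      if (length(in_window) == 0L) break  # nothing recurs, so the episode ends
      epid[in_window] <- current
      ref <- max(d[in_window])          # last event anchors the next window
      len <- recurrence_length          # later windows are recurrence windows
    }
  }
  out <- integer(length(days))
  out[ord] <- epid
  out
}

days <- c(0, 10, 20, 28, 45, 50)
sketch_fixed(days, case_length = 15)                            # 1 1 2 2 3 3
sketch_rolling(days, case_length = 15, recurrence_length = 10)  # 1 1 1 1 2 2
```

With the same dates, the fixed model opens three episodes of at most 15 days each, while the rolling model keeps its first episode open until day 28 because each new event falls within 10 days of the previous window's last event.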

episodes() categorises records into five types of events:

"Case" - Index case of the episode.
"Duplicate_C" - Duplicate of the index case.
"Recurrent" - Recurrent event of the index case.
"Duplicate_R" - Duplicate of the recurrent event.
"Skipped" - Those skipped from the episode tracking process.

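The R sketch below labels events along the lines of the list above, under simplified assumptions: numeric day offsets, one stratum, chronological order, and the last event of each window anchoring the next recurrence window. sketch_case_nm() and its episodes_max argument are hypothetical and only approximate diyar's behaviour; in diyar, records can also be "Skipped" for other reasons, such as skip_order or data_links.

```r
# Simplified illustration only: a hypothetical helper, not diyar internals.
sketch_case_nm <- function(days, case_length, recurrence_length, episodes_max = Inf) {
  stopifnot(case_length >= 0, recurrence_length >= 0)
  ord <- order(days)
  d <- days[ord]
  nm <- rep(NA_character_, length(d))    # NA = not yet tracked
  n_episodes <- 0
  while (anyNA(nm)) {
    if (n_episodes >= episodes_max) {
      nm[is.na(nm)] <- "Skipped"         # episode limit reached
      break
    }
    n_episodes <- n_episodes + 1
    ref <- d[match(TRUE, is.na(nm))]     # earliest untracked event is the index "case"
    len <- case_length
    labels <- c("Case", "Duplicate_C")
    repeat {
      w <- which(is.na(nm) & d - ref <= len)
      if (length(w) == 0L) break         # no recurrence, so the episode closes
      nm[w] <- labels[2]                 # later events in the window are duplicates
      nm[w[1]] <- labels[1]              # the earliest event in the window opens it
      ref <- max(d[w])                   # last event anchors the next window
      len <- recurrence_length
      labels <- c("Recurrent", "Duplicate_R")
    }
  }
  out <- character(length(days))
  out[ord] <- nm                         # restore the input order
  out
}

days <- c(0, 10, 18, 20, 45, 50)
sketch_case_nm(days, 15, 10)
# "Case" "Duplicate_C" "Recurrent" "Duplicate_R" "Case" "Duplicate_C"
sketch_case_nm(days, 15, 10, episodes_max = 1)
# "Case" "Duplicate_C" "Recurrent" "Duplicate_R" "Skipped" "Skipped"
```
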
data_source - including this populates the epid_dataset slot. See Value.

data_links should be a list of atomic vectors with every element named "l" (links) or "g" (groups).

"l" - Episodes with records from every listed data source will be retained.
"g" - Episodes with records from any listed data source will be retained.

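The retention rule for "l" and "g" can be sketched as set checks on the data sources found in an episode. keep_episode() is a hypothetical helper, and the rule that every element of data_links must be satisfied when several are given is an assumption of this sketch, not a documented diyar behaviour.

```r
# Hypothetical helper (not part of diyar): should an episode with these data
# sources be retained under a data_links specification?
keep_episode <- function(sources, data_links) {
  checks <- vapply(seq_along(data_links), function(i) {
    required <- data_links[[i]]
    switch(names(data_links)[i],
           l = all(required %in% sources),  # "l": every listed source must be present
           g = any(required %in% sources),  # "g": at least one listed source must be present
           stop("each data_links element must be named 'l' or 'g'"))
  }, logical(1))
  all(checks)  # assumption: all listed conditions must hold
}

sources <- c("DS1", "DS3")
keep_episode(sources, list(l = c("DS1", "DS2")))             # FALSE
keep_episode(sources, list(g = c("DS1", "DS2")))             # TRUE
keep_episode(sources, list(l = "DS1", g = c("DS2", "DS3")))  # TRUE
```
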
data_links and skip_order are useful for skipping unneeded episodes, which reduces processing time.

episode_group() as it existed before v0.2.0 has been retired. It now exists only to support previous code with minimal disruption. Please use episodes() moving forward.

fixed_episodes() and rolling_episodes() are wrapper functions for tracking "fixed" and "rolling" episodes respectively. They exist for convenience, to support previous code and arguments with minimal disruption.

See vignette("episodes") for more information.

Examples


# NOT RUN {
library(diyar)
data(infections)
data(hospital_admissions)

db_1 <- infections
db_1$patient_id <- c(rep("PID 1",8), rep("PID 2",3))

# Fixed episodes
# One 16-day (15-day difference) episode per patient
db_1$epids_p <- episodes(date = db_1$date,
                         strata = db_1$patient_id,
                         case_length = 15,
                         episodes_max = 1)
# Rolling episodes
# 16-day episodes with recurrence periods of 11 days
db_1$rd_b <- episodes(date = db_1$date,
                     case_length = 15,
                     recurrence_length = 10,
                     episode_type = "rolling")

# Interval grouping
hospital_admissions$admin_period <- number_line(hospital_admissions$admin_dt,
                                                hospital_admissions$discharge_dt)
admissions <- hospital_admissions[c("admin_period","epi_len")]

# Episodes of overlapping periods of admission
hospital_admissions$epids_i <- episodes(date = hospital_admissions$admin_period,
                                       case_length = 0,
                                       overlap_methods_c = "inbetween")

# }
