load_expanded_data: Method to read, subset and sample expanded data

Description

Usage

load_expanded_data(
  object,
  p_control = NULL,
  period = NULL,
  subset_condition = NULL,
  seed = NULL
)
# S4 method for trial_sequence
load_expanded_data(
  object,
  p_control = NULL,
  period = NULL,
  subset_condition = NULL,
  seed = NULL
)

Value

An updated trial_sequence object. The data is stored in slot @outcome_data as a te_outcome_data object.

Arguments

object

An object of class trial_sequence.

p_control

Probability of selecting a control, NULL for no sampling (default).

period

An integerish vector of non-zero length to select trial period(s) or NULL (default) to select all trial periods.

subset_condition

A string or NULL (default). subset_condition is translated to a call (if the expanded data is saved as a data.table or in CSV format) or to an SQL query (if the expanded data is saved as a duckdb file).

The operators "==", "!=", ">", ">=", "<", "<=", "%in%", "&", "|" are supported. Numeric vectors can be written as c(1, 2, 3) or 1:3. Variables are not supported.

Note: Make sure numeric vectors written as 1:3 are surrounded by spaces, e.g. a %in% c( 1:4 , 6:9 ), otherwise the code will fail.
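
To give a feel for what "translated to a call" means, the following base R sketch parses a condition string and evaluates it against a small data frame. It is only an illustration: the `expanded` data frame and its values are made up, and the package's internal translation may work differently.

```r
# Hypothetical illustration in base R; not the package's internal code.
expanded <- data.frame(
  followup_time = c(0, 1, 5, 25, 3),
  x2 = c(0.5, 1.2, -0.3, 0.1, 0.9)
)

subset_condition <- "followup_time %in% 1:20 & x2 < 1"

# Parse the string into an unevaluated call.
condition_call <- str2lang(subset_condition)

# Evaluate the call with the data frame's columns in scope.
keep <- eval(condition_call, envir = expanded)

# Keep only the rows for which the condition holds.
subset_rows <- expanded[keep, ]
```

Because the string is parsed rather than evaluated in the calling environment, a condition that refers to a variable defined elsewhere has nothing to resolve against, which is consistent with variables not being supported.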

seed

An integer seed or NULL (default).

Note: The same seed will return a different result depending on the class of the te_datastore object contained in the trial_sequence object.
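
The interplay of p_control and seed can be sketched in base R. This is a simplified stand-in, not the package's sampling routine: the helper `sample_controls()` and the `expanded` data frame are hypothetical, and the sketch assumes that rows with outcome == 1 (cases) are always kept while each remaining row (control) is kept with probability p_control.

```r
# Hypothetical sketch in base R; the package's sampling routine may differ.
expanded <- data.frame(
  id = 1:1000,
  outcome = rep(c(1, 0), times = c(50, 950))
)

sample_controls <- function(data, p_control, seed = NULL) {
  # Fix the random number stream only when a seed is supplied.
  if (!is.null(seed)) set.seed(seed)
  is_case <- data$outcome == 1
  # Assumption: keep every case; keep each control with probability p_control.
  keep <- is_case | (runif(nrow(data)) < p_control)
  data[keep, ]
}

first <- sample_controls(expanded, p_control = 0.2, seed = 1234)
second <- sample_controls(expanded, p_control = 0.2, seed = 1234)

# With the same seed and the same row order, both calls select the same rows.
same_sample <- identical(first, second)
```

The sketch also suggests why results can depend on the storage backend: if rows arrive in a different order or the sampling happens in a different engine, the same seed no longer maps to the same selection.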

Details

This method is used on trial_sequence objects to read, subset and sample expanded data.

Examples

# create a trial_sequence-class object
trial_itt_dir <- file.path(tempdir(), "trial_itt")
dir.create(trial_itt_dir)
trial_itt <- trial_sequence(estimand = "ITT") |>
  set_data(data = data_censored) |>
  set_outcome_model(adjustment_terms = ~ x1 + x2)

trial_itt_csv <- set_expansion_options(
  trial_itt,
  output = save_to_csv(file.path(trial_itt_dir, "trial_csvs")),
  chunk_size = 500
) |>
  expand_trials()

# load_expanded_data default behaviour returns all trial_periods and doesn't sample
load_expanded_data(trial_itt_csv)

# load_expanded_data can subset the data before sampling
load_expanded_data(
  trial_itt_csv,
  p_control = 0.2,
  period = 1:20,
  subset_condition = "followup_time %in% 1:20 & x2 < 1"
)

# delete after use
unlink(trial_itt_dir, recursive = TRUE)
