RealSurvSim: Simulate Datasets Using Various Simulation Models

Description

Simulates survival (time-to-event) datasets from original or reconstructed data using one of four simulation models: kernel density estimation (KDE), parametric distributions, conditional bootstrap, and case resampling. The function is intended to support simulation studies in survival analysis.

Usage

RealSurvSim(
  dat,
  col_time,
  col_status,
  col_group,
  reps = 10000,
  random_seed = 123,
  n = NULL,
  simul_type = c("cond", "case", "distr", "KDE"),
  distribs = c("exp", "exp", "exp", "exp")
)

Value

A list containing the simulated datasets for the specified simulation model. The structure of the output list is as follows:

     - datasets: A list of data frames, one per simulated dataset.
     - Each data frame contains:
         - V1: A numeric vector of simulated event times.
         - V2: A numeric or integer vector giving the status: 1 if the event
           of interest occurred, 0 if censored.
         - V3: An integer vector giving the group.
     - The number of data frames in datasets equals the number of repetitions
       set by the reps parameter.
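
The structure described above can be illustrated without running the simulation. The sketch below builds a mock object of the same shape (a list whose datasets element holds one data frame per repetition, with columns V1, V2 and V3) and shows one way to inspect it. The object mock_result and all its values are hypothetical and are not produced by the package; the names in a real result are best confirmed with str().

```r
# Hypothetical stand-in for a RealSurvSim() result: NOT produced by the package.
set.seed(1)
reps <- 3
mock_result <- list(
  datasets = lapply(seq_len(reps), function(i) {
    data.frame(
      V1 = rexp(6, rate = 0.1),              # simulated time-to-event
      V2 = rbinom(6, size = 1, prob = 0.7),  # 1 = event, 0 = censored
      V3 = rep(1:2, each = 3)                # group identifier
    )
  })
)

# One data frame per repetition
n_datasets <- length(mock_result$datasets)

# Inspect the first simulated dataset
first <- mock_result$datasets[[1]]
str(first)

# Proportion of events in each dataset, a typical summary across repetitions
event_rate <- vapply(mock_result$datasets, function(d) mean(d$V2), numeric(1))
```

With a real result, the same lapply/vapply pattern could be used to fit a model to every simulated dataset, assuming the list layout matches the description above.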

Arguments

dat

A data.frame containing the original or reconstructed dataset to simulate from. It must include three columns: one for event times, one for censoring status, and one for group identifiers.

col_time

The name or index of the column in dat representing time to event.

col_status

The name or index of the column in dat representing the event status (1 for event occurred, 0 for censored).

col_group

The name or index of the column in dat representing group assignments.

reps

The number of iterations, equivalent to the number of datasets simulated for each simulation model. Defaults to 10000.

random_seed

Seed for random number generation to ensure reproducibility. Defaults to 123.

n

An optional numeric vector specifying the number of observations to simulate for each group. If NULL, the original dataset's group sizes are used. For all simulation types except conditional bootstrap ("cond"), n can be set to arbitrary values, such as c(50, 60), where each element gives the number of observations for one group. Defaults to NULL.
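
To make the role of n concrete, here is a minimal base-R sketch of case resampling within groups. It is an illustration of the general technique, not the package's internal code: the helper resample_cases and the toy data frame are hypothetical. Rows are drawn with replacement inside each group, so each (time, status) pair stays together, and n falls back to the observed group sizes when it is NULL.

```r
# Illustrative case resampling per group; not the package's internal code.
resample_cases <- function(dat, n = NULL, group_col = "V3") {
  groups <- sort(unique(dat[[group_col]]))
  # Fall back to the observed group sizes when n is NULL
  if (is.null(n)) {
    n <- vapply(groups, function(g) sum(dat[[group_col]] == g), integer(1))
  }
  stopifnot(length(n) == length(groups))
  parts <- lapply(seq_along(groups), function(i) {
    rows <- which(dat[[group_col]] == groups[i])
    # Sample row indices with replacement, keeping (time, status) pairs intact
    picked <- rows[sample.int(length(rows), size = n[i], replace = TRUE)]
    dat[picked, , drop = FALSE]
  })
  out <- do.call(rbind, parts)
  rownames(out) <- NULL
  out
}

set.seed(123)
toy <- data.frame(
  V1 = c(5, 8, 12, 3, 9, 14, 7),
  V2 = c(1, 0, 1, 1, 0, 1, 1),
  V3 = c(1, 1, 1, 2, 2, 2, 2)
)
same_size <- resample_cases(toy)                 # observed group sizes (3 and 4)
custom    <- resample_cases(toy, n = c(50, 60))  # arbitrary group sizes
```

Here the elements of n are matched to the groups in sorted order; whether RealSurvSim orders groups the same way is an assumption worth checking against a real run.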

simul_type

A character string specifying the type of simulation to perform. Available options are "cond" (conditional bootstrap), "case" (case resampling), "distr" (parametric distributions), and "KDE" (kernel density estimation; supports all kernels available in the kdensity function, see 'kdensity'). Note: only one simulation type can be used at a time.

distribs

A character vector of length 4, giving one distribution per stratum. Each element must be one of:

"inverse_gamma"
"llogis"
"gumbel"
"exp"
"gamma"
"normal"
"cauchy"

Defaults to c("exp", "exp", "exp", "exp").
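
As an illustration of what parametric simulation with the default "exp" choice involves, the sketch below fits an exponential distribution to right-censored data by maximum likelihood (rate = number of events / total observed time) and then draws new event times with rexp(). This is a textbook approach written for this page, not the package's implementation: the helper fit_exp_rate and the toy data are hypothetical, the fit is done per group only, and the sketch does not cover how the four strata are defined or how censoring is simulated.

```r
# Illustrative only: exponential MLE under right censoring, fitted per group.
fit_exp_rate <- function(time, status) {
  # MLE of the rate: number of events divided by total observed time
  sum(status) / sum(time)
}

toy <- data.frame(
  V1 = c(5, 8, 12, 3, 9, 14, 7),   # time
  V2 = c(1, 0, 1, 1, 0, 1, 1),     # status: 1 = event, 0 = censored
  V3 = c(1, 1, 1, 2, 2, 2, 2)      # group
)

# One fitted rate per group, named by group label
rates <- sapply(split(toy, toy$V3), function(d) fit_exp_rate(d$V1, d$V2))

set.seed(123)
# Draw 50 event times per group from the fitted distributions
sim_times <- lapply(rates, function(r) rexp(50, rate = r))
```

For the toy data, group 1 has 2 events over 25 time units and group 2 has 3 events over 33, so the fitted rates are 2/25 and 3/33.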

Examples


# Simulate data using parametric distribution fitting.
# liang should have columns: V1 (time), V2 (status), V3 (group)
liang <- dats$Liang
liang_distr <- RealSurvSim(
  dat = liang,
  col_time = "V1",
  col_status = "V2",
  col_group = "V3",
  reps = 10,
  simul_type = "distr",
  distribs = c("exp", "exp", "exp", "exp")
)
