The simulation of the confounded survival data has four main steps: (1) generating the covariates, (2) assigning the treatment variable, (3) generating survival times, and (4) introducing censoring.
First, covariates are generated by taking n independent random samples from the distributions defined in lcovars.
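As a rough illustration of this first step, the sketch below draws n independent samples per covariate in plain Python. The `covar_spec` dictionary, the `generate_covariates` helper, and the two example distributions are hypothetical stand-ins and do not reflect the actual format of lcovars.

```python
import random

def generate_covariates(n, covar_spec, rng):
    """Draw n independent samples for every covariate in covar_spec.

    covar_spec is a hypothetical mapping from covariate name to a
    function that takes a random.Random instance and returns one draw.
    """
    return [{name: draw(rng) for name, draw in covar_spec.items()}
            for _ in range(n)]

rng = random.Random(1)

# Assumed example distributions: one standard normal, one Bernoulli(0.5).
covar_spec = {
    "x1": lambda r: r.gauss(0.0, 1.0),
    "x2": lambda r: 1 if r.random() < 0.5 else 0,
}

data = generate_covariates(5, covar_spec, rng)
```

Each element of `data` represents one simulated person, with one value per covariate.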
In the second step, the generated covariates are used to calculate the probability of receiving treatment (the propensity score) for each simulated person in the dataset. This is done using a logistic regression model with the values in treatment_betas as coefficients and intercept as the intercept. By changing the intercept, the user can vary the average proportion of cases that end up in each treatment group. The resulting probabilities are then used to generate the treatment variable ("group"), making the treatment assignment dependent on the covariates.
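The treatment assignment described above can be sketched as follows. This is a minimal illustration in plain Python, not the package's own code: the `assign_treatment` helper, the covariate names, and all coefficient values are assumptions chosen for the example.

```python
import math
import random

def assign_treatment(covariates, treatment_betas, intercept, rng):
    """Return (propensity_score, group) for one simulated person."""
    # Linear predictor of the logistic model.
    linpred = intercept + sum(treatment_betas[name] * value
                              for name, value in covariates.items())
    # The inverse logit turns the linear predictor into a probability.
    ps = 1.0 / (1.0 + math.exp(-linpred))
    # Bernoulli draw: treated (1) with probability ps, otherwise control (0).
    group = 1 if rng.random() < ps else 0
    return ps, group

rng = random.Random(42)

# Hypothetical coefficients and covariates, for illustration only.
treatment_betas = {"x1": 0.5, "x2": -1.0}
people = [{"x1": rng.gauss(0.0, 1.0), "x2": float(rng.random() < 0.5)}
          for _ in range(1000)]

results = [assign_treatment(p, treatment_betas, intercept=-0.5, rng=rng)
           for p in people]
share_treated = sum(group for _, group in results) / len(results)
```

Raising the intercept shifts every propensity score upwards, so a larger share of the simulated people ends up in the treated group on average.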
Next, survival times are generated based on the method described in Bender et al. (2005), using the causal coefficients defined in outcome_betas and group_beta. Both the independently generated covariates and the covariate-dependent treatment variable are used in this step. Covariates that affect both the treatment assignment and the survival time therefore act as confounders.
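To make the survival time step more concrete, the sketch below applies the inverse transform approach of Bender et al. (2005) under the assumption of a Weibull baseline hazard, where T = (-log(U) / (lam * exp(linear predictor)))^(1 / gamma) and U is uniform on (0, 1). The `sim_weibull_time` helper and the parameter names `lam` and `gamma` are hypothetical and chosen only for this example.

```python
import math
import random

def sim_weibull_time(covariates, group, outcome_betas, group_beta,
                     lam, gamma, rng):
    """Draw one survival time from a Cox model with Weibull baseline hazard."""
    # The linear predictor combines the treatment effect and covariate effects.
    linpred = group_beta * group + sum(outcome_betas[name] * value
                                       for name, value in covariates.items())
    # Uniform draw on (0, 1]; using 1 - random() avoids log(0).
    u = 1.0 - rng.random()
    # Inverse transform for a Weibull baseline hazard.
    return (-math.log(u) / (lam * math.exp(linpred))) ** (1.0 / gamma)

# Hypothetical values, for illustration only.
outcome_betas = {"x1": 0.3, "x2": -0.2}
person = {"x1": 1.2, "x2": 0.0}

# The same seed is used twice so that only the treatment status differs.
t_control = sim_weibull_time(person, 0, outcome_betas, group_beta=-1.0,
                             lam=2.0, gamma=1.8, rng=random.Random(7))
t_treated = sim_weibull_time(person, 1, outcome_betas, group_beta=-1.0,
                             lam=2.0, gamma=1.8, rng=random.Random(7))
```

With a negative `group_beta`, the treated version of the same person receives a longer survival time, which corresponds to a protective treatment effect.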
Finally, independent right-censoring is introduced by taking n independent random draws from the distribution defined by cens_fun and censoring every individual whose censoring time is smaller than their simulated survival time. The whole process is based on work by Chatton et al. (2020).
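The censoring step can be illustrated with a short sketch. The `apply_censoring` helper and the way `cens_fun` is called here are assumptions for the example, not the package's interface.

```python
import random

def apply_censoring(surv_times, cens_fun, rng):
    """Return a list of (observed_time, event) pairs.

    event is 1 if the survival time was observed and 0 if the person
    was censored before the event occurred.
    """
    observed = []
    for surv_time in surv_times:
        # The censoring time is drawn independently of the survival time.
        cens_time = cens_fun(rng)
        if cens_time < surv_time:
            observed.append((cens_time, 0))  # censored
        else:
            observed.append((surv_time, 1))  # event observed
    return observed

# A constant censoring time of 2.0 makes the behaviour easy to follow.
fixed = apply_censoring([1.0, 3.0], lambda r: 2.0, random.Random(0))

# A hypothetical exponential censoring distribution with rate 0.5.
rng = random.Random(3)
random_cens = apply_censoring([0.5, 1.5, 4.0],
                              lambda r: r.expovariate(0.5), rng)
```

In the constant example, the first person has the event at time 1.0, while the second person is censored at time 2.0.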
The function currently supports only binary treatments and does not allow dependent censoring.