Generates synthetic longitudinal data designed to stress-test imputation methods against loss to follow-up (LTFU), that is, permanent dropout. Intermittent missingness is also included, but the parameters are tuned to simulate subjects who permanently leave the study based on their characteristics at specific time points.
simulation_imputation_LTFU(
NNY = TRUE,
NNX = TRUE,
n_subject = 1000,
seed = NULL
)
A list containing the following components:
A data frame of the complete data (ground truth) without any missing values.
A data frame of the incomplete data, containing NAs introduced by intermittent missingness and significant LTFU.
A duplicate of data_E used internally for generating missingness probabilities.
A matrix of random predictors (intercept and time slopes) used in generation.
A matrix summarizing the missing data pattern (generated via mice::md.pattern).
NNY: A logical value. If TRUE, the outcome Y is generated from non-normal distributions
(skew-t random effects, t-distributed residuals). If FALSE, standard Normal distributions are used.
Default: TRUE.
NNX: A logical value. If TRUE, the covariates X_7 through X_12 are generated from
non-normal distributions (mixture models, skew-t random effects). If FALSE, standard Normal distributions are used.
Default: TRUE.
n_subject: An integer specifying the number of subjects. Default: 1000.
seed: An optional integer used to set the random seed for reproducibility. Default: NULL.
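The distributional switch described for the outcome Y can be illustrated with a minimal base R sketch. This is not the package's implementation: r_skew_t and sketch_outcome are hypothetical helpers, the skewness, degrees of freedom, and the five time points are assumed values, and the skew-t draws use Azzalini's stochastic representation rather than any particular package.

```r
# Hypothetical helper: skew-t draws via Azzalini's stochastic representation.
# alpha (skewness) and nu (degrees of freedom) are illustrative values only.
r_skew_t <- function(n, alpha = 3, nu = 5) {
  delta <- alpha / sqrt(1 + alpha^2)
  # Skew-normal variate: delta * |U0| + sqrt(1 - delta^2) * U1
  z <- delta * abs(rnorm(n)) + sqrt(1 - delta^2) * rnorm(n)
  # Dividing by sqrt(chi-square / nu) adds t-like heavy tails
  z / sqrt(rchisq(n, df = nu) / nu)
}

# Hypothetical generator: random intercept plus residual, in long format.
sketch_outcome <- function(n_subject = 200, n_time = 5, NNY = TRUE) {
  n_obs <- n_subject * n_time
  # One random intercept per subject: skew-t if NNY is TRUE, Normal otherwise
  b <- if (NNY) r_skew_t(n_subject) else rnorm(n_subject)
  # Residuals: t-distributed if NNY is TRUE, Normal otherwise
  e <- if (NNY) rt(n_obs, df = 5) else rnorm(n_obs)
  data.frame(
    id   = rep(seq_len(n_subject), each = n_time),
    time = rep(seq_len(n_time), times = n_subject),
    Y    = rep(b, each = n_time) + e
  )
}

set.seed(1)
y_nonnormal <- sketch_outcome(NNY = TRUE)
y_normal    <- sketch_outcome(NNY = FALSE)
```

Comparing hist(y_nonnormal$Y) with hist(y_normal$Y) shows the skew and heavier tails that the non-normal setting is meant to introduce.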
The data generation process mirrors simulation_imputation regarding covariate structure (time-varying, non-linear, mixed types),
but utilizes specific coefficients to drive the missingness mechanisms:
1. Loss to Follow-up (LTFU): Dropout is simulated based on the subject's state at time point 3. A logistic model determines the probability of dropout using:
The outcome Y at time 3.
Covariates X_1, X_2, and X_3 at time 3.
If a subject is selected for LTFU, all their observations for time points 4 and 5 are set to NA.
2. Intermittent Missingness:
Variable-specific missingness is applied to X_7 through X_12 using logistic models that depend on the concurrent outcome Y,
other covariates, and the previous value of the variable itself (autoregressive missingness).
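The two mechanisms above can be sketched in base R as follows. This is an illustration under stated assumptions, not the function's actual code: the toy data, every coefficient, the five time points, and the use of the complete-data lag for X_7 are all hypothetical choices.

```r
set.seed(42)
n_subject <- 200
n_time <- 5  # assumed: the text refers to time points up to 5
n_obs <- n_subject * n_time

# Toy complete data in long format (a stand-in for the real generator)
complete <- data.frame(
  id   = rep(seq_len(n_subject), each = n_time),
  time = rep(seq_len(n_time), times = n_subject),
  Y    = rnorm(n_obs),
  X_1  = rnorm(n_obs),
  X_2  = rnorm(n_obs),
  X_3  = rnorm(n_obs),
  X_7  = rnorm(n_obs)
)
incomplete <- complete

# 1. LTFU: logistic model on each subject's state at time point 3
#    (intercept and slopes are made-up values)
t3 <- complete[complete$time == 3, ]
p_drop <- plogis(-1 + 0.8 * t3$Y + 0.5 * t3$X_1 - 0.5 * t3$X_2 + 0.3 * t3$X_3)
dropped_ids <- t3$id[rbinom(n_subject, 1, p_drop) == 1]

# Dropouts lose every measured column at time points 4 and 5
measured <- setdiff(names(incomplete), c("id", "time"))
ltfu_rows <- incomplete$id %in% dropped_ids & incomplete$time >= 4
incomplete[ltfu_rows, measured] <- NA

# 2. Intermittent missingness in X_7: depends on the concurrent Y, another
#    covariate, and the previous X_7 value within the same subject
#    (the lag is set to 0 at the first time point)
prev_X_7 <- ave(complete$X_7, complete$id,
                FUN = function(x) c(0, head(x, -1)))
p_miss <- plogis(-2 + 0.5 * complete$Y + 0.4 * complete$X_1 + 0.6 * prev_X_7)
incomplete$X_7[rbinom(n_obs, 1, p_miss) == 1] <- NA
```

Because both probabilities depend on Y, which is itself removed for dropouts, the resulting pattern is not missing completely at random, which is the kind of scenario the simulation is meant to create.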
lt_data <- simulation_imputation_LTFU(NNY = TRUE, NNX = TRUE, n_subject = 10, seed = 42)