Generates synthetic longitudinal data specifically designed to evaluate missing data imputation methods. The function creates a complex dataset with:
Time-varying covariates with autoregressive structures and random effects.
Non-linear relationships and interactions between covariates.
Mixed data types (continuous and binary/logical).
Optional non-normal distributions for both random effects and residuals (skew-t, t-distribution).
Missing Data Mechanisms:
Intermittent Missingness: Generated via logistic models conditioned on outcomes and other covariates.
Loss to Follow-up (LTFU): Simulates subject dropout starting from time point 4 based on values at time point 3.
simulation_imputation(NNY = TRUE, NNX = TRUE, n_subject = 1000, seed = NULL)
A list containing the following components:
A data frame of the complete data (ground truth) without any missing values.
A data frame of the incomplete data, containing NAs introduced by intermittent missingness and dropout.
A duplicate of data_E used internally for generating missingness probabilities.
A matrix of random predictors (intercept and time slopes) used in generation.
pair: A matrix summarizing the missing data pattern (generated via mice::md.pattern).
NNY: A logical value. If TRUE, the outcome Y is generated using non-normal distributions (Skew-t random effects, t-distribution residuals). If FALSE, it uses standard Normal distributions. Default: TRUE.
NNX: A logical value. If TRUE, the covariates X_7 through X_12 are generated using non-normal distributions (Mixture models, Skew-t random effects). If FALSE, they use standard Normal distributions. Default: TRUE.
n_subject: An integer specifying the number of subjects. Default: 1000.
seed: An optional integer for setting the random seed to ensure reproducibility. Default: NULL.
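To give a feel for what the non-normal option changes, the following base R sketch contrasts normal draws with skew-t random intercepts and t-distributed residuals. It is an illustration only, not the package's code: the helper names r_skew_t and draw_errors, the skewness parameter, the degrees of freedom, and the simple random-intercept model are all assumptions.

```r
set.seed(1)
n_subject <- 200
n_time <- 5

# Hypothetical helper: skew-t draws via the usual stochastic representation
# (a skew-normal variate divided by sqrt(chi-square / df)).
r_skew_t <- function(n, alpha = 3, df = 5) {
  delta <- alpha / sqrt(1 + alpha^2)
  z <- delta * abs(rnorm(n)) + sqrt(1 - delta^2) * rnorm(n)
  z / sqrt(rchisq(n, df) / df)
}

# Hypothetical helper: random intercepts (b) and residuals (e),
# either non-normal or normal depending on the flag.
draw_errors <- function(non_normal) {
  if (non_normal) {
    list(b = r_skew_t(n_subject), e = rt(n_subject * n_time, df = 5))
  } else {
    list(b = rnorm(n_subject), e = rnorm(n_subject * n_time))
  }
}

err <- draw_errors(non_normal = TRUE)
id <- rep(seq_len(n_subject), each = n_time)

# Assumed outcome model: fixed intercept + random intercept + residual
y <- 1 + err$b[id] + err$e
```

Comparing a histogram of y under non_normal = TRUE and FALSE shows the skewness and heavier tails that an imputation method would have to cope with.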
The simulation process creates 12 covariates (X_1 to X_12):
X_1 to X_6: Base covariates generated via multivariate normal distributions with an autoregressive covariance matrix (sigma). X_4, X_5, and X_6 are then converted to binary.
X_7 to X_12: Derived covariates dependent on the base set, involving non-linear transformations (squares, logs, interactions).
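The two covariate groups described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the function's actual implementation: it assumes the autoregressive covariance runs across five time points with correlation rho = 0.6, dichotomises at zero, and uses made-up coefficients for the derived covariate. The helper draw_ar1 is hypothetical.

```r
set.seed(42)
n_subject <- 100
n_time <- 5
rho <- 0.6  # assumed AR(1) correlation between adjacent time points

# AR(1) covariance across time points: Sigma[i, j] = rho^|i - j|
Sigma <- rho^abs(outer(seq_len(n_time), seq_len(n_time), "-"))

# Hypothetical helper: an n_subject x n_time matrix of multivariate normal
# draws with covariance Sigma (via the Cholesky factor).
draw_ar1 <- function() {
  matrix(rnorm(n_subject * n_time), n_subject, n_time) %*% chol(Sigma)
}

# Convert to long format, ordered by subject and then by time
x1 <- as.vector(t(draw_ar1()))
x2 <- as.vector(t(draw_ar1()))
x4 <- as.vector(t(draw_ar1())) > 0   # base covariate converted to logical

# Derived covariate with square, log, and interaction terms (assumed form)
x7 <- 0.5 * x1^2 + log(1 + abs(x2)) + 0.8 * x1 * x4 + rnorm(length(x1))

dat <- data.frame(
  id   = rep(seq_len(n_subject), each = n_time),
  time = rep(seq_len(n_time), times = n_subject),
  X_1 = x1, X_2 = x2, X_4 = x4, X_7 = x7
)
head(dat)
```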
Missingness is introduced in two stages:
Intermittent Missingness: For variables X_7 to X_12, missingness indicators are drawn from Bernoulli distributions where the probability depends on the outcome Y and other covariates.
Dropout: A "Loss to Follow-up" indicator is generated based on data at time point 3. If a subject drops out, all values for time points 4 and 5 become NA.
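The two stages can be sketched in base R on a toy long-format data set. The logistic coefficients, the variable set, and the choice to base dropout only on Y at time point 3 are assumptions made for illustration; the function's actual models are not shown in this documentation.

```r
set.seed(7)
n_subject <- 100
n_time <- 5

# Toy complete data in long format (one row per subject and time point)
dat <- data.frame(
  id   = rep(seq_len(n_subject), each = n_time),
  time = rep(seq_len(n_time), times = n_subject)
)
dat$Y   <- rnorm(nrow(dat))
dat$X_1 <- rnorm(nrow(dat))
dat$X_7 <- rnorm(nrow(dat))
complete <- dat  # keep the ground truth

# Stage 1: intermittent missingness in X_7, drawn from a Bernoulli
# distribution whose probability depends on Y and X_1 (assumed coefficients)
p_miss <- plogis(-2 + 0.5 * dat$Y + 0.3 * dat$X_1)
miss <- rbinom(nrow(dat), 1, p_miss) == 1
dat$X_7[miss] <- NA

# Stage 2: dropout indicator based on the complete data at time point 3
t3 <- complete[complete$time == 3, ]
p_drop <- plogis(-1.5 + 0.6 * t3$Y)
dropped_ids <- t3$id[rbinom(nrow(t3), 1, p_drop) == 1]

# Subjects who drop out lose all values at time points 4 and 5
late <- dat$id %in% dropped_ids & dat$time >= 4
dat[late, c("Y", "X_1", "X_7")] <- NA

colMeans(is.na(dat))  # proportion missing per column
```

Because the missingness probabilities depend on observed values, data generated this way are not missing completely at random, which is what makes the setting useful for comparing imputation methods.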
# Simulate data with non-normal errors and random effects
sim_data <- simulation_imputation(NNY = TRUE, NNX = TRUE, n_subject = 10, seed = 123)
# View missing data pattern
sim_data$pair