SBMTrees (version 1.4)

simulation_imputation: Simulate Longitudinal Data with Missing Values for Imputation

Description

Generates synthetic longitudinal data specifically designed to evaluate missing data imputation methods. The function creates a complex dataset with:

  • Time-varying covariates with autoregressive structures and random effects.

  • Non-linear relationships and interactions between covariates.

  • Mixed data types (continuous and binary/logical).

  • Non-normal distributions (optional) for both random effects and residuals (skew-t and t-distributions).

  • Missing Data Mechanisms:

    • Intermittent Missingness: Generated via logistic models conditioned on outcomes and other covariates.

    • Loss to Follow-up (LTFU): Simulates subject dropout starting from time point 4 based on values at time point 3.

Usage

simulation_imputation(NNY = TRUE, NNX = TRUE, n_subject = 1000, seed = NULL)

Value

A list containing the following components:

data_E

A data frame of the complete data (ground truth) without any missing values.

data_M

A data frame of the incomplete data, containing NAs introduced by intermittent missingness and dropout.

data_O

A duplicate of data_E used internally for generating missingness probabilities.

Z

A matrix of random predictors (intercept and time slopes) used in generation.

pair

A matrix summarizing the missing data pattern (generated via mice::md.pattern).

Arguments

NNY

A logical value. If TRUE, the outcome Y is generated using non-normal distributions (Skew-t random effects, t-distribution residuals). If FALSE, it uses standard Normal distributions. Default: TRUE.

NNX

A logical value. If TRUE, the covariates X_7 through X_12 are generated using non-normal distributions (Mixture models, Skew-t random effects). If FALSE, they use standard Normal distributions. Default: TRUE.

n_subject

An integer specifying the number of subjects. Default: 1000.

seed

An optional integer for setting the random seed to ensure reproducibility. Default: NULL.

Details

The simulation process creates 12 covariates (X_1 to X_12):

  • X_1 to X_6: Base covariates drawn from multivariate normal distributions with an autoregressive covariance matrix (sigma). X_4, X_5, and X_6 are then converted to binary.

  • X_7 to X_12: Derived covariates dependent on the base set, involving non-linear transformations (squares, logs, interactions).
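
The covariate construction described above can be illustrated with a minimal base-R sketch. It is not the package's internal code: the AR(1) form of the covariance, the correlation `rho`, the zero threshold for the binary conversion, and the formula for the derived covariate are all assumptions chosen for illustration, as is the reading that the autoregressive structure runs across time points.

```r
set.seed(1)
n <- 200         # number of draws (hypothetical)
n_time <- 5      # number of time points (hypothetical)
rho <- 0.5       # autoregressive correlation (hypothetical)

# AR(1) covariance: entry (i, j) equals rho^|i - j|
Sigma <- rho^abs(outer(seq_len(n_time), seq_len(n_time), "-"))

# Multivariate normal draws: independent normals times the upper
# Cholesky factor R, where t(R) %*% R equals Sigma
x_cont <- matrix(rnorm(n * n_time), nrow = n) %*% chol(Sigma)

# Convert continuous values to binary by thresholding at zero (assumed rule)
x_bin <- x_cont > 0

# A derived covariate with a square and an interaction (illustrative formula)
x_derived <- x_cont[, 1]^2 + x_cont[, 2] * x_bin[, 3] + rnorm(n, sd = 0.1)
```

Adjacent columns of `x_cont` are more strongly correlated than distant ones, which is the defining property of an autoregressive covariance.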

Missingness is introduced in two stages:

  1. Intermittent Missingness: For variables X_7 to X_12, missingness indicators are drawn from Bernoulli distributions where the probability depends on the outcome Y and other covariates.

  2. Dropout: A "Loss to Follow-up" indicator is generated based on data at time point 3. If a subject drops out, all values for time points 4 and 5 become NA.
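
The two stages can be sketched on a toy long-format data frame as follows. This is an illustration under stated assumptions, not the function's actual implementation: the column names `subject_id` and `time`, the logistic coefficients, and the use of only `Y` as a predictor are hypothetical.

```r
set.seed(2)
n_subject <- 50
n_time <- 5

# Toy long-format data: one row per subject and time point
toy <- data.frame(
  subject_id = rep(seq_len(n_subject), each = n_time),
  time       = rep(seq_len(n_time), times = n_subject),
  Y          = rnorm(n_subject * n_time),
  X_7        = rnorm(n_subject * n_time)
)
complete <- toy  # keep the ground truth for computing probabilities

# Stage 1: intermittent missingness in X_7 from a logistic model in Y
p_miss <- plogis(-2 + 0.8 * complete$Y)          # hypothetical coefficients
toy$X_7[rbinom(nrow(toy), 1, p_miss) == 1] <- NA

# Stage 2: dropout decided from the complete values at time point 3
at_t3 <- complete[complete$time == 3, ]
p_drop <- plogis(-1.5 + 0.5 * at_t3$Y)           # hypothetical coefficients
dropped_ids <- at_t3$subject_id[rbinom(nrow(at_t3), 1, p_drop) == 1]

# Subjects who drop out lose all values at time points 4 and 5
late <- toy$subject_id %in% dropped_ids & toy$time >= 4
toy[late, c("Y", "X_7")] <- NA
```

Computing the probabilities from the complete copy rather than from the partially masked data mirrors the role the documentation gives to data_O.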

Examples

# Simulate data with non-normal errors and random effects
sim_data <- simulation_imputation(NNY = TRUE, NNX = TRUE, n_subject = 10, seed = 123)

# View missing data pattern
sim_data$pair
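
Because the function returns both the complete data and the incomplete data, one plausible use is to score an imputation method against the ground truth. The sketch below shows the idea on toy data frames that stand in for data_E and data_M; the helper `imputation_rmse` is hypothetical and not part of SBMTrees, and mean imputation is used only as a placeholder method.

```r
# Hypothetical helper: RMSE of imputed values, restricted to the cells
# that were missing in the incomplete data
imputation_rmse <- function(truth, incomplete, imputed, column) {
  was_missing <- is.na(incomplete[[column]])
  errors <- imputed[[column]][was_missing] - truth[[column]][was_missing]
  sqrt(mean(errors^2))
}

# Toy stand-ins for data_E (truth) and data_M (incomplete)
truth      <- data.frame(X_7 = c(1, 2, 3, 4))
incomplete <- data.frame(X_7 = c(1, NA, 3, NA))

# Placeholder imputation: fill NAs with the observed column mean
imputed <- incomplete
imputed$X_7[is.na(imputed$X_7)] <- mean(incomplete$X_7, na.rm = TRUE)

# Proportion of missing values per column, and the imputation error
missing_rate <- colMeans(is.na(incomplete))
rmse <- imputation_rmse(truth, incomplete, imputed, "X_7")
```

With real output, `truth` and `incomplete` would presumably be replaced by `sim_data$data_E` and `sim_data$data_M`, assuming both share the same row order and column names.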
