check_censoring_dgm: Diagnose Censoring Consistency Between DGM Source Data and Simulated Data

Description

Compares the censoring distribution observed in the data used to build the DGM against the censoring generated by simulate_from_dgm. Reports censoring rates, time quantiles, KM-based median censoring times, and flags substantial discrepancies.

Usage

check_censoring_dgm(
  sim_data,
  dgm,
  treat_var = "treat_sim",
  rate_tol = 0.1,
  median_tol = 0.25,
  verbose = TRUE
)

Value

Invisibly returns a named list. Elements are: rates (data frame of censoring rates overall and by arm); quantiles (data frame of censoring-time quantiles among censored subjects); km_medians (data frame of KM-based median censoring times); and flags (character vector of triggered warnings, empty if none).

Arguments

sim_data: A data.frame returned by simulate_from_dgm.
dgm: An "aft_dgm_flex" object from generate_aft_dgm_flex. The super population (dgm$df_super) provides reference censoring times and event indicators on the DGM time scale.
treat_var: Character. Name of the treatment column in sim_data used for arm-stratified comparisons. Default "treat_sim".
rate_tol: Numeric. Absolute tolerance (proportion scale) for flagging a censoring-rate discrepancy. Default 0.10 (10 pp).
median_tol: Numeric. Relative tolerance for flagging a KM median censoring-time discrepancy. Default 0.25 (25 percent).
verbose: Logical. If TRUE, prints the full diagnostic table. Default TRUE.

Details

The reference censoring distribution is derived from dgm$df_super, sampled with replacement from the data passed to generate_aft_dgm_flex(). Columns y (observed time) and event (event indicator) in df_super reflect the original observed censoring process on the DGM time scale.

The KM median censoring time is estimated by reversing the event indicator (1 - event), treating events as censored and censored observations as the event of interest. This gives a non-parametric estimate of the censoring time distribution unconfounded by event occurrence.

Common causes of discrepancy: (1) time-scale mismatch (DGM built on days, analysis_time in months); check exp(dgm$model_params$mu) against your analysis_time. (2) Large cens_adjust shifting censoring substantially from the fitted model. (3) Short analysis_time or time_eos making administrative censoring dominate the censoring process.

Examples

Run this code

# \donttest{
dgm <- setup_gbsg_dgm(model = "null", verbose = FALSE)
sim_data <- simulate_from_dgm(dgm, n = 200)
check_censoring_dgm(sim_data, dgm = dgm)
# }