Compares the censoring distribution observed in the data used to build the
DGM against the censoring generated by simulate_from_dgm.
Reports censoring rates, time quantiles, KM-based median censoring times,
and flags substantial discrepancies.
check_censoring_dgm(
  sim_data,
  dgm,
  treat_var = "treat_sim",
  rate_tol = 0.1,
  median_tol = 0.25,
  verbose = TRUE
)

Invisibly returns a named list. Elements are: rates (data
frame of censoring rates overall and by arm); quantiles (data
frame of censoring-time quantiles among censored subjects);
km_medians (data frame of KM-based median censoring times); and
flags (character vector of triggered warnings, empty if none).
sim_data: A data.frame returned by simulate_from_dgm.
dgm: An "aft_dgm_flex" object from generate_aft_dgm_flex. The super
population (dgm$df_super) provides reference censoring times and event
indicators on the DGM time scale.
treat_var: Character. Name of the treatment column in sim_data used for
arm-stratified comparisons. Default "treat_sim".
rate_tol: Numeric. Absolute tolerance (proportion scale) for flagging a
censoring-rate discrepancy. Default 0.10 (10 percentage points).
median_tol: Numeric. Relative tolerance for flagging a KM median
censoring-time discrepancy. Default 0.25 (25 percent).
verbose: Logical. If TRUE, prints the full diagnostic table. Default TRUE.
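The two tolerances act on different scales, which a small sketch may make concrete. The helper below is hypothetical and is not the package's implementation: the name flag_discrepancy, its arguments, and the choice to scale the median difference by the reference median are all assumptions made for illustration.

```r
# Hypothetical helper (not part of the package): illustrates an absolute
# tolerance on censoring rates and a relative tolerance on KM medians.
flag_discrepancy <- function(rate_ref, rate_sim, med_ref, med_sim,
                             rate_tol = 0.10, median_tol = 0.25) {
  flags <- character(0)

  # Absolute difference on the proportion scale (0.10 = 10 percentage points)
  if (isTRUE(abs(rate_sim - rate_ref) > rate_tol)) {
    flags <- c(flags, "censoring rate")
  }

  # Relative difference, scaled by the reference median (an assumption).
  # isTRUE() skips the check when a median is NA (not estimable).
  if (isTRUE(abs(med_sim - med_ref) / med_ref > median_tol)) {
    flags <- c(flags, "KM median censoring time")
  }

  flags
}

# Rates differ by 15 percentage points; medians differ by 10 percent
flag_discrepancy(rate_ref = 0.40, rate_sim = 0.55, med_ref = 60, med_sim = 66)
```

With the defaults, only the rate comparison in this call exceeds its tolerance; the medians differ by 10 percent, which is inside the 25 percent band.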
The reference censoring distribution is derived from dgm$df_super, the
super population sampled with replacement from the data passed to
generate_aft_dgm_flex(). Its columns y (observed time) and event (event
indicator) reflect the original observed censoring process on the DGM
time scale.
The KM median censoring time is estimated by reversing the event indicator
(1 - event), treating events as censored and censored observations
as the event of interest. This gives a non-parametric estimate of the
censoring time distribution unconfounded by event occurrence.
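The reverse Kaplan-Meier idea described above can be sketched with the survival package. This is a minimal illustration on made-up data, not the function's internal code; the data frame toy and its values are hypothetical, while the column names y and event follow the description of df_super.

```r
library(survival)

# Hypothetical toy data: y = observed time, event = 1 for an event, 0 if censored
toy <- data.frame(
  y     = c(2, 3, 5, 7, 8, 11, 13, 14),
  event = c(1, 0, 1, 0, 0, 1, 0, 0)
)

# Reverse the indicator: censoring becomes the "event", events become censored
fit_cens <- survfit(Surv(y, 1 - event) ~ 1, data = toy)

# KM-based median censoring time (NA if the curve never drops to 0.5)
km_median_cens <- unname(summary(fit_cens)$table["median"])
km_median_cens
```

The same fit applied to the reference data and to the simulated data would give the two medians being compared; a stratified version would replace `~ 1` with the treatment column.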
Common causes of discrepancy: (1) time-scale mismatch (DGM built on days,
analysis_time in months); check exp(dgm$model_params$mu)
against your analysis_time. (2) Large cens_adjust shifting
censoring substantially from the fitted model. (3) Short
analysis_time or time_eos making administrative censoring
dominate the censoring process.
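For cause (1), a rough sanity check is to put exp(mu) and analysis_time side by side. The sketch below uses hypothetical numbers throughout; mu stands in for dgm$model_params$mu, and the interpretation of the ratio is a rule of thumb assumed here, not something the package reports.

```r
# All values are hypothetical and for illustration only.
mu <- 7.5             # stand-in for dgm$model_params$mu (log-time scale)
analysis_time <- 60   # intended follow-up, assumed here to be in months

typical_time <- exp(mu)   # time on the DGM scale
ratio <- typical_time / analysis_time

# Rule of thumb (an assumption): a ratio near 30 hints at days vs. months,
# a ratio near 365 at days vs. years, and a ratio near 1 at matching units.
ratio
```

If the ratio points to a unit mismatch, converting analysis_time to the DGM time scale before simulating is usually the simpler fix.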
See also: simulate_from_dgm, generate_aft_dgm_flex
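# Build a DGM and simulate one data set from it
dgm <- setup_gbsg_dgm(model = "null", verbose = FALSE)
sim_data <- simulate_from_dgm(dgm, n = 200)

# Prints the diagnostic table (verbose = TRUE); the list is returned invisibly
res <- check_censoring_dgm(sim_data, dgm = dgm)

# Triggered warnings; an empty character vector if none
res$flags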