Creates a data generating mechanism (DGM) for survival simulations based on the German Breast Cancer Study Group (GBSG) dataset. Supports heterogeneous treatment effects via treatment-subgroup interactions.
create_gbsg_dgm(
model = c("alt", "null"),
k_treat = 1,
k_inter = 1,
k_z3 = 1,
z1_quantile = 0.25,
n_super = DEFAULT_N_SUPER,
cens_type = c("weibull", "uniform"),
use_rand_params = FALSE,
seed = SEED_BASE,
verbose = FALSE
)
A list of class "gbsg_dgm" containing:
Data frame with randomized super-population including potential outcomes (theta_0, theta_1, loghr_po)
Empirical hazard ratio in harm subgroup (Cox-based)
Empirical hazard ratio in complement subgroup (Cox-based)
Overall causal (ITT) hazard ratio (Cox-based)
Overall average hazard ratio (from loghr_po)
Average hazard ratio in harm subgroup
Average hazard ratio in complement subgroup
List matching generate_aft_dgm_flex output format
List with AFT model parameters (mu, sigma, gamma, etc.)
List with censoring model parameters
List with subgroup definitions and true factor names
Character vector of analysis variable names
Character indicating "alt" or "null"
model: Character. Either "alt" for the alternative hypothesis with heterogeneous treatment effects, or "null" for a uniform treatment effect. Default: "alt"
k_treat: Numeric. Treatment effect multiplier applied to the treatment coefficient from the fitted AFT model. Values > 1 strengthen the treatment effect. Default: 1
k_inter: Numeric. Interaction effect multiplier for the treatment-subgroup interaction (z1 * z3). Only used when model = "alt". Higher values create more heterogeneity between HR(H) and HR(Hc). Default: 1
k_z3: Numeric. Effect multiplier for the z3 (menopausal status) coefficient. Default: 1
z1_quantile: Numeric. Quantile threshold for z1 (estrogen receptor). Observations with ER <= quantile are coded as z1 = 1. Default: 0.25
n_super: Integer. Size of the super-population used for empirical HR estimation. Default: 5000
cens_type: Character. Censoring distribution type: "weibull" or "uniform". Default: "weibull"
use_rand_params: Logical. If TRUE, modifies confounder coefficients using estimates from the randomized subset (meno == 0). Default: FALSE
seed: Integer. Random seed for super-population generation. Default: 8316951
verbose: Logical. If TRUE, prints diagnostic information. Default: FALSE
This version is aligned with generate_aft_dgm_flex() and
calculate_hazard_ratios() methodology, computing individual-level
potential outcomes and average hazard ratios (AHR).
The harm subgroup H is defined as: z1 = 1 AND z3 = 1, where:
z1: Low estrogen receptor (ER <= 25th percentile by default)
z3: Premenopausal status (meno == 0)
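The subgroup definition above can be sketched on toy data. This is a minimal illustration only: the column names `er` and `meno`, the data values, and the use of `quantile()` with its default type are assumptions, not the package's internal code.

```r
# Toy data standing in for the GBSG covariates. The column names
# `er` and `meno` and all values are made up for illustration.
toy <- data.frame(
  er   = c(0, 5, 12, 40, 80, 150, 300, 8),
  meno = c(0, 1, 0, 0, 1, 0, 1, 0)
)

z1_quantile <- 0.25
er_cut <- quantile(toy$er, probs = z1_quantile, names = FALSE)

toy$z1 <- as.integer(toy$er <= er_cut)            # low estrogen receptor
toy$z3 <- as.integer(toy$meno == 0)               # premenopausal
toy$H  <- as.integer(toy$z1 == 1 & toy$z3 == 1)   # harm subgroup flag

table(H = toy$H)
```

Raising z1_quantile enlarges the set with z1 = 1 and therefore can only enlarge H, never shrink it.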
The AFT model uses covariates: treat, z1, z2, z3, z4, z5, and (for "alt") the interaction zh = treat * z1 * z3.
The k_inter parameter modifies the zh coefficient in the AFT model:
gamma[zh] <- k_inter * gamma[zh]
This affects the hazard ratio for the harm subgroup:
HR(H) = exp(-gamma[treat]/sigma - gamma[zh]/sigma)
HR(Hc) = exp(-gamma[treat]/sigma)
When k_inter = 0, HR(H) = HR(Hc) (no heterogeneity).
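To make the role of k_inter concrete, the two formulas above can be evaluated directly. The coefficient values and the helper `hr_pair()` below are hypothetical, chosen only for illustration; they are not the fitted GBSG estimates or part of the package.

```r
# Hypothetical AFT coefficients and scale (not the fitted GBSG values).
gamma <- c(treat = 0.40, zh = -0.90)
sigma <- 0.8

# Illustrative helper: scale the zh coefficient by k_inter, then apply
# the HR(H) and HR(Hc) formulas stated above.
hr_pair <- function(k_inter, gamma, sigma) {
  g <- gamma
  g["zh"] <- k_inter * g["zh"]
  c(
    HR_H  = unname(exp(-g["treat"] / sigma - g["zh"] / sigma)),
    HR_Hc = unname(exp(-g["treat"] / sigma))
  )
}

hr1 <- hr_pair(k_inter = 1, gamma, sigma)  # heterogeneous effect
hr0 <- hr_pair(k_inter = 0, gamma, sigma)  # interaction switched off

print(hr1)
print(hr0)
```

With these made-up values, k_inter = 1 gives HR(H) above 1 and HR(Hc) below 1, while k_inter = 0 makes the two coincide. HR(Hc) does not depend on k_inter at all.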
This function now computes:
theta_0: Log-hazard contribution under control
theta_1: Log-hazard contribution under treatment
loghr_po: Individual causal log hazard ratio (theta_1 - theta_0)
AHR metrics: exp(mean(loghr_po)) for overall and subgroups
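The AHR calculation listed above amounts to a geometric mean of individual hazard ratios. A minimal sketch on a made-up super-population follows; the data frame `pop`, its values, the flag column `H`, and the helper `ahr()` are assumptions for illustration, not output of create_gbsg_dgm().

```r
# Toy super-population with made-up potential-outcome contributions.
pop <- data.frame(
  theta_0 = c(0.10, 0.20, -0.30,  0.00, 0.50, -0.10),
  theta_1 = c(0.60, 0.90, -0.70, -0.40, 0.10,  0.45),
  H       = c(1, 1, 0, 0, 0, 1)   # 1 = harm subgroup, 0 = complement
)

# Individual causal log hazard ratio.
pop$loghr_po <- pop$theta_1 - pop$theta_0

# Average hazard ratio: exponentiated mean of the individual log HRs.
ahr <- function(loghr) exp(mean(loghr))

AHR_overall <- ahr(pop$loghr_po)
AHR_H       <- ahr(pop$loghr_po[pop$H == 1])
AHR_Hc      <- ahr(pop$loghr_po[pop$H == 0])

print(c(overall = AHR_overall, H = AHR_H, Hc = AHR_Hc))
```

Because the average is taken on the log scale, the overall AHR always lies between the two subgroup AHRs.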
simulate_from_gbsg_dgm for generating data from the DGM
calibrate_k_inter for finding k_inter to achieve target HR