Core estimation function for computing weighted Kaplan-Meier survival curves with propensity score weights. Handles multiple treatment groups simultaneously and uses the classic weighted Greenwood formula for variance estimation.

Estimate Weighted Kaplan-Meier Curves for All Treatment Groups

Computes weighted Kaplan-Meier survival estimates and their variances for all treatment groups using propensity score weights. Uses the classic weighted Greenwood formula: \(Var[S(t)] = [S(t)]^2 \sum (D_l / (R_l (R_l - D_l)))\).
estimate_weighted_km(
data,
time_var,
event_var,
treatment_var,
weights,
treatment_levels
)

A list containing:
Numeric vector of all unique event times where survival is estimated.
Matrix [n_times x n_groups] of survival estimates. Column names are treatment levels.
Matrix [n_times x n_groups] of variances for survival.
Matrix [n_times x n_groups] of weighted number at risk (R).
Matrix [n_times x n_groups] of weighted number of events (D).
Matrix [n_times x n_groups] of cumulative weighted events up to each time.
Treatment levels (column names for matrices).
Number of treatment groups.
data: A data.frame containing the complete-case analysis data.
time_var: A character string specifying the name of the time variable in data.
event_var: A character string specifying the name of the event variable in data. Should be coded as 1 = event, 0 = censored.
treatment_var: A character string specifying the name of the treatment variable in data.
weights: A numeric vector of propensity score weights with length equal to nrow(data). Each observation has one weight corresponding to its observed treatment group. For ATE: \(w_i = 1/e_j(X_i)\), where j is the observed treatment of observation i.
treatment_levels: A vector of unique treatment values (sorted). Should match the levels from estimate_ps().
**Weighted Kaplan-Meier Formula:**
For treatment group j, at each event time \(t_l\):
$$R_l = \sum_{i: T_i \ge t_l, Z_i = j} w_{i,j}$$
$$D_l = \sum_{i: T_i = t_l, \delta_i = 1, Z_i = j} w_{i,j}$$
$$\hat{S}^w_j(t) = \prod_{t_l \le t} \left(1 - \frac{D_l}{R_l}\right)$$
where \(R_l\) is the weighted number at risk and \(D_l\) is the weighted number of events. Ties between events and censorings are handled using the Breslow method.
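The three formulas above can be sketched in base R for a single treatment group. This is a minimal illustration, not the package's implementation: `weighted_km_one_group()` and the toy data are hypothetical, and treating the hazard increment as 0 when nobody is at risk is an assumption made here so the product stays defined.

```r
# Hypothetical helper (not the package's code): weighted KM for ONE group.
# time, event, w are the observations already restricted to Z_i = j.
weighted_km_one_group <- function(time, event, w, eval_times) {
  # R_l: sum of weights of subjects still at risk at t_l (T_i >= t_l)
  R <- vapply(eval_times, function(t) sum(w[time >= t]), numeric(1))
  # D_l: sum of weights of subjects with an event exactly at t_l
  D <- vapply(eval_times, function(t) sum(w[time == t & event == 1]), numeric(1))
  # Assumption: hazard increment is 0 when the weighted risk set is empty
  h <- ifelse(R > 0, D / R, 0)
  list(n_risk = R, n_event = D, surv = cumprod(1 - h))
}

# Toy data for one group (made up for illustration)
time  <- c(1, 2, 2, 3, 4)
event <- c(1, 1, 0, 1, 0)
w     <- c(2, 1, 1, 2, 1)

eval_times <- sort(unique(time[event == 1]))
km <- weighted_km_one_group(time, event, w, eval_times)
km$surv
```

Note that the subject censored at time 2 is still counted in \(R_l\) at \(t_l = 2\), because the risk set uses \(T_i \ge t_l\).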
**Classic Weighted Greenwood Variance:**
$$Var[\hat{S}^w_j(t)] = [\hat{S}^w_j(t)]^2 \sum_{t_l \le t} \frac{D_l}{R_l (R_l - D_l)}$$
This is the standard weighted extension of Greenwood's formula. When all weights equal 1, it reduces to the classical Greenwood formula.
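As a small worked check of the variance formula, the sketch below evaluates it from vectors of \(R_l\) and \(D_l\). The helper name `greenwood_var()` and the counts are hypothetical; with unit weights the counts are plain integers, so the result is the classical Greenwood variance.

```r
# Hypothetical helper (not the package's code): Greenwood variance computed
# from (weighted) at-risk counts R and event counts D, ordered by event time.
greenwood_var <- function(R, D) {
  S <- cumprod(1 - D / R)        # Kaplan-Meier survival at each event time
  term <- D / (R * (R - D))      # per-time Greenwood increment
  S^2 * cumsum(term)             # Var[S(t)] = S(t)^2 * cumulative sum
}

# Unit-weight example: 5 subjects, one event at each of three times,
# with risk sets of size 5, 4 and 2 (one subject censored in between)
R <- c(5, 4, 2)
D <- c(1, 1, 1)
v <- greenwood_var(R, D)
v
```

This sketch assumes \(R_l - D_l > 0\) at every time point; otherwise the increment is not finite.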
**Weight Structure:**
The weight vector has length nrow(data). Each observation i in treatment group j has weight \(w_i\) based on its propensity score for group j. For ATE estimation, \(w_i = 1/e_j(X_i)\). When computing weighted KM for group j, only observations with \(Z_i = j\) and their corresponding weights are used.
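One way to build such a weight vector is sketched below. It assumes the propensity scores are available as a matrix with one column per treatment level; that layout, the object `ps`, and the toy values are assumptions for illustration and may not match what estimate_ps() actually returns.

```r
# Hypothetical propensity score matrix: rows = observations,
# columns = treatment levels, entries = e_j(X_i). Values are made up.
ps <- matrix(c(0.50, 0.30, 0.20,
               0.20, 0.50, 0.30,
               0.25, 0.25, 0.50),
             nrow = 3, byrow = TRUE,
             dimnames = list(NULL, c("A", "B", "C")))

treatment <- c("A", "C", "B")          # observed treatment Z_i
treatment_levels <- colnames(ps)

# Pick, for each row i, the column of its observed treatment: e_{Z_i}(X_i)
j <- match(treatment, treatment_levels)
w <- 1 / ps[cbind(seq_along(treatment), j)]   # ATE weights, length nrow(data)

# When estimating group "A", only these observations and weights are used
w_A <- w[treatment == "A"]
```

Indexing with a two-column matrix `cbind(row, col)` extracts one entry per observation, which keeps the weight vector aligned with the rows of the data.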
**Handling Edge Cases:**
- If weighted at-risk count \(R_l = 0\) at time t, survival remains constant after t (last observation censored).
- Variance is undefined when \(R_l - D_l \le 0\); set to NA for that time point.
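The two rules can be sketched as follows. `km_with_edge_cases()` is a hypothetical helper, not the package's code, and it reflects one reading of the rules: the variance is set to NA only at the affected time points, and those points contribute nothing to the cumulative Greenwood sum. The actual implementation may treat later time points differently.

```r
# Hypothetical helper (not the package's code) illustrating the edge cases.
km_with_edge_cases <- function(R, D) {
  # Rule 1: with an empty risk set the hazard increment is 0,
  # so survival stays constant from that time on
  h <- ifelse(R > 0, D / R, 0)
  S <- cumprod(1 - h)

  # Rule 2: the Greenwood increment is undefined when R - D <= 0
  ok <- (R - D) > 0
  term <- ifelse(ok, D / (R * (R - D)), 0)   # assumption: skip undefined terms
  v <- S^2 * cumsum(term)
  v[!ok] <- NA_real_                          # NA at the affected time points

  list(surv = S, var = v)
}

# Risk set is empty at the third time point (e.g. a later event time that
# comes from another treatment group on the common time grid)
res <- km_with_edge_cases(R = c(4, 2, 0), D = c(1, 1, 0))
res$surv
res$var
```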