estimate_weighted_km: Weighted Kaplan-Meier Estimation with Classic Greenwood Variance

Description

Core estimation function for computing weighted Kaplan-Meier survival curves with propensity score weights. Handles multiple treatment groups simultaneously and uses the classic weighted Greenwood formula for variance estimation.

Computes weighted Kaplan-Meier survival estimates and variances for all treatment groups using propensity score weights. Variances use the classic weighted Greenwood formula: $Var[\hat{S}(t)] = [\hat{S}(t)]^2 \sum_{t_l \le t} \frac{D_l}{R_l (R_l - D_l)}$.

Usage

estimate_weighted_km(
  data,
  time_var,
  event_var,
  treatment_var,
  weights,
  treatment_levels
)

Value

A list containing:

eval_times: Numeric vector of all unique event times where survival is estimated.
surv_estimates: Matrix [n_times x n_groups] of survival estimates. Column names are treatment levels.
surv_var: Matrix [n_times x n_groups] of variances for survival.
n_risk: Matrix [n_times x n_groups] of weighted number at risk (R).
n_event: Matrix [n_times x n_groups] of weighted number of events (D).
n_acc_event: Matrix [n_times x n_groups] of cumulative weighted events up to each time.
treatment_levels: Treatment levels (column names for matrices).
n_levels: Number of treatment groups.
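
The layout of the returned matrices can be pictured with a small hand-built example. This is only a sketch: the numbers are made up, and the assumption that n_acc_event is the column-wise running sum of n_event is inferred from the descriptions above rather than taken from the implementation.

```r
# Made-up values for three evaluation times and two treatment groups
eval_times <- c(2, 3, 5)
treatment_levels <- c("A", "B")

# Rows follow eval_times; columns are named after the treatment levels
n_event <- matrix(c(1.5, 0.0,
                    2.0, 1.2,
                    0.0, 2.5),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(NULL, treatment_levels))

# Assumption: cumulative weighted events are a running sum within each group
n_acc_event <- apply(n_event, 2, cumsum)

# Columns can be selected by treatment level
n_acc_event[, "B"]
```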

Arguments

data: A data.frame containing the complete-case analysis data.
time_var: A character string specifying the name of the time variable.
event_var: A character string specifying the name of the event variable. Should be coded as 1 = event, 0 = censored.
treatment_var: A character string specifying the name of the treatment variable in data.
weights: A numeric vector of propensity score weights with length equal to nrow(data). Each observation has one weight corresponding to its observed treatment group. For ATE estimation, $w_i = 1/e_j(X_i)$, where $j$ is the observed treatment of observation $i$ and $e_j(X_i)$ is its propensity score for group $j$.
treatment_levels: A vector of unique treatment values (sorted). Should match the levels from estimate_ps().
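
As an illustration of how the arguments fit together, the sketch below builds inputs of the documented shape. The data frame, its column names (time, event, trt) and the unit weights are all hypothetical placeholders; in practice the weights would come from a propensity score model. The call is guarded so the snippet also runs when the package providing estimate_weighted_km() is not loaded.

```r
# Hypothetical complete-case data: column names are illustrative only
dat <- data.frame(
  time  = c(5, 8, 3, 10, 7, 2),
  event = c(1, 0, 1, 1, 0, 1),          # 1 = event, 0 = censored
  trt   = c("A", "B", "A", "B", "A", "B")
)

# Placeholder weights, one per row (unit weights give the unweighted estimate)
w <- rep(1, nrow(dat))

# Sorted unique treatment values
lv <- sort(unique(dat$trt))

# Only call the function if it is available in the session
if (exists("estimate_weighted_km")) {
  fit <- estimate_weighted_km(
    data = dat,
    time_var = "time",
    event_var = "event",
    treatment_var = "trt",
    weights = w,
    treatment_levels = lv
  )
  str(fit$surv_estimates)
}
```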

Details

**Weighted Kaplan-Meier Formula:**

For treatment group $j$, at each event time $t_l$:

$$R_l = \sum_{i: T_i \ge t_l, Z_i = j} w_{i,j}$$

$$D_l = \sum_{i: T_i = t_l, \delta_i = 1, Z_i = j} w_{i,j}$$

$$\hat{S}^w_j(t) = \prod_{t_l \le t} \left(1 - \frac{D_l}{R_l}\right)$$

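where $R_l$ is the weighted number at risk and $D_l$ is the weighted number of events at $t_l$. Ties between events and censorings are handled using the Breslow method. Because the risk set is defined by $T_i \ge t_l$, an observation censored at $t_l$ still counts as at risk at $t_l$.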
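
The formulas above can be sketched for a single treatment group as follows. The helper weighted_km_one_group() is hypothetical and written for illustration; it is not the package's implementation, and the toy data are made up. Note that the subject censored at time 3 is still counted in the risk set at time 3, as the $T_i \ge t_l$ condition requires.

```r
# Hypothetical helper: weighted Kaplan-Meier for the subjects of one group
weighted_km_one_group <- function(time, event, w) {
  # Evaluate at the distinct event times of this group, in increasing order
  event_times <- sort(unique(time[event == 1]))
  # R_l: total weight of subjects with T_i >= t_l
  R <- vapply(event_times, function(t) sum(w[time >= t]), numeric(1))
  # D_l: total weight of subjects with an event exactly at t_l
  D <- vapply(event_times, function(t) sum(w[time == t & event == 1]), numeric(1))
  data.frame(time = event_times, n_risk = R, n_event = D,
             surv = cumprod(1 - D / R))
}

# Toy data: five subjects, one of them censored at a tied time (t = 3)
time  <- c(2, 3, 3, 5, 8)
event <- c(1, 1, 0, 1, 0)
w     <- c(1.5, 2.0, 1.0, 2.5, 1.0)

km <- weighted_km_one_group(time, event, w)
print(km)
```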

**Classic Weighted Greenwood Variance:**

$$Var[\hat{S}^w_j(t)] = [\hat{S}^w_j(t)]^2 \sum_{t_l \le t} \frac{D_l}{R_l (R_l - D_l)}$$

This is the standard weighted extension of Greenwood's formula. When all weights equal 1, it reduces to the classical Greenwood formula.
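
A minimal sketch of the variance formula, assuming the at-risk and event totals have already been computed. The function greenwood_var() is a hypothetical name used for illustration. With unit weights, $R_l$ and $D_l$ are plain counts, so the same expression gives the classical Greenwood variance.

```r
# Hypothetical helper: Greenwood variance from at-risk and event totals
greenwood_var <- function(R, D) {
  surv <- cumprod(1 - D / R)           # Kaplan-Meier estimate at each event time
  term <- D / (R * (R - D))            # per-time Greenwood increment
  surv^2 * cumsum(term)                # variance at each event time
}

# Unit weights: five subjects, single events at three distinct times,
# with 5, 4 and 2 subjects at risk respectively
R <- c(5, 4, 2)
D <- c(1, 1, 1)

surv_var <- greenwood_var(R, D)
sqrt(surv_var)                         # standard errors of the survival estimates
```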

**Weight Structure:**

The weight vector has length nrow(data). Each observation $i$ in treatment group $j$ has a single weight $w_i$ (written $w_{i,j}$ in the formulas above) based on its propensity score for group $j$. For ATE estimation, $w_i = 1/e_j(X_i)$. When computing the weighted KM estimate for group $j$, only observations with $Z_i = j$ and their corresponding weights are used.
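
One way to build such a weight vector is sketched below. It assumes the propensity scores are available as a matrix with one column per treatment level; that layout, and the names ps and z, are assumptions made for this example and may not match the output of estimate_ps().

```r
# Hypothetical inputs: observed treatments and a propensity score matrix
treatment_levels <- c("A", "B", "C")
z <- c("A", "C", "B", "A")
ps <- matrix(c(0.50, 0.30, 0.20,
               0.20, 0.30, 0.50,
               0.25, 0.50, 0.25,
               0.40, 0.40, 0.20),
             nrow = 4, byrow = TRUE,
             dimnames = list(NULL, treatment_levels))

# For each row i, pick the column of the observed treatment j
idx <- cbind(seq_along(z), match(z, treatment_levels))

# ATE weights: w_i = 1 / e_j(X_i), one weight per observation
weights <- 1 / ps[idx]

# Weights that would enter the weighted KM estimate for group "A"
weights[z == "A"]
```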

**Handling Edge Cases:**

- If weighted at-risk count $R_l = 0$ at time t, survival remains constant after t (last observation censored).
- Variance is undefined when $R_l - D_l \le 0$; set to NA for that time point.
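
Both rules can be sketched on made-up at-risk and event totals for one group, where nobody remains at risk at the third evaluation time. This is an illustration under stated assumptions, not the package's code. In particular, because the Greenwood sum is cumulative, this sketch also yields NA at any later time; whether the implementation behaves the same way is not stated here.

```r
# Made-up totals for one group at three evaluation times;
# at the third time nobody is left at risk (remaining subjects were censored)
R <- c(4, 2.5, 0)
D <- c(1, 1.0, 0)

# Hazard increment is taken as 0 when R_l = 0, so survival stays constant
step <- ifelse(R > 0, D / R, 0)
surv <- cumprod(1 - step)

# Greenwood increment is NA when R_l - D_l <= 0
term <- ifelse(R - D > 0, D / (R * (R - D)), NA_real_)
surv_var <- surv^2 * cumsum(term)

data.frame(n_risk = R, n_event = D, surv = surv, surv_var = surv_var)
```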