ipw: Time-smoothed inverse probability weighting

Description

This function applies the time-smoothed inverse probability weighted (IPW) approach described by McGrath et al. (2025) to estimate effects of generalized time-varying treatment strategies on the mean of an outcome at one or more selected follow-up times of interest. Binary and continuous outcomes are supported.

Usage

ipw(
  data,
  time_smoothed = TRUE,
  smoothing_method = "nonstacked",
  outcome_times,
  A_model,
  R_model_numerator = NULL,
  R_model_denominator,
  Y_model,
  truncation_percentile = NULL,
  include_baseline_outcome,
  return_model_fits = TRUE,
  return_weights = TRUE,
  trim_returned_models = FALSE
)

Value

An object of class "ipw". This object is a list that includes the following components:

est: A data frame containing the counterfactual mean/probability estimates for each medication at each time interval.
model_fits: A list containing the fitted models for the treatment, outcome measurement, and outcome (if return_model_fits is set to TRUE). If the nonstacked time-smoothed approach is used, the \(i\)th element in model_fits is a list of fitted models for the \(i\)th outcome time in outcome_times. If the stacked time-smoothed approach is used, the \(i\)th element in model_fits is a list of fitted models for the outcome time \(i+1\) in the data set data. The last element in model_fits contains the fitted outcome model.
data_weights: (A list containing) the artificially censored data set with columns for the estimated weights. The column "weights" contains the (final) inverse probability weight, and the columns "weights_A" and "weights_R" contain the inverse probability weights for treatment and outcome measurement, respectively. If no deaths are present in the data, this object will be a data frame. If deaths are present in the data and either the non-smoothed IPW method is applied or the time-smoothed non-stacked IPW method is applied, this object will be a list of length length(outcome_times) where each element corresponds to the artificially censored data set for each outcome time in outcome_times. If deaths are present in the data and the time-smoothed stacked IPW method is applied, this object will be a data frame with the stacked, artificially censored data.
args: A list containing the arguments supplied to ipw, except the observed data set.

Arguments

data: Data table (or data frame) containing the observed data. See "Details".
time_smoothed: Logical scalar specifying whether the time-smoothed or non-smoothed IPW method is applied. The default is TRUE, i.e., the time-smoothed IPW method.
smoothing_method: Character string specifying the time-smoothed IPW method when there are deaths present. The options include "nonstacked" and "stacked". The default is "nonstacked".
outcome_times: Numeric vector specifying the follow-up time(s) of interest for the counterfactual outcome mean/probability.
A_model: Model statement for the treatment variable.
R_model_numerator: (Optional) Model statement for the indicator variable for the measurement of the outcome variable, used in the numerator of the IP weights. The default is NULL, i.e., a numerator of 1 is used in the IP weights.
R_model_denominator: Model statement for the indicator variable for the measurement of the outcome variable, used in the denominator of the IP weights.
Y_model: Model statement for the outcome variable.
truncation_percentile: Numerical scalar specifying the percentile by which to truncate the IP weights. The default is NULL, i.e., no truncation.
include_baseline_outcome: Logical scalar indicating whether to include the time interval indexed by 0 in fitting the time-smoothed outcome model and outcome measurement models. By default, this argument is set to TRUE if data has any non-missing outcome values in the time interval indexed by 0 and is otherwise set to FALSE.
return_model_fits: Logical scalar specifying whether to include the fitted models in the output. The default is TRUE.
return_weights: Logical scalar specifying whether to return the estimated inverse probability weights. The default is TRUE.
trim_returned_models: Logical scalar specifying whether to only return the estimated coefficients (and corresponding standard errors, z scores, and p-values) of the fitted models (e.g., treatment model) rather than the full fitted model objects. This reduces the size of the object returned by the ipw function when return_model_fits is set to TRUE, especially when the observed data set is large. By default, this argument is set to FALSE.
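
As a rough illustration of what truncating IP weights at a percentile means (see truncation_percentile), the following base R sketch caps simulated weights at an upper quantile. The weights, the object names, and the use of a proportion (0.99) are assumptions made for this illustration. The text above does not state whether ipw expects a proportion or a percentage, or whether it also truncates the lower tail, so this is not a description of the function's internals.

```r
# Hypothetical weights with a heavy right tail (simulated, not from ipw)
set.seed(2)
w <- exp(rnorm(1000, sd = 1))

# Assumed truncation level, expressed as a proportion
p <- 0.99

# Cap every weight above the p-th quantile at that quantile
cutoff <- quantile(w, probs = p, names = FALSE)
w_trunc <- pmin(w, cutoff)

# Compare the largest weight before and after truncation
c(max_before = max(w), max_after = max(w_trunc))
```

Truncation of this kind trades a small amount of bias for reduced variance when a few records carry very large weights.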

Details

Treatment strategies

Users can estimate effects of treatment strategies with the following components:

Initiate treatment \(z\) at baseline
Follow a user-specified time-varying adherence protocol for treatment \(z\)
Ensure an outcome measurement at the follow-up time of interest

The time-varying adherence protocol is specified by indicating in data when an individual deviates from their adherence protocol. The function prep_data facilitates this step. See also "Formatting data".

Formatting data

The input data set data must be a data table (or data frame) in a "long" format, where each row represents one time interval for one individual. The data frame should contain the following columns:

id: A unique identifier for each participant.
time: The follow-up time index, starting from 0 and increasing in increments of 1 in consecutive rows.
Covariate columns: One or more columns for baseline and time-varying covariates.
Z: The treatment initiated at baseline.
A: An indicator for adherence to the treatment protocol at each time point.
R: An indicator of whether the outcome was measured at that time point (1 for measured, 0 for not measured/censored).
Y: The outcome variable, which can be binary or continuous.
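
The column layout above can be sketched with a small simulated data set in base R. Everything here is hypothetical: the covariate name L, the 0/1 coding of Z and A, and the choice to leave Y missing whenever R is 0 are assumptions made for illustration, not requirements stated by the package.

```r
# Hypothetical toy data: 3 participants, time intervals 0 to 3
set.seed(1)
n_id <- 3
n_time <- 4
toy <- data.frame(
  id   = rep(seq_len(n_id), each = n_time),
  time = rep(0:(n_time - 1), times = n_id)
)
toy$L <- rnorm(nrow(toy))                 # time-varying covariate (assumed name)
toy$Z <- rep(c(0, 1, 1), each = n_time)   # treatment initiated at baseline, constant within id
toy$A <- rbinom(nrow(toy), 1, 0.8)        # adherence indicator (assumed 1 = adherent)
toy$R <- rbinom(nrow(toy), 1, 0.7)        # 1 if the outcome was measured in this interval
toy$Y <- ifelse(toy$R == 1, rnorm(nrow(toy)), NA)  # outcome, missing when not measured

# Structural checks: one row per (id, time), and time runs 0, 1, 2, 3 within each id
stopifnot(!anyDuplicated(toy[c("id", "time")]))
stopifnot(all(tapply(toy$time, toy$id, function(t) identical(as.integer(t), 0:3))))

head(toy)
```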

To specify the intervention, the data set should additionally have the following columns:

C_artificial: An indicator specifying when an individual should be artificially censored from the data due to violating the adherence protocol.
A_model_eligible: An indicator specifying which records should be used for fitting the treatment adherence model.

The prep_data function facilitates adding these columns to the data set. Users may optionally include the following column for fitting the outcome measurement model:

R_model_denominator_eligible: An indicator specifying which records should be used for fitting the outcome measurement model R_model_denominator.

Otherwise, the model R_model_denominator is fit on all records in the artificially censored data set.

Specifying the models

Users must specify model statements for the treatment (A_model), outcome measurement (R_model_numerator and R_model_denominator), and outcome variable (Y_model). The package uses pooled-over-time generalized linear models that are fit over the relevant time points (see "Formatting data"), where logistic regression is used for binary variables and linear regression is used for continuous variables.
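
To make "pooled-over-time" concrete, here is a minimal base R sketch that fits one logistic model and one linear model to all person-time rows at once, so that a single set of coefficients is shared across time intervals. The simulated data, variable names, and formulas are assumptions for illustration. The sketch uses stats::glm directly and does not show how ipw fits its models internally.

```r
# Hypothetical long-format data: 200 participants, 5 time intervals each
set.seed(3)
n_id <- 200
n_time <- 5
d <- data.frame(
  id   = rep(seq_len(n_id), each = n_time),
  time = rep(0:(n_time - 1), times = n_id)
)
d$L <- rnorm(nrow(d))
d$Z <- rep(rbinom(n_id, 1, 0.5), each = n_time)
d$A <- rbinom(nrow(d), 1, plogis(1 + 0.5 * d$L - 0.3 * d$Z))
d$Y <- 1 + 0.2 * d$time + 0.5 * d$Z + rnorm(nrow(d))

# Binary variable: pooled logistic regression over all person-time rows
fit_A <- glm(A ~ L + Z, family = binomial(), data = d)

# Continuous variable: pooled linear regression, with time as a predictor
fit_Y <- glm(Y ~ time + Z, family = gaussian(), data = d)

coef(fit_A)
coef(fit_Y)
```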

For stabilized weights, the outcome measurement model R_model_numerator should only include baseline covariates, treatment initiated Z, and time as predictors. It must not include time-varying covariates as predictors. The outcome model Y_model should also only depend on baseline covariates, treatment initiated Z, and time (if using time smoothing).
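
The restriction on the numerator model can be illustrated with a base R sketch of a per-interval stabilized weight ratio for outcome measurement. The data are simulated, and the names L_baseline, w_R, and w_R_unstab are assumptions made for this illustration. The sketch only forms the ratio of fitted probabilities within each row. How ipw combines such terms across time and with the treatment weights is not shown here.

```r
# Hypothetical long-format data: 300 participants, 4 time intervals each
set.seed(4)
n_id <- 300
n_time <- 4
d <- data.frame(
  id   = rep(seq_len(n_id), each = n_time),
  time = rep(0:(n_time - 1), times = n_id)
)
d$L <- rnorm(nrow(d))                                 # time-varying covariate
d$L_baseline <- rep(d$L[d$time == 0], each = n_time)  # its value at time 0
d$Z <- rep(rbinom(n_id, 1, 0.5), each = n_time)
d$R <- rbinom(nrow(d), 1, plogis(0.5 + 0.8 * d$L + 0.3 * d$Z))

# Numerator: baseline covariates, Z, and time only (no time-varying covariates)
fit_num <- glm(R ~ L_baseline + Z + time, family = binomial(), data = d)

# Denominator: may additionally depend on time-varying covariates
fit_den <- glm(R ~ L + Z + time, family = binomial(), data = d)

# Per-row stabilized and unstabilized ratios for being measured
d$w_R <- predict(fit_num, type = "response") / predict(fit_den, type = "response")
d$w_R_unstab <- 1 / predict(fit_den, type = "response")

# The weights matter for rows where the outcome was measured
summary(d$w_R[d$R == 1])
summary(d$w_R_unstab[d$R == 1])
```

Leaving the numerator at its default of 1 corresponds to the unstabilized ratio in this sketch.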

A note on the outcome definition at baseline

In some settings, the outcome may not be defined in the baseline time interval. The ipw function can accommodate such settings in two ways:

Users can set a value of NA in the column Y in the input data set data in rows corresponding to time 0. In this case, users should ensure that include_baseline_outcome is set to FALSE.
Users can specify the value of \(Y_{t+1}\) (rather than \(Y_t\)) in the column Y in the input data set data in rows corresponding to time \(t\). That is, the value supplied for Y in the input data set data at time 0 is \(Y_1\). In this case, users should ensure that include_baseline_outcome is set to TRUE. Users should also set outcome_times accordingly.

Note that these two approaches involve different assumptions. For example, the first approach allows the outcome at time \(t\) to depend on time-varying covariates up to and including time \(t\), whereas the second approach only allows the outcome at time \(t\) to depend on covariates up to and including time \(t-1\).
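
The two data layouts can be sketched in base R as follows. The toy values and the object names d1 and d2 are hypothetical. The sketch only shows how the column Y would be arranged under each approach, not how ipw processes it.

```r
# Hypothetical outcomes: Y is undefined at time 0 and defined at times 1 to 3
d <- data.frame(
  id   = rep(1:2, each = 4),
  time = rep(0:3, times = 2),
  Y    = c(NA, 10, 11, 12, NA, 20, 21, 22)
)

# Approach 1: keep Y as NA in the time 0 rows
# (to be paired with include_baseline_outcome = FALSE)
d1 <- d

# Approach 2: store Y_{t+1} in the row for time t
# (to be paired with include_baseline_outcome = TRUE)
d2 <- d[order(d$id, d$time), ]
d2$Y <- ave(d2$Y, d2$id, FUN = function(y) c(y[-1], NA))

d2
```

In this sketch, the last row of each participant in d2 has no later outcome to carry back, so its Y is NA. Under the second layout, the row indexed by time \(t\) holds \(Y_{t+1}\), which is why outcome_times needs to be set with the shifted index in mind.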

References

McGrath S, Kawahara T, Petimar J, Rifas-Shiman SL, Díaz I, Block JP, Young JG. (2025). Time-smoothed inverse probability weighted estimation of effects of generalized time-varying treatment strategies on repeated outcomes truncated by death. arXiv e-prints arXiv:2509.13971.

Examples


## Time-smoothed IPW without deaths (continuous outcome)
data_null_processed <- prep_data(data = data_null, grace_period_length = 2,
                                 baseline_vars = 'L')
res <- ipw(data = data_null_processed,
           time_smoothed = TRUE,
           outcome_times = c(6, 12, 18, 24),
           A_model = A ~ L + Z,
           R_model_numerator = R ~ L_baseline + Z,
           R_model_denominator = R ~ L + A + Z,
           Y_model = Y ~ L_baseline * (time + Z))
res

## Time-smoothed IPW with deaths, nonstacked smoothing method (continuous outcome)
data_null_deaths_processed <- prep_data(data = data_null_deaths, grace_period_length = 2,
                                        baseline_vars = 'L')
res <- ipw(data = data_null_deaths_processed,
           time_smoothed = TRUE,
           smoothing_method = 'nonstacked',
           outcome_times = c(6, 12, 18, 24),
           A_model = A ~ L + Z,
           R_model_numerator = R ~ L_baseline + Z,
           R_model_denominator = R ~ L + A + Z,
           Y_model = Y ~ L_baseline * (time + Z))
res

## Time-smoothed IPW with deaths, stacked smoothing method (binary outcome)
# \donttest{
data_null_deaths_binary_processed <- prep_data(data = data_null_deaths_binary,
                                               grace_period_length = 2,
                                               baseline_vars = 'L')
res <- ipw(data = data_null_deaths_binary_processed,
           time_smoothed = TRUE,
           smoothing_method = 'stacked',
           outcome_times = c(6, 12, 18, 24),
           A_model = A ~ L + Z,
           R_model_numerator = R ~ L_baseline + Z,
           R_model_denominator = R ~ L + A + Z,
           Y_model = Y ~ L_baseline * (time + Z))
res$est
# }
