smoothedIPW
Description
The smoothedIPW package implements methods to estimate effects of
generalized time-varying treatment strategies on the mean of an outcome
at one or more selected follow-up times of interest. The package allows
for treatment strategies with the following components:
- Initiate treatment $z$ at baseline
- Follow a user-specified time-varying adherence protocol for treatment $z$
- Ensure an outcome measurement at the follow-up time of interest.
The package considers the setting where outcomes may be repeatedly, non-monotonically, informatively, and sparsely measured in the data source. The package also supports settings where outcomes are truncated by death, i.e., some individuals die during follow-up, which renders the outcome of interest undefined at the follow-up time of interest.
Specifically, this package implements the time-smoothed inverse probability weighted (IPW) methods described in McGrath et al. (2025). Time-smoothing refers to using outcome measurements at intermediate time-points in order to gain precision. In settings with truncation by death, two different types of approaches for time-smoothing are available (i.e., the stacked and nonstacked methods), which rely on different model assumptions. Further details are given in McGrath et al. (2025).
Installation
You can install the development version of smoothedIPW from
GitHub with:
# install.packages("devtools")
devtools::install_github("stmcg/smoothedIPW")
Example 1: No Deaths
We first load the package.
library(smoothedIPW)
We will estimate the effect of treatment strategies with the following three components:
- Initiate medication $z$ ($z \in \{0, 1\}$) at baseline
- Adhere to medication $z$ throughout the follow-up, allowing for a grace period of 2 months
- Ensure an outcome measurement at the follow-up time of interest
We consider the follow-up time of interest to be $t^*$, $t^* \in \{6, 12, 18, 24\}$. In this example, we assume that there are no deaths over the study period. The second example considers a setting with deaths over the study period.
Data Set
We will use the example data set data_null which contains longitudinal
data on 1,000 individuals over 25 time points. This data set was
generated so that the treatment has no effect on the outcome at all time
points. The data set data_null contains the following columns:
- id: Participant ID
- time: Follow-up time index
- L: Time-varying covariate
- Z: Medication initiated at baseline
- A: Adherence to the medication initiated at baseline
- R: Indicator of outcome measurement
- Y: Outcome
The first 10 rows of data_null are:
data_null[1:10,]
#> id time L Z A R Y
#> <num> <int> <int> <int> <num> <int> <num>
#> 1: 1 0 1 0 1 0 NA
#> 2: 1 1 0 0 1 0 NA
#> 3: 1 2 1 0 1 0 NA
#> 4: 1 3 0 0 1 0 NA
#> 5: 1 4 0 0 1 1 -4.367446
#> 6: 1 5 1 0 1 0 NA
#> 7: 1 6 1 0 1 0 NA
#> 8: 1 7 1 0 1 0 NA
#> 9: 1 8 0 0 0 0 NA
#> 10: 1 9 0 0 1 0 NA
The package generally expects users to follow these naming conventions for the columns of the observed data set. The columns for the time-varying covariate(s) are the exception: they can take any names.
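As a rough illustration of this long format, the following base R sketch builds a small toy data set with the same column names. The data are simulated and the object name `toy` is hypothetical; the sketch uses a plain data.frame, whereas the printed example data appear to be a data.table, and this README does not state which input classes prep_data accepts.

```r
# Toy long-format data: 2 individuals, 3 time points each.
# Column names follow the conventions listed above; values are simulated.
set.seed(1)
n_id <- 2
n_time <- 3
toy <- data.frame(
  id   = rep(seq_len(n_id), each = n_time),       # participant ID
  time = rep(0:(n_time - 1), times = n_id),       # follow-up time index
  L    = rbinom(n_id * n_time, 1, 0.5),           # time-varying covariate
  Z    = rep(c(0, 1), each = n_time),             # medication initiated at baseline
  A    = rbinom(n_id * n_time, 1, 0.8),           # adherence indicator
  R    = rbinom(n_id * n_time, 1, 0.3)            # outcome measurement indicator
)
# The outcome is only observed when R == 1; otherwise it is missing.
toy$Y <- ifelse(toy$R == 1, rnorm(nrow(toy)), NA_real_)
toy
```

Each row is one individual at one time point, and Y is NA whenever R is 0, mirroring the rows of data_null shown above.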
Applying IPW
Preparing the data set
We first need to add some variables to the data set before applying inverse probability weighting. Specifically, we need to add:
- C_artificial: An indicator specifying when an individual should be artificially censored from the data
- A_model_eligible: An indicator specifying what records should be used for fitting the treatment adherence model
We will also need to add columns for the baseline value of the
time-varying covariates. In our case, we will add a column L_baseline
for the baseline value of L.
These columns can be added by the prep_data function, as shown below:
data_null_processed <- prep_data(data = data_null, grace_period_length = 2,
                                 baseline_vars = 'L')
To see the added columns, let us inspect the processed data set for individual
id = 2 in the first 10 time intervals:
data_null_processed[id == 2 & time < 10]
#> id time L Z A R Y A_model_eligible
#> <num> <int> <int> <int> <num> <int> <num> <num>
#> 1: 2 0 1 0 1 1 -6.896476 0
#> 2: 2 1 1 0 1 0 NA 0
#> 3: 2 2 1 0 1 1 -10.390042 0
#> 4: 2 3 1 0 1 0 NA 0
#> 5: 2 4 1 0 1 0 NA 0
#> 6: 2 5 1 0 1 0 NA 0
#> 7: 2 6 1 0 0 0 NA 0
#> 8: 2 7 0 0 0 1 -4.286635 0
#> 9: 2 8 1 0 0 1 11.385070 1
#> 10: 2 9 0 0 1 0 NA 0
#> C_artificial L_baseline
#> <num> <int>
#> 1: 0 1
#> 2: 0 1
#> 3: 0 1
#> 4: 0 1
#> 5: 0 1
#> 6: 0 1
#> 7: 0 1
#> 8: 0 1
#> 9: 1 1
#> 10: 1 1
Observe that A_model_eligible becomes 1 when time = 8 because the
individual is at the end of their grace period (i.e., has already gone
two consecutive intervals without adhering to the medication).
C_artificial switches to 1 in this time interval because the
individual did not adhere to the medication in this interval (the end of
their grace period), thus violating the treatment strategy of interest.
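The grace-period rule described above can be sketched in base R for a single individual. This is a reconstruction from the printed example, not the package's source code: the function name `flag_grace_period` is hypothetical, it assumes the adherence vector is ordered by time and has no missing values, and prep_data may handle details (such as baseline or death records) differently.

```r
# Sketch: derive A_model_eligible and C_artificial from one individual's
# adherence history A (ordered by time), given a grace period length.
flag_grace_period <- function(A, grace_period_length) {
  n <- length(A)
  eligible <- integer(n)
  censored <- integer(n)
  run <- 0L              # consecutive non-adherent intervals before time k
  is_censored <- FALSE
  for (k in seq_len(n)) {
    if (is_censored) {
      censored[k] <- 1L  # once censored, stay censored
      next
    }
    if (run == grace_period_length) {
      # Grace period exhausted: adherence is required in this interval.
      eligible[k] <- 1L
      if (A[k] == 0) {
        is_censored <- TRUE
        censored[k] <- 1L
      }
    }
    run <- if (A[k] == 0) run + 1L else 0L
  }
  data.frame(A = A, A_model_eligible = eligible, C_artificial = censored)
}

# Adherence history of individual id = 2 at times 0 to 9 (from the output above)
out <- flag_grace_period(c(1, 1, 1, 1, 1, 1, 0, 0, 0, 1), grace_period_length = 2)
out
```

Applied to the adherence history of individual 2, the sketch sets A_model_eligible to 1 only at time 8 and C_artificial to 1 at times 8 and 9, which agrees with the processed data shown above.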
Point estimation
We will use the time-smoothed IPW method, which is implemented in the
ipw function. This method involves specifying the following models:
- A_model: Treatment adherence model
- R_model_denominator: Outcome measurement indicator model (used in the denominator of weights)
- R_model_numerator: (Optional) Outcome measurement indicator model (used in the numerator of weights for stabilization)
- Y_model: Outcome (marginal structural) model
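To illustrate the role of the numerator and denominator measurement models, here is a minimal base R sketch of unstabilized and stabilized inverse probability weights at a single time point. It is a generic illustration on simulated data and assumes logistic regression models; it is not the ipw function's implementation, and the package's full weight construction (which also involves the adherence model) is described in McGrath et al. (2025). All object names are hypothetical.

```r
# Simulated single-time-point data (hypothetical values)
set.seed(1)
n <- 500
toy <- data.frame(
  L_baseline = rbinom(n, 1, 0.5),
  Z = rbinom(n, 1, 0.5),
  A = rbinom(n, 1, 0.8)
)
toy$L <- rbinom(n, 1, 0.3 + 0.4 * toy$L_baseline)
toy$R <- rbinom(n, 1, plogis(-1 + 0.5 * toy$L + 0.3 * toy$A))

# Denominator model: measurement given time-varying information
den_fit <- glm(R ~ L + A + Z, family = binomial(), data = toy)
# Numerator model: measurement given baseline information only
num_fit <- glm(R ~ L_baseline + Z, family = binomial(), data = toy)

p_den <- predict(den_fit, type = "response")
p_num <- predict(num_fit, type = "response")

# Only records with a measured outcome (R == 1) receive a nonzero weight
w_unstabilized <- ifelse(toy$R == 1, 1 / p_den, 0)
w_stabilized   <- ifelse(toy$R == 1, p_num / p_den, 0)

summary(w_unstabilized[toy$R == 1])
summary(w_stabilized[toy$R == 1])
```

Stabilized weights are typically less variable than unstabilized ones, which is the usual motivation for supplying a numerator model.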
An example application of ipw is below:
res_est <- ipw(data = data_null_processed,
               time_smoothed = TRUE,
               outcome_times = c(6, 12, 18, 24),
               A_model = A ~ L + Z,
               R_model_denominator = R ~ L + A + Z,
               R_model_numerator = R ~ L_baseline + Z,
               Y_model = Y ~ L_baseline * (time + Z))
The estimated counterfactual outcome mean for each medication at each follow-up time of interest ($t^*$) is given below.
res_est
#>
#> =======================================================================
#> Point Estimates: Counterfactual Outcome Mean
#> =======================================================================
#>
#> Settings:
#> -----------------------------------------------------------------------
#> Method: Time-smoothed IPW
#> Outcome times: 6, 12, 18, 24
#>
#> Estimates:
#> -----------------------------------------------------------------------
#> time Z=0 Z=1
#> 6 0.007124865 -0.03536354
#> 12 -0.019445369 -0.06193377
#> 18 -0.046015603 -0.08850401
#> 24 -0.072585838 -0.11507424
Interval estimation
To obtain 95% confidence intervals around our estimates, we can apply
the get_CI function. It constructs percentile-based bootstrap
confidence intervals using n_boot bootstrap replicates. We use 10
bootstrap replicates for ease of computation.
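For intuition about what a percentile bootstrap interval is, the following base R sketch computes one for a simple statistic on toy longitudinal data. It is a generic illustration, not get_CI's implementation: the function `percentile_ci` is hypothetical, and it assumes resampling is done at the individual level (a common choice for longitudinal data that this README does not confirm for get_CI).

```r
# Sketch of a percentile bootstrap that resamples individuals (ids)
# with replacement and keeps all rows of each sampled individual.
percentile_ci <- function(dat, statistic, n_boot = 200, level = 0.95) {
  ids <- unique(dat$id)
  rows_by_id <- split(seq_len(nrow(dat)), dat$id)
  boot_stats <- vapply(seq_len(n_boot), function(b) {
    sampled <- ids[sample.int(length(ids), replace = TRUE)]
    idx <- unlist(rows_by_id[as.character(sampled)], use.names = FALSE)
    # Note: duplicated individuals keep their original id here; a model
    # refit that relies on unique ids would need to relabel them.
    statistic(dat[idx, , drop = FALSE])
  }, numeric(1))
  alpha <- 1 - level
  # Percentile interval: empirical quantiles of the bootstrap statistics
  quantile(boot_stats, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(1234)
toy <- data.frame(id = rep(1:50, each = 4), Y = rnorm(200))
ci <- percentile_ci(toy, function(d) mean(d$Y), n_boot = 200)
ci
```

With only 10 replicates, as used in this README for speed, the percentile limits are very noisy; a much larger number of replicates is generally advisable in practice. With smoothedIPW, the call to get_CI and its printed result are: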
set.seed(1234)
res_ci <- get_CI(res_est, data = data_null_processed, n_boot = 10)
res_ci
#>
#> =======================================================================
#> Confidence Intervals: Counterfactual Outcome Mean
#> =======================================================================
#>
#> Settings:
#> -----------------------------------------------------------------------
#> Method: Time-smoothed IPW
#> Outcome times: 6, 12, 18, 24
#> Bootstrap samples: 10
#> Confidence level: 95%
#>
#> Confidence Intervals:
#> -----------------------------------------------------------------------
#>
#> Outcome Mean under Z = 0:
#> Time Estimate CI Lower CI Upper
#> 6 0.007124865 -0.1774043 0.13580696
#> 12 -0.019445369 -0.2534594 0.08688074
#> 18 -0.046015603 -0.3295146 0.04689198
#> 24 -0.072585838 -0.4055698 0.05887030
#>
#> Outcome Mean under Z = 1:
#> Time Estimate CI Lower CI Upper
#> 6 -0.03536354 -0.2027095 0.05405786
#> 12 -0.06193377 -0.2137749 0.02011806
#> 18 -0.08850401 -0.2910591 0.03815230
#> 24 -0.11507424 -0.4132964 0.06315579
Example 2: With Deaths
We next consider an example where some participants die during follow-up. We consider the same treatment strategies as in the first example.
Data Set
We use the example data set data_null_deaths, which is similar to
data_null but includes deaths during follow-up. This results in fewer
total observations (21,713 vs 25,000) because individuals who die have
no records after their death time. The data set contains an additional
column:
- D: Indicator of whether death occurred at that time point
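When preparing your own data in this format, it can help to check that no individual has records after their death time. The base R sketch below is one way to do so; the function `no_rows_after_death` and the toy data are hypothetical and are not part of the package.

```r
# Toy long-format data: id 1 dies at time 2, id 2 survives through time 3.
toy_deaths <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2, 2),
  time = c(0, 1, 2, 0, 1, 2, 3),
  D    = c(0, 0, 1, 0, 0, 0, 0)
)

# Returns TRUE if, for every individual, the first record with D == 1
# (if any) is also that individual's last record.
no_rows_after_death <- function(dat) {
  ok <- vapply(split(dat, dat$id), function(d) {
    d <- d[order(d$time), ]
    death_rows <- which(d$D == 1)
    length(death_rows) == 0 || death_rows[1] == nrow(d)
  }, logical(1))
  all(ok)
}

no_rows_after_death(toy_deaths)
```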
The rows of data_null_deaths for one individual who died at time 5 are
shown below:
data_null_deaths[id == 151,]
#> id time L Z A R Y D
#> <num> <int> <int> <int> <num> <int> <num> <num>
#> 1: 151 0 0 0 1 0 NA 0
#> 2: 151 1 0 0 1 0 NA 0
#> 3: 151 2 0 0 1 1 -2.826145 0
#> 4: 151 3 1 0 0 0 NA 0
#> 5: 151 4 1 0 1 1 -8.082282 0
#> 6: 151 5 1 0 1 NA NA 1
Applying IPW
Preparing the data set
We prepare the data set in the same way as before using prep_data:
data_null_deaths_processed <- prep_data(data = data_null_deaths, grace_period_length = 2, baseline_vars = 'L')
Point estimation
When deaths are present, we can choose between two different
time-smoothing methods: the nonstacked method and the stacked method. Users
can select the smoothing method with the smoothing_method argument
(options: 'nonstacked' and 'stacked') of the ipw function.
res_est_deaths <- ipw(data = data_null_deaths_processed,
                      time_smoothed = TRUE,
                      smoothing_method = 'nonstacked',
                      outcome_times = c(6, 12, 18, 24),
                      A_model = A ~ L + Z,
                      R_model_denominator = R ~ L + A + Z,
                      R_model_numerator = R ~ L_baseline + Z,
                      Y_model = Y ~ L_baseline * (time + Z))
The estimated counterfactual outcome mean for each medication at each follow-up time of interest is given below.
res_est_deaths
#>
#> =======================================================================
#> Point Estimates: Counterfactual Outcome Mean
#> =======================================================================
#>
#> Settings:
#> -----------------------------------------------------------------------
#> Method: Time-smoothed IPW (nonstacked)
#> Outcome times: 6, 12, 18, 24
#>
#> Estimates:
#> -----------------------------------------------------------------------
#> time Z=0 Z=1
#> 6 0.03219486 -0.2047087
#> 12 -0.33221811 -0.3572665
#> 18 -0.38437663 -0.3460489
#> 24 -0.44631635 -0.3221055
Interval estimation
Confidence intervals can be obtained using the bootstrap in the same way as in the case without deaths:
set.seed(1234)
res_ci_deaths <- get_CI(res_est_deaths, data = data_null_deaths_processed, n_boot = 10)
res_ci_deaths
#>
#> =======================================================================
#> Confidence Intervals: Counterfactual Outcome Mean
#> =======================================================================
#>
#> Settings:
#> -----------------------------------------------------------------------
#> Method: Time-smoothed IPW (nonstacked)
#> Outcome times: 6, 12, 18, 24
#> Bootstrap samples: 10
#> Confidence level: 95%
#>
#> Confidence Intervals:
#> -----------------------------------------------------------------------
#>
#> Outcome Mean under Z = 0:
#> Time Estimate CI Lower CI Upper
#> 6 0.03219486 -0.5188776 0.37035189
#> 12 -0.33221811 -0.5737509 -0.02752269
#> 18 -0.38437663 -0.6908641 -0.02397294
#> 24 -0.44631635 -0.6551153 -0.22480044
#>
#> Outcome Mean under Z = 1:
#> Time Estimate CI Lower CI Upper
#> 6 -0.2047087 -0.7996861 0.18012098
#> 12 -0.3572665 -0.7339370 -0.13441412
#> 18 -0.3460489 -0.6605846 -0.05504764
#> 24 -0.3221055 -0.7281114 -0.07134614
Citation
If you use smoothedIPW in your research, please cite:
McGrath S, Kawahara T, Petimar J, Rifas-Shiman SL, Díaz I, Block JP, Young JG. (2025). Time-smoothed inverse probability weighted estimation of effects of generalized time-varying treatment strategies on repeated outcomes truncated by death. arXiv preprint arXiv:2509.13971.
BibTeX entry:
@article{mcgrath2025time,
title={Time-smoothed inverse probability weighted estimation of effects of generalized time-varying treatment strategies on repeated outcomes truncated by death},
author={McGrath, Sean and Kawahara, Takuya and Petimar, Joshua and Rifas-Shiman, Sheryl L and D{\'\i}az, Iv{\'a}n and Block, Jason P and Young, Jessica G},
journal={arXiv preprint arXiv:2509.13971},
year={2025}
}