Treatment strategies
Users can estimate effects of treatment strategies with the following components:
Initiate treatment \(z\) at baseline
Follow a user-specified time-varying adherence protocol for treatment \(z\)
Ensure an outcome measurement at the follow-up time of interest.
To specify the time-varying adherence protocol, indicate in the input data set data when an individual deviates from the protocol. The prep_data function facilitates this step. See also "Formatting data".
Formatting data
The input data set data must be a data table (or data frame) in a "long" format, where each row represents one time interval for one individual. The data frame should contain the following columns:
id: A unique identifier for each participant.
time: The follow-up time index, starting from 0 and increasing in increments of 1 in consecutive rows.
Covariate columns: One or more columns for baseline and time-varying covariates.
Z: The treatment initiated at baseline.
A: An indicator for adherence to the treatment protocol at each time point.
R: An indicator of whether the outcome was measured at that time point (1 for measured, 0 for not measured/censored).
Y: The outcome variable, which can be binary or continuous.
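As an illustration of the layout described above, the following base R sketch builds a small long-format data set and checks its structure. The covariate names X (baseline) and L (time-varying) and all values are made up for this example, and setting Y to NA where R is 0 is an assumption rather than a documented requirement.

```r
# Toy long-format data: 3 participants, 3 intervals each (times 0, 1, 2).
# Column names follow the list above; X, L, and all values are hypothetical.
toy <- data.frame(
  id   = rep(1:3, each = 3),
  time = rep(0:2, times = 3),
  X    = rep(c(0.5, -1.2, 0.3), each = 3),  # baseline covariate (constant within id)
  L    = c(1, 0, 1,  0, 0, 1,  1, 1, 0),    # time-varying covariate
  Z    = rep(c(1, 0, 1), each = 3),         # treatment initiated at baseline
  A    = c(1, 1, 1,  1, 0, 0,  1, 1, 1),    # adherence indicator
  R    = c(1, 1, 1,  1, 1, 0,  1, 0, 1),    # 1 if the outcome was measured
  Y    = c(2.1, 2.4, 2.0,  1.8, 1.9, NA,  2.5, NA, 2.2)  # assumed NA when R == 0
)

# Structural checks: within each id, time starts at 0 and increases by 1,
# and no id-time combination appears twice.
time_ok <- tapply(toy$time, toy$id,
                  function(t) identical(as.integer(t), seq_along(t) - 1L))
stopifnot(all(time_ok), !anyDuplicated(toy[c("id", "time")]))
```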
To specify the intervention, the data set should additionally have the following columns:
C_artificial: An indicator specifying when an individual should be artificially censored from the data due to violating the adherence protocol.
A_model_eligible: An indicator specifying which records should be used for fitting the treatment adherence model.
The prep_data function facilitates adding these columns to the data set. Users may optionally include the following column for fitting the outcome measurement model:
R_model_denominator_eligible: An indicator specifying which records should be used for fitting the outcome measurement model R_model_denominator.
Otherwise, R_model_denominator is fit on all records in the artificially censored data set.
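To make the role of C_artificial and A_model_eligible concrete, here is a minimal base R sketch for one hypothetical protocol: "adhere in every interval". It is not the package's prep_data implementation, and the conventions assumed here (censoring starts in the first non-adherent interval, and a record is eligible for the adherence model as long as the individual had not deviated before that interval) may differ from what prep_data produces.

```r
# Hypothetical adherence histories for two participants over four intervals.
d <- data.frame(
  id   = rep(1:2, each = 4),
  time = rep(0:3, times = 2),
  A    = c(1, 1, 1, 1,   1, 0, 1, 1)   # participant 2 deviates at time 1
)
d <- d[order(d$id, d$time), ]

# Assumed convention: C_artificial is 1 from the first non-adherent
# interval onward, computed separately within each id.
d$C_artificial <- ave(as.integer(d$A == 0), d$id, FUN = cummax)

# Assumed convention: a record contributes to the adherence model only if
# the individual was still following the protocol at the start of the interval.
prev_censored <- ave(d$C_artificial, d$id,
                     FUN = function(x) c(0L, head(x, -1)))
d$A_model_eligible <- as.integer(prev_censored == 0)

print(d)
```

Under these assumptions, participant 2 is flagged as artificially censored from time 1 onward, and only their records at times 0 and 1 are marked eligible for the adherence model.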
Specifying the models
Users must specify model statements for the treatment (A_model), outcome measurement (R_model_numerator and R_model_denominator), and outcome (Y_model). The package fits pooled-over-time generalized linear models on the relevant records (see "Formatting data"), using logistic regression for binary variables and linear regression for continuous variables.
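For readers unfamiliar with pooled-over-time models, the idea can be sketched with base R's glm: all person-time records are stacked and a single model is fit, with time entering as a predictor. The simulated data, the covariate names X and L, and the formulas below are assumptions for illustration only; they do not show how the package passes model statements internally.

```r
set.seed(1)
n <- 200  # participants
K <- 3    # intervals per participant

# Simulated long-format data (hypothetical covariates X and L).
long <- data.frame(
  id   = rep(seq_len(n), each = K),
  time = rep(0:(K - 1), times = n),
  Z    = rep(rbinom(n, 1, 0.5), each = K),  # treatment initiated at baseline
  X    = rep(rnorm(n), each = K)            # baseline covariate
)
long$L <- rnorm(n * K)                                    # time-varying covariate
long$A <- rbinom(n * K, 1, plogis(1 + 0.5 * long$L))      # binary adherence
long$Y <- 1 + 0.3 * long$Z + 0.2 * long$X + rnorm(n * K)  # continuous outcome

# Binary variable: one logistic regression pooled over all time points.
fit_A <- glm(A ~ Z + X + L + time, family = binomial(), data = long)

# Continuous variable: one linear regression pooled over all time points.
fit_Y <- glm(Y ~ Z + X + time, family = gaussian(), data = long)

coef(fit_A)
coef(fit_Y)
```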
For stabilized weights, the outcome measurement model R_model_numerator should only include baseline covariates, treatment initiated Z, and time as predictors. It must not include time-varying covariates as predictors. The outcome model Y_model should also only depend on baseline covariates, treatment initiated Z, and time (if using time smoothing).
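The distinction between the numerator and denominator models can be sketched as follows in base R. The numerator model uses only baseline covariates, Z, and time, while the denominator model also includes a time-varying covariate. The data, the covariate names X and L, and the formulas are hypothetical, and the sketch stops at the per-record probability ratio; how the package combines these ratios into final weights is not shown here.

```r
set.seed(2)
n <- 300  # participants
K <- 3    # intervals per participant

d <- data.frame(
  id   = rep(seq_len(n), each = K),
  time = rep(0:(K - 1), times = n),
  Z    = rep(rbinom(n, 1, 0.5), each = K),  # treatment initiated at baseline
  X    = rep(rnorm(n), each = K)            # baseline covariate
)
d$L <- rnorm(n * K)  # time-varying covariate
d$R <- rbinom(n * K, 1, plogis(1.5 + 0.4 * d$L - 0.3 * d$Z))

# Numerator: baseline covariates, Z, and time only (no time-varying covariates).
num_fit <- glm(R ~ Z + X + time, family = binomial(), data = d)

# Denominator: may additionally include time-varying covariates.
den_fit <- glm(R ~ Z + X + L + time, family = binomial(), data = d)

# Per-record ratio of predicted measurement probabilities.
d$ratio <- predict(num_fit, type = "response") /
           predict(den_fit, type = "response")

summary(d$ratio)
```

Because the numerator depends only on quantities that the outcome model also conditions on, ratios of this kind tend to be closer to 1 than the unstabilized inverse probabilities.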
A note on the outcome definition at baseline
In some settings, the outcome may not be defined in the baseline time interval. The ipw function can accommodate such settings in two ways:
Users can set a value of NA in the column Y in the input data set data in rows corresponding to time 0. In this case, users should ensure that include_baseline_outcome is set to FALSE.
Users can specify the value of \(Y_{t+1}\) (rather than \(Y_t\)) in the column Y in the input data set data in rows corresponding to time \(t\). That is, the value supplied for Y in the input data set data at time 0 is \(Y_1\). In this case, users should ensure that include_baseline_outcome is set to TRUE. Users should also set outcome_times accordingly.
Note that these two approaches involve different assumptions. For example, the first approach allows the outcome at time \(t\) to depend on time-varying covariates up to and including time \(t\), whereas the second approach only allows the outcome at time \(t\) to depend on covariates up to and including time \(t-1\).
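The two encodings can be compared on a small example. The sketch below, in base R with made-up values, starts from outcomes \(Y_1, Y_2, Y_3\) for two participants and lays them out both ways; the data frame names a1 and a2 are hypothetical, and the ipw call itself is omitted.

```r
# Outcomes Y_1, Y_2, Y_3 for two participants (values are made up).
y <- data.frame(
  id = rep(1:2, each = 3),
  t  = rep(1:3, times = 2),
  Y  = c(1.0, 1.5, 2.0,   0.5, 0.7, 0.9)
)

# Approach 1: the row at time t holds Y_t, so the time 0 row has Y = NA.
# (Used with include_baseline_outcome set to FALSE.)
a1 <- rbind(
  data.frame(id = unique(y$id), time = 0, Y = NA_real_),
  data.frame(id = y$id, time = y$t, Y = y$Y)
)
a1 <- a1[order(a1$id, a1$time), ]

# Approach 2: the row at time t holds Y_{t+1}, so the time 0 row holds Y_1.
# (Used with include_baseline_outcome set to TRUE.)
a2 <- data.frame(id = y$id, time = y$t - 1, Y = y$Y)

a1
a2
```

In the first layout each outcome sits in the same row as the covariates measured at time \(t\); in the second it sits in the row for time \(t-1\), which is why the two approaches condition on different covariate histories.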