ipwtm: Estimate Inverse Probability Weights (Time Varying)

Description

Estimate inverse probability weights to fit marginal structural models, with a time-varying exposure and time-varying confounders. Within each unit under observation this function computes inverse probability weights at each time point during follow-up. The exposure can be binomial, multinomial, ordinal or continuous. Both stabilized and unstabilized weights can be estimated.

Usage

ipwtm(exposure, family, link, numerator = NULL, denominator, id,
       tstart, timevar, type, data, corstr = "ar1", trim = NULL,
       ...)

Arguments

exposure

vector, representing the exposure of interest. Both numerical and factor variables can be used. A binomial exposure variable should be coded using values 0/1

family

specifies a family of link functions, used to model the relationship between the variables in numerator or denominator and exposure, respectively. Alternatives are "binomial", "survival",

link

specifies the specific link function between the variables in numerator or denominator and exposure, respectively. For family="binomial" (fitted using glm) valid alternati

numerator

specifies the right-hand side of the model used to estimate the elements in the numerator of the inverse probability weights. When left unspecified, unstabilized weights, with a numerator of 1, are estimated.

denominator

specifies the right-hand side of the model used to estimate the elements in the denominator of the inverse probability weights.

vector, uniquely identifying the units under observation (typically patients) within which the longitudinal measurements are taken.

tstart

numerical vector, representing the starting time of follow-up intervals, using the counting process notation. This argument is only needed when family= "survival", otherwise it is ignored. The Cox proportional hazards models are

timevar

numerical vector, representing follow-up time, starting at 0. This variable is used as the end time of follow-up intervals, using the counting process notation, when family="survival".

type

specifies the type of exposure. Valid alternatives are "first" and "all". With type="first", weights are estimated up to the first switch from the lowest exposure value (typically 0 or the first factor l

data

dataframe containing exposure, variables in numerator and denominator, id, tstart and timevar.

corstr

correlation structure, only needed when using family="gaussian". Defaults to "ar1". See geeglm for details.

trim

optional trim percentile (0-0.5). When specified, both untrimmed and trimmed weights are returned.

...

further arguments passed to the functions used to estimate the numerator and denominator models (see family and link).

Value

A list containing the following elements:
ipw.weightsvector containing inverse probability weights for each observation. Returned in the same order as the observations in data, to facilitate merging.
weights.trimmedvector containing trimmed inverse probability weights, only returned when trim is specified.
callthe original function call.
selvarselection variable. With type="first", selvar=1 within each unit under observation, up to the first time point at which a switch from the lowest value of exposure to any other value is made, and selvar=0 after the first switch. For type="all", selvar=1 for all measurements. The numerator and denominator models are fitted only on observations with selvar=1. Returned in the same order as observations in data, to facilitate merging.
num.modthe numerator model, only returned when numerator is specified.
den.modthe denominator model.

Missing values

The exposure variable and the variables used in numerator and denominator, id, tstart and timevar should not contain missing values.

Details

Within each unit under observation i (usually patients), this function computes inverse probability weights at each time point j during follow-up. These weights are the cumulative product over all previous time points up to j of the ratio of two probabilities:

the numerator contains at each time point the probability of the observed exposure level given observed values of stabilization factors and the observed exposure history up to the time point before j. These probabilities are estimated using the model regressingexposureon the terms innumerator, using the link function indicated byfamilyandlink.
the denominator contains at each time point the probability of the observed exposure level given the observed history of time varying confounders up to j, as well as the stabilization factors in the numerator and the observed exposure history up to the time point before j. These probabilities are estimated using the model regressingexposureon the terms indenominator, using the link function indicated byfamilyandlink.

When the models from which the elements in the numerator and denominator are predicted are correctly specified, and there is no unmeasured confounding, weighting observations ij by the inverse probability weights adjusts for confounding of the effect of the exposure of interest. On the weighted dataset a marginal structural model can then be fitted, quantifying the causal effect of the exposure on the outcome of interest. With numerator specified, stabilized weights are computed, otherwise unstabilized weights with a numerator of 1 are computed. With a continuous exposure, using family="gaussian", weights are computed using the ratio of predicted densities at each time point. Therefore, for family="gaussian" only stabilized weights can be used, since unstabilized weights would have infinity variance.

References

Cole, S. R. & Hern�n, M. A. (2008). Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology, 168(6), 656-664. Robins, J. M., Hern�n, M. A. & Brumback, B. A. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11, 550-560.

Examples

Run this code

########################################################################
#EXAMPLE 1

#Load longitudinal data from HIV positive individuals.
data(haartdat)

#CD4 is confounder for the effect of initiation of HAART therapy on mortality.
#Estimate inverse probability weights to correct for confounding.
#Exposure allocation model is Cox proportional hazards model.
temp <- ipwtm(
   exposure = haartind,
   family = "survival",
   numerator = ~ sex + age,
   denominator = ~ sex + age + cd4.sqrt,
   id = patient,
   tstart = tstart,
   timevar = fuptime,
   type = "first",
   data = haartdat)

#plot inverse probability weights
graphics.off()
ipwplot(weights = temp$ipw.weights, timevar = haartdat$fuptime,
   binwidth = 100, ylim = c(-1.5, 1.5), main = "Stabilized weights")

#MSM for the causal effect of initiation of HAART on mortality
#corrected for confounding by CD4 count using inverse probability weighting
#with robust standard error obtained using cluster().
summary(coxph(Surv(tstart, fuptime, event) ~ haartind + cluster(patient),
   data = haartdat, weights = temp$ipw.weights))   

#Uncorrected model
summary(coxph(Surv(tstart, fuptime, event) ~ haartind, data = haartdat))

########################################################################
#EXAMPLE 2

data(basdat)
data(timedat)

#Aim: to model the causal effect of active tuberculosis (TB) on mortality.
#Longitudinal CD4 is a confounder as well as intermediate for the effect of TB.

#process original measurements
   #check for ties (not allowed)
      table(duplicated(timedat[,c("id", "fuptime")]))
   #take square root of CD4 because of skewness
      timedat$cd4.sqrt <- sqrt(timedat$cd4count)
   #add TB time to dataframe
      timedat <- merge(timedat, basdat[,c("id", "Ttb")], by = "id", all.x = TRUE)
   #compute TB status
      timedat$tb.lag <- ifelse(with(timedat, !is.na(Ttb) & fuptime > Ttb), 1, 0)
   #longitudinal CD4-model
      cd4.lme <- lme(cd4.sqrt ~ fuptime + tb.lag, random = ~ fuptime | id,
      data = timedat)

#build new dataset:
#rows corresponding to TB-status switches, and individual end times
   times <- sort(unique(c(basdat$Ttb, basdat$Tend)))
   startstop <- data.frame(
      id = rep(basdat$id, each = length(times)),
      fuptime = rep(times, nrow(basdat)))
   #add baseline data to dataframe
      startstop <- merge(startstop, basdat, by = "id", all.x = TRUE)
   #limit individual follow-up using Tend
      startstop <- startstop[with(startstop, fuptime <= Tend),]
   startstop$tstart <- tstartfun(id, fuptime, startstop) #compute tstart (?tstartfun)
   #indicate TB status
      startstop$tb <- ifelse(with(startstop, !is.na(Ttb) & fuptime >= Ttb), 1, 0)
   #indicate TB status at previous time point
      startstop$tb.lag <- ifelse(with(startstop, !is.na(Ttb) & fuptime > Ttb), 1, 0)
   #indicate death
      startstop$event <- ifelse(with(startstop, !is.na(Tdeath) & fuptime >= Tdeath),
      1, 0)
   #impute CD4, based on TB status at previous time point.
      startstop$cd4.sqrt <- predict(cd4.lme, newdata = data.frame(id = startstop$id,
         fuptime = startstop$fuptime, tb.lag = startstop$tb.lag))

#compute inverse probability weights
   temp <- ipwtm(
      exposure = tb,
      family = "survival",
      numerator = ~ 1,
      denominator = ~ cd4.sqrt,
      id = id,
      tstart = tstart,
      timevar = fuptime,
      type = "first",
      data = startstop)
   summary(temp$ipw.weights)
   ipwplot(weights = temp$ipw.weights, timevar = startstop$fuptime, binwidth = 100)

#models
   #IPW-fitted MSM, using cluster() to obtain robust standard error estimate
      summary(coxph(Surv(tstart, fuptime, event) ~ tb + cluster(id),
      data = startstop, weights = temp$ipw.weights))
   #unadjusted
      summary(coxph(Surv(tstart, fuptime, event) ~ tb, data = startstop))
   #adjusted using conditioning: part of the effect of TB is adjusted away
      summary(coxph(Surv(tstart, fuptime, event) ~ tb + cd4.sqrt, data = startstop))

#compute bootstrap CI for TB parameter (takes a few hours)
#   #robust with regard to weights unequal to 1
#   boot.fun <- function(data, index, data.tm){
#      data.samp <- data[index,]
#      data.samp$id.samp <- 1:nrow(data.samp)
#      data.tm.samp <- do.call("rbind", lapply(data.samp$id.samp, function(id.samp)
#         cbind(data.tm[data.tm$id == data.samp$id[data.samp$id.samp == id.samp],],
#         id.samp = id.samp)))
#      cd4.lme <- lme(cd4.sqrt ~ fuptime + tb.lag, random = ~ fuptime | id.samp,
#         data = data.tm.samp)
#      times <- sort(unique(c(data.samp$Ttb, data.samp$Tend)))
#      startstop.samp <- data.frame(id.samp = rep(data.samp$id.samp,
#         each = length(times)), fuptime = rep(times, nrow(data.samp)))
#      startstop.samp <- merge(startstop.samp, data.samp, by = "id.samp",
#          all.x = TRUE)
#      startstop.samp <- startstop.samp[with(startstop.samp, fuptime <= Tend),]
#      startstop.samp$tstart <- tstartfun(id.samp, fuptime, startstop.samp)
#      startstop.samp$tb <- ifelse(with(startstop.samp, !is.na(Ttb) &
#          fuptime >= Ttb), 1, 0)
#      startstop.samp$tb.lag <- ifelse(with(startstop.samp, !is.na(Ttb) &
#          fuptime > Ttb), 1, 0)
#      startstop.samp$event <- ifelse(with(startstop.samp, !is.na(Tdeath) &
#          fuptime >= Tdeath), 1, 0)
#      startstop.samp$cd4.sqrt <- predict(cd4.lme, newdata = data.frame(id.samp =
#         startstop.samp$id.samp, fuptime = startstop.samp$fuptime, tb.lag =
#         startstop.samp$tb.lag))
#      return(coef(coxph(Surv(tstart, fuptime, event) ~ tb, data = startstop.samp,
#         weights = ipwtm(
#            exposure = tb,
#            family = "survival",
#            numerator = ~ 1,
#            denominator = ~ cd4.sqrt,
#            id = id.samp,
#            tstart = tstart,
#            timevar = fuptime,
#            type = "first",
#            data = startstop.samp)$ipw.weights))[1])
#      }
#   bootres <- boot(data = basdat, statistic = boot.fun, R = 999,
#      data.tm = timedat);bootres
#   boot.ci(bootres, type = "basic")

Run the code above in your browser using DataLab