twoStageTMLE: twoStageTMLE

Description

Inverse probability of censoring weighted (IPCW) TMLE for estimating parameters when the full set of covariates is observed on only a subset of observations.
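The weighting idea can be sketched in base R. This is a simplified illustration, not the package's internal code. It assumes a logistic regression for pi = P(Delta.W = 1 | W1, Y) and forms IPCW weights Delta.W / pi, so each observation with stage 2 data also stands in for similar observations without it. Variable names mirror the Examples section; the model for pi is an assumption.

```r
set.seed(1)
n <- 500
W1 <- rnorm(n)
Y <- 1 + W1 + rnorm(n)
# Stage 2 covariates are observed more often when W1 > 1
Delta.W <- rbinom(n, 1, 0.4 + 0.2 * (W1 > 1))
# Illustrative (assumed) logistic model for pi = P(Delta.W = 1 | W1, Y)
pi.fit <- glm(Delta.W ~ W1 + Y, family = binomial())
pi.hat <- predict(pi.fit, type = "response")
# IPCW weights: 0 when stage 2 data are missing, 1 / pi when observed
ipcw <- Delta.W / pi.hat
# Compare the full-sample mean with unweighted and IPCW-weighted subset means
c(full = mean(Y),
  complete.case = mean(Y[Delta.W == 1]),
  ipcw = sum(ipcw * Y) / sum(ipcw))
```

The unweighted subset mean over-represents observations with large W1; the weights correct for that, provided the model for pi is adequate.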

Usage

twoStageTMLE(
  Y,
  A,
  W,
  Delta.W,
  W.stage2,
  Z = NULL,
  Delta = rep(1, length(Y)),
  pi = NULL,
  piform = NULL,
  pi.SL.library = c("SL.glm", "SL.gam", "SL.glmnet", "tmle.SL.dbarts.k.5"),
  V.pi = 10,
  pi.discreteSL = TRUE,
  condSetNames = c("A", "W", "Y"),
  id = NULL,
  Q.family = "gaussian",
  augmentW = TRUE,
  augW.SL.library = c("SL.glm", "SL.glmnet", "tmle.SL.dbarts2"),
  rareOutcome = FALSE,
  verbose = FALSE,
  ...
)

Value

An object of class 'twoStageTMLE' with the following components:

tmle: Treatment effect estimates and summary information
twoStage: IPCW weight estimation summary: pi, the estimated probabilities; coef, the SL weights or coefficients from the glm fit; type, the estimation procedure used; discreteSL, a flag indicating whether discrete super learning was used
augW: Matrix of predicted outcomes based on stage 1 covariates only
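As a rough illustration of what augW holds, the sketch below predicts the outcome from stage 1 information only. It substitutes a single linear regression of Y on A and the stage 1 covariates for the super learner library named in augW.SL.library; both the predictor set and the name augW.sketch are assumptions, and the package may construct these predictions differently.

```r
set.seed(2)
n <- 300
W1 <- rnorm(n); W2 <- rnorm(n); W3 <- rnorm(n)
A <- rbinom(n, 1, plogis(0.2 * W1))
Y <- 10 + A + W1 + W2 + W3 + rnorm(n)
# Stage 1 data only: W3 (a stage 2 covariate) is deliberately left out
stage1 <- data.frame(Y, A, W1, W2)
fit.stage1 <- lm(Y ~ A + W1 + W2, data = stage1)
# One column of predicted outcomes, available for every observation
augW.sketch <- cbind(augW = unname(predict(fit.stage1, newdata = stage1)))
head(augW.sketch)
```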

Arguments

Y: outcome
A: binary treatment indicator
W: covariate matrix observed on everyone
Delta.W: binary indicator of whether the second stage covariates are observed (1 = observed, 0 = missing)
W.stage2: matrix of second stage covariates observed on subset of observations
Z: optional mediator of treatment effect for evaluating a controlled direct effect
Delta: binary indicator of whether the outcome Y is observed (1 = observed, 0 = missing)
pi: optional vector of probabilities that the second stage covariates are observed, i.e., P(Delta.W = 1)
piform: parametric regression formula for estimating pi (see Details)
pi.SL.library: super learner library for estimating pi (see Details)
V.pi: number of cross validation folds for estimating pi using super learner
pi.discreteSL: Use discrete super learning when TRUE, otherwise ensemble super learning
condSetNames: Variables to include as predictors of missingness in W.stage2: any combination of Y, A, and either W (for all covariates in W) or individual covariate names in W
id: Identifier of independent units of observation, e.g., clusters
Q.family: Regression family for the outcome
augmentW: When TRUE, include predicted values of the outcome in the set of covariates used to model the propensity score
augW.SL.library: super learner library for preliminary outcome regression model (ignored when augmentW is FALSE)
rareOutcome: When TRUE, use a less ambitious SL for Q in the call to tmle (discrete SL with a glm, glmnet, and bart library, V = 20)
verbose: When TRUE prints informational messages
...: other parameters passed to the tmle function (not checked)
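To make the roles of V.pi and pi.discreteSL more concrete, here is a base R sketch of discrete super learner selection for pi: each candidate model is scored by V-fold cross-validated negative log-likelihood, and the single candidate with the lowest risk is refit on all data. The two glm candidates are hypothetical stand-ins for pi.SL.library; the package itself relies on super learner machinery, and an ensemble (pi.discreteSL = FALSE) would combine candidates with weights instead of picking one.

```r
set.seed(3)
n <- 600
W1 <- rnorm(n)
Y <- 1 + W1 + rnorm(n)
Delta.W <- rbinom(n, 1, plogis(-0.5 + 0.8 * W1))
# The outcome enters the missingness model under the name Y.orig
dat <- data.frame(Delta.W, W1, Y.orig = Y)
# Hypothetical candidate models for pi
candidates <- list(
  W1.only = Delta.W ~ W1,
  W1.and.Y = Delta.W ~ W1 + Y.orig
)
V.pi <- 5
fold <- sample(rep(seq_len(V.pi), length.out = n))
# Cross-validated negative log-likelihood (binomial loss) for each candidate
cv.risk <- sapply(candidates, function(f) {
  loss <- numeric(n)
  for (v in seq_len(V.pi)) {
    train <- fold != v
    fit <- glm(f, family = binomial(), data = dat[train, ])
    p <- predict(fit, newdata = dat[!train, ], type = "response")
    d <- dat$Delta.W[!train]
    loss[!train] <- -(d * log(p) + (1 - d) * log(1 - p))
  }
  mean(loss)
})
cv.risk
# Discrete super learner: keep the single best candidate and refit on all data
best <- names(which.min(cv.risk))
pi.hat <- predict(glm(candidates[[best]], family = binomial(), data = dat),
                  type = "response")
```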

Details

When piform specifies a parametric model for pi that conditions on the outcome, use Delta.W as the dependent variable and put Y.orig, not Y, on the right-hand side of the formula (for example, "Delta.W ~ I(W1 > 0) + Y.orig"). When writing a user-defined SL wrapper for inclusion in pi.SL.library, use Y on the left-hand side of the formula; if specific covariate names are used on the right-hand side, use Y.orig to condition on the outcome.
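A user-defined wrapper for pi.SL.library might look like the sketch below. It assumes the usual SuperLearner wrapper convention (arguments Y, X, newX, family, obsWeights; a list with pred and fit returned) and that X carries the study outcome in a column named Y.orig, as described above. SL.glm.piY is a hypothetical name, and tagging the fit with class "SL.glm" assumes SuperLearner's predict.SL.glm method will handle later predictions.

```r
# Hypothetical wrapper: inside pi estimation, Y is the missingness indicator Delta.W
SL.glm.piY <- function(Y, X, newX, family, obsWeights, ...) {
  dat <- data.frame(Y = Y, X)
  # Y on the left-hand side; the study outcome enters as Y.orig
  fit.glm <- glm(Y ~ W1 + Y.orig, data = dat, family = family,
                 weights = obsWeights)
  pred <- predict(fit.glm, newdata = newX, type = "response")
  fit <- list(object = fit.glm)
  class(fit) <- "SL.glm"  # assumed to reuse SuperLearner's predict.SL.glm
  list(pred = pred, fit = fit)
}

# Direct call on simulated data, outside SuperLearner
set.seed(4)
n <- 200
X <- data.frame(W1 = rnorm(n), W2 = rnorm(n))
X$Y.orig <- 1 + X$W1 + rnorm(n)
Delta.W <- rbinom(n, 1, plogis(0.5 * X$W1))
out <- SL.glm.piY(Y = Delta.W, X = X, newX = X, family = binomial(),
                  obsWeights = rep(1, n))
```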

Examples


library(tmle)
set.seed(1)  # for reproducibility
n <- 1000
W1 <- rnorm(n)
W2 <- rnorm(n)
W3 <- rnorm(n)
A <- rbinom(n, 1, plogis(-1 + .2*W1 + .3*W2 + .1*W3))
Y <- 10 + A + W1 + W2 + A*W1 + W3 + rnorm(n)
d <- data.frame(Y, A, W1, W2, W3)
# Sample 400 observations to have data on W3; selection is more likely when W1 > 1
n.sample <- 400
p.sample <- 0.5 + .2*(W1 > 1)
rows.sample <- sample(1:n, size = n.sample, prob = p.sample)
Delta.W <- rep(0,n)
Delta.W[rows.sample] <- 1
W3.stage2 <- cbind(W3 = W3[Delta.W==1])
# 1. Specify parametric models and do not augment W (fast, but not recommended).
#    Qform and gform are passed through ... to tmle.
result1 <- twoStageTMLE(Y = Y, A = A, W = cbind(W1, W2), Delta.W = Delta.W,
   W.stage2 = W3.stage2, piform = "Delta.W ~ I(W1 > 0) + Y.orig", V.pi = 5,
   verbose = TRUE, Qform = "Y ~ A + W1", gform = "A ~ W1 + W2 + W3", augmentW = FALSE)
summary(result1)
# 2. Specify a parametric model for the conditional missingness probabilities (pi)
#    and use default values for the remaining arguments to estimate the marginal effect with tmle
result2 <- twoStageTMLE(Y = Y, A = A, W = cbind(W1, W2), Delta.W = Delta.W,
     W.stage2 = W3.stage2, piform = "Delta.W ~ I(W1 > 0)",
     V.pi = 5, verbose = TRUE)
result2