drtmle: TMLE estimate of the average treatment effect with doubly-robust inference

Description

TMLE estimate of the average treatment effect with doubly-robust inference

Usage

drtmle(Y, A, W, DeltaA = as.numeric(!is.na(A)),
  DeltaY = as.numeric(!is.na(Y)), a_0 = unique(A[!is.na(A)]),
  family = if (all(Y %in% c(0, 1))) {
    stats::binomial()
  } else {
    stats::gaussian()
  }, stratify = FALSE, SL_Q = NULL, SL_g = NULL,
  SL_Qr = NULL, SL_gr = NULL, n_SL = 1, avg_over = "drtmle",
  se_cv = "none", se_cvFolds = ifelse(se_cv == "partial", 10, 1),
  targeted_se = se_cv != "partial", glm_Q = NULL, glm_g = NULL,
  glm_Qr = NULL, glm_gr = NULL, adapt_g = FALSE, guard = c("Q", "g"),
  reduction = "univariate", returnModels = FALSE, returnNuisance = TRUE,
  cvFolds = 1, maxIter = 3, tolIC = 1/length(Y), tolg = 0.01,
  verbose = FALSE, Qsteps = 2, Qn = NULL, gn = NULL,
  use_future = FALSE, ...)
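
The default expressions in the usage above can be evaluated by hand to see what they produce. The following is a minimal base R sketch on a small made-up data set (the vectors Y and A are illustrative only). It suggests that when a binary outcome contains missing values, the default for family resolves to gaussian(), because NA is not matched by %in%; in that situation it may be safer to pass family = binomial() explicitly.

```r
# Toy vectors with missing values (illustrative, not from the package)
Y <- c(0, 1, 1, NA, 0)
A <- c(1, 0, NA, 1, 0)

# Defaults copied from the usage signature
DeltaA <- as.numeric(!is.na(A))   # 1 if treatment observed, 0 if missing
DeltaY <- as.numeric(!is.na(Y))   # 1 if outcome observed, 0 if missing
a_0 <- unique(A[!is.na(A)])       # observed treatment levels, in order of appearance

# NA %in% c(0, 1) is FALSE, so all(...) is FALSE whenever Y has missing values
family <- if (all(Y %in% c(0, 1))) {
  stats::binomial()
} else {
  stats::gaussian()
}

print(DeltaA)
print(DeltaY)
print(a_0)
print(family$family)
```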

Value

An object of class "drtmle".

drtmle: A list of doubly-robust point estimates and a doubly-robust covariance matrix
nuisance_drtmle: A list of the final TMLE estimates of the outcome regression ($QnStar), propensity score ($gnStar), and reduced-dimension regressions ($QrnStar, $grnStar) evaluated at the observed data values.
ic_drtmle: A list of the empirical mean of the efficient influence function ($eif) and the extra pieces of the influence function resulting from misspecification. All should be smaller than tolIC (unless maxIter was reached first). Also includes a matrix of the influence function values at the estimated nuisance parameters evaluated at the observed data.
aiptw_c: A list of doubly-robust point estimates and a non-doubly-robust covariance matrix. Theory does not guarantee performance of inference for these estimators, but simulation studies showed they often perform adequately.
nuisance_aiptw: A list of the initial estimates of the outcome regression, propensity score, and reduced-dimension regressions evaluated at the observed data values.
tmle: A list of doubly-robust point estimates and non-doubly-robust covariance for the standard TMLE estimator.
aiptw: A list of doubly-robust point estimates and non-doubly-robust covariance matrix for the standard AIPTW estimator.
gcomp: A list of non-doubly-robust point estimates and non-doubly-robust covariance matrix for the standard G-computation estimator. If super learner is used there is no guarantee of correct inference for this estimator.
QnMod: The fitted object for the outcome regression. Returns NULL if returnModels = FALSE.
gnMod: The fitted object for the propensity score. Returns NULL if returnModels = FALSE.
QrnMod: The fitted object for the reduced-dimension regression that guards against misspecification of the outcome regression. Returns NULL if returnModels = FALSE.
grnMod: The fitted object for the reduced-dimension regression that guards against misspecification of the propensity score. Returns NULL if returnModels = FALSE.
a_0: The treatment levels that were requested for computation of covariate-adjusted means.
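
Because the returned components hold marginal mean estimates for each level in a_0 together with a covariance matrix, an average treatment effect and a Wald-style confidence interval can be formed as a linear contrast. The sketch below uses base R only. The numbers in est and cov_mat are hypothetical stand-ins, not output from drtmle, and the names est, cov_mat, and contrast are chosen here for illustration; inspect a fitted object with str() to find the actual element names.

```r
# Hypothetical marginal means for a_0 = c(1, 0) and their covariance matrix
est <- c(0.62, 0.48)
cov_mat <- matrix(c(0.0025, 0.0004,
                    0.0004, 0.0030), nrow = 2, byrow = TRUE)

# Contrast for E[Y(1)] - E[Y(0)], matching the order of a_0
contrast <- c(1, -1)

# Point estimate and standard error of the contrast
ate <- sum(contrast * est)
se <- sqrt(drop(t(contrast) %*% cov_mat %*% contrast))

# Two-sided 95% Wald interval
ci <- ate + c(-1, 1) * stats::qnorm(0.975) * se

print(ate)
print(se)
print(ci)
```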

Arguments

Y: A numeric vector of continuous or binary outcomes.
A: A numeric vector of discrete-valued treatment assignments.
W: A data.frame of named covariates.
DeltaA: A numeric vector indicating whether treatment is observed (assumed to equal 0 if missing, 1 if observed).
DeltaY: A numeric vector indicating whether the outcome is observed (assumed to equal 0 if missing, 1 if observed).
a_0: A numeric vector of fixed treatment values at which to return marginal mean estimates.
family: A family object equal to either binomial() or gaussian(), to be passed to the SuperLearner or glm function.
stratify: A boolean indicating whether to estimate the outcome regression separately for different values of A (if TRUE) or to pool across A (if FALSE).
SL_Q: A vector of characters or a list describing the Super Learner library to be used for the outcome regression. See SuperLearner for details.
SL_g: A vector of characters describing the super learner library to be used for each of the propensity score regressions (DeltaA, A, and DeltaY). To use the same library for each of the regressions (or if there are no missing data in A or Y), a single library may be input. See SuperLearner for details on how super learner libraries can be specified.
SL_Qr: A vector of characters or a list describing the Super Learner library to be used for the reduced-dimension outcome regression.
SL_gr: A vector of characters or a list describing the Super Learner library to be used for the reduced-dimension propensity score.
n_SL: Number of repeated Super Learners to run (default 1) for each nuisance parameter. Running more repetitions yields more stable inference.
avg_over: If multiple Super Learners are run, the scale on which the results should be aggregated. Options are: "SL" = repeated nuisance parameter estimates are averaged, and a single vector of point estimates is then generated from the averaged models; "drtmle" = repeated vectors of point estimates are generated and averaged. Both can be specified, at considerable additional computational expense. In this case, the final estimates are the average of n_SL point estimates, each of which is built by averaging n_SL fits. If NULL, no averaging is performed (in which case n_SL should be set equal to 1).
se_cv: Should cross-validated nuisance parameter estimates be used for computing standard errors? Options are "none" = no cross-validation is performed; "partial" = only applicable if Super Learner is used for nuisance parameter estimates; "full" = full cross-validation is performed. See vignette for further details. Ignored if cvFolds > 1, since then cross-validated nuisance parameter estimates are used by default and it is assumed that you want full cross-validated standard errors.
se_cvFolds: If cross-validated nuisance parameter estimates are used to compute standard errors, how many folds should be used in this computation. If se_cv = "partial", then this option sets the number of folds used by the SuperLearner fitting procedure.
targeted_se: A boolean indicating whether the targeted nuisance parameters should be used in standard error computation or the initial estimators. If se_cv is not set to "none", this option is ignored and standard errors are computed based on non-targeted, cross-validated nuisance parameter fits.
glm_Q: A character describing a formula to be used in the call to glm for the outcome regression. Ignored if SL_Q is not NULL.
glm_g: A list of characters describing the formulas to be used for each of the propensity score regressions (DeltaA, A, and DeltaY). To use the same formula for each of the regressions (or if there are no missing data in A or Y), a single character formula may be input. In general the formulas can reference any variable in colnames(W), unless adapt_g = TRUE, in which case the formulas should reference variables QaW, where a takes values in a_0.
glm_Qr: A character describing a formula to be used in the call to glm for the reduced-dimension outcome regression. Ignored if SL_Qr is not NULL. The formula should use the variable name 'gn'.
glm_gr: A character describing a formula to be used in the call to glm for the reduced-dimension propensity score. Ignored if SL_gr is not NULL. The formula should use the variable names 'Qn' and 'gn' if reduction = 'bivariate', and only 'Qn' otherwise.
adapt_g: A boolean indicating whether the propensity score should be outcome adaptive. If TRUE then the propensity score is estimated as the regression of A onto covariates QaW for a in each value contained in a_0. See vignette for more details.
guard: A character vector indicating what pattern of misspecifications to guard against. If guard contains "Q", then the TMLE guards against misspecification of the outcome regression by estimating the reduced-dimension outcome regression specified by glm_Qr or SL_Qr. If guard contains "g" then the TMLE (additionally) guards against misspecification of the propensity score by estimating the reduced-dimension propensity score specified by glm_gr or SL_gr. If guard is set to NULL, then only standard TMLE and one-step estimators are computed.
reduction: A character equal to "univariate" for a univariate misspecification correction (default) or "bivariate" for the bivariate version.
returnModels: A boolean indicating whether to return model fits for the outcome regression, propensity score, and reduced-dimension regressions.
returnNuisance: A boolean indicating whether to return the estimated nuisance regressions evaluated on the observed data. Defaults to TRUE. If n_SL is large and "drtmle" is in avg_over, consider setting this to FALSE to reduce the size of the returned object.
cvFolds: A numeric equal to the number of folds to be used in cross-validated fitting of nuisance parameters. If cvFolds = 1, no cross-validation is used. Alternatively, cvFolds may be entered as a vector of fold assignments for observations, in which case its length should be the same length as Y.
maxIter: A numeric that sets the maximum number of iterations the TMLE can perform in its fluctuation step.
tolIC: A numeric that defines the stopping criterion based on the empirical mean of the influence function.
tolg: A numeric indicating the minimum value for estimates of the propensity score.
verbose: A boolean indicating whether to print status updates.
Qsteps: A numeric equal to 1 or 2 indicating whether the fluctuation submodel for the outcome regression should be fit using a single minimization (Qsteps = 1) or a backfitting-type minimization (Qsteps=2). The latter was found to be more stable in simulations and is the default.
Qn: An optional list of outcome regression estimates. If specified, the function will ignore the nuisance parameter estimation specified by SL_Q and glm_Q. The entries in the list should correspond to the outcome regression evaluated at A and the observed values of W, with order determined by the input to a_0 (e.g., if a_0 = c(0, 1) then Qn[[1]] should be outcome regression at A = 0 and Qn[[2]] should be outcome regression at A = 1).
gn: An optional list of propensity score estimates. If specified, the function will ignore the nuisance parameter estimation specified by SL_g and glm_g. The entries in the list should correspond to the propensity for the observed values of W, with order determined by the input to a_0 (e.g., if a_0 = c(0,1) then gn[[1]] should be propensity of A = 0 and gn[[2]] should be propensity of A = 1).
use_future: A boolean indicating whether to use future_lapply or plain lapply. The latter can make errors easier to track down.
...: Other options (not currently used).
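
The ordering convention described for Qn and gn can be illustrated with ordinary glm fits. This is a minimal base R sketch, assuming a binary treatment, no missing data, and simple logistic working models; the object names Q_fit, g_fit, and p1 are introduced here for illustration and are not part of the package. With a_0 = c(0, 1), the first list entry corresponds to A = 0 and the second to A = 1.

```r
set.seed(1)
n <- 200
W <- data.frame(W1 = runif(n), W2 = rnorm(n))
A <- rbinom(n, 1, plogis(W$W1 - W$W2))
Y <- rbinom(n, 1, plogis(W$W1 * W$W2 * A))
a_0 <- c(0, 1)

# Working models for the outcome regression and the propensity score
Q_fit <- glm(Y ~ W1 + W2 * A, family = binomial(),
             data = data.frame(W, A = A, Y = Y))
g_fit <- glm(A ~ W1 + W2, family = binomial(),
             data = data.frame(W, A = A))

# Outcome regression evaluated at each level of a_0 and the observed W
Qn <- lapply(a_0, function(a) {
  as.numeric(predict(Q_fit, newdata = data.frame(W, A = a), type = "response"))
})

# Propensity of A = 0 first, then A = 1, matching the order of a_0
p1 <- as.numeric(predict(g_fit, newdata = W, type = "response"))
gn <- list(1 - p1, p1)

str(Qn)
str(gn)
```

Lists built this way could then be supplied through the Qn and gn arguments together with a_0 = c(0, 1), in which case the SL_Q, glm_Q, SL_g, and glm_g specifications are ignored as described above.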

Examples

# load super learner
library(SuperLearner)
# simulate data
set.seed(123456)
n <- 100
W <- data.frame(W1 = runif(n), W2 = rnorm(n))
A <- rbinom(n, 1, plogis(W$W1 - W$W2))
Y <- rbinom(n, 1, plogis(W$W1 * W$W2 * A))
# A quick example of drtmle:
# We note that more flexible super learner libraries
# are available, and that we recommend the user use more flexible
# libraries for SL_Qr and SL_gr for general use.
fit1 <- drtmle(
  W = W, A = A, Y = Y, a_0 = c(1, 0),
  family = binomial(),
  stratify = FALSE,
  SL_Q = c("SL.glm", "SL.mean", "SL.glm.interaction"),
  SL_g = c("SL.glm", "SL.mean", "SL.glm.interaction"),
  SL_Qr = "SL.glm",
  SL_gr = "SL.glm", maxIter = 1
)
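
A variant of the call above that uses glm formulas instead of super learner libraries might look like the following sketch. The formula strings are assumptions chosen for this simulated data set, following the argument descriptions (glm_Qr references 'gn' and glm_gr references 'Qn' under the default univariate reduction). The call is guarded so that the block still runs when the drtmle package is not installed; fit2 is a name introduced here for illustration.

```r
# Simulate the same data as in the example above
set.seed(123456)
n <- 100
W <- data.frame(W1 = runif(n), W2 = rnorm(n))
A <- rbinom(n, 1, plogis(W$W1 - W$W2))
Y <- rbinom(n, 1, plogis(W$W1 * W$W2 * A))

fit2 <- NULL
if (requireNamespace("drtmle", quietly = TRUE)) {
  fit2 <- drtmle::drtmle(
    W = W, A = A, Y = Y, a_0 = c(1, 0),
    family = binomial(),
    stratify = FALSE,
    glm_Q = "W1 + W2 * A",  # outcome regression formula (assumed)
    glm_g = "W1 + W2",      # propensity score formula (assumed)
    glm_Qr = "gn",          # reduced-dimension outcome regression
    glm_gr = "Qn",          # reduced-dimension propensity score
    maxIter = 1
  )
  # Components listed in the Value section
  print(fit2$a_0)
  print(fit2$drtmle)
}
```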