TMLE estimate of the average treatment effect with doubly-robust inference
drtmle(Y, A, W, DeltaA = as.numeric(!is.na(A)),
  DeltaY = as.numeric(!is.na(Y)), a_0 = unique(A[!is.na(A)]),
  family = if (all(Y %in% c(0, 1))) { stats::binomial() } else {
  stats::gaussian() }, stratify = FALSE, SL_Q = NULL, SL_g = NULL,
  SL_Qr = NULL, SL_gr = NULL, n_SL = 1, avg_over = "drtmle",
  se_cv = "none", se_cvFolds = ifelse(se_cv == "partial", 10, 1),
  targeted_se = se_cv != "partial", glm_Q = NULL, glm_g = NULL,
  glm_Qr = NULL, glm_gr = NULL, adapt_g = FALSE, guard = c("Q", "g"),
  reduction = "univariate", returnModels = FALSE, returnNuisance = TRUE,
  cvFolds = 1, maxIter = 3, tolIC = 1/length(Y), tolg = 0.01,
  verbose = FALSE, Qsteps = 2, Qn = NULL, gn = NULL,
  use_future = FALSE, ...)
An object of class "drtmle".
drtmle: A list of doubly-robust point estimates and
a doubly-robust covariance matrix.
nuisance_drtmle: A list of the final TMLE estimates of
the outcome regression ($QnStar), propensity score
($gnStar), and reduced-dimension regressions ($QrnStar,
$grnStar) evaluated at the observed data values.
ic_drtmle: A list of the empirical mean of the efficient
influence function ($eif) and the extra pieces of the influence
function resulting from misspecification. All should be smaller than
tolIC (unless maxIter was reached first). Also includes
a matrix of the influence function values at the estimated nuisance
parameters evaluated at the observed data.
aiptw_c: A list of doubly-robust point estimates and
a non-doubly-robust covariance matrix. Theory does not guarantee
performance of inference for these estimators, but simulation studies
showed they often perform adequately.
nuisance_aiptw: A list of the initial estimates of the
outcome regression, propensity score, and reduced-dimension
regressions evaluated at the observed data values.
tmle: A list of doubly-robust point estimates and
a non-doubly-robust covariance matrix for the standard TMLE estimator.
aiptw: A list of doubly-robust point estimates and
a non-doubly-robust covariance matrix for the standard AIPTW estimator.
gcomp: A list of non-doubly-robust point estimates and
a non-doubly-robust covariance matrix for the standard G-computation
estimator. If super learner is used there is no guarantee of correct
inference for this estimator.
QnMod: The fitted object for the outcome regression. Returns
NULL if returnModels = FALSE.
gnMod: The fitted object for the propensity score. Returns
NULL if returnModels = FALSE.
QrnMod: The fitted object for the reduced-dimension regression
that guards against misspecification of the outcome regression.
Returns NULL if returnModels = FALSE.
grnMod: The fitted object for the reduced-dimension regression
that guards against misspecification of the propensity score. Returns
NULL if returnModels = FALSE.
a_0: The treatment levels that were requested for computation of covariate-adjusted means.
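The returned point estimates and covariance matrix can be combined into a contrast such as a difference in marginal means. The following is a minimal sketch, not the package's own method: it assumes the drtmle component stores its estimates and covariance matrix under the names est and cov (an assumption, since this page does not name them), and the helper ate_from_fit and the numbers in toy_fit are hypothetical, used only to show the arithmetic.

```r
# Sketch: contrast of marginal means with a Wald-style interval.
# ASSUMPTION: fit$drtmle has elements named `est` and `cov`.
# `ate_from_fit` is a hypothetical helper, not part of the package.
ate_from_fit <- function(fit, contrast = c(1, -1), level = 0.95) {
  est <- as.numeric(fit$drtmle$est)
  covmat <- as.matrix(fit$drtmle$cov)
  stopifnot(length(est) == length(contrast))
  # Point estimate of the linear contrast of the marginal means
  point <- sum(contrast * est)
  # Standard error from the quadratic form contrast' * cov * contrast
  se <- sqrt(as.numeric(t(contrast) %*% covmat %*% contrast))
  z <- stats::qnorm(1 - (1 - level) / 2)
  c(est = point, lower = point - z * se, upper = point + z * se)
}

# Stand-in object with made-up numbers, only to show the expected shape.
# Entries follow the order of a_0, here c(1, 0).
toy_fit <- list(
  drtmle = list(
    est = c(0.60, 0.45),
    cov = matrix(c(0.0025, 0.0005, 0.0005, 0.0030), nrow = 2)
  ),
  a_0 = c(1, 0)
)
res <- ate_from_fit(toy_fit)
print(res)
```

With a_0 = c(1, 0), the contrast c(1, -1) corresponds to the mean under treatment level 1 minus the mean under level 0. Reversing the order of a_0 flips the sign of the contrast.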
Y: A numeric vector of continuous or binary outcomes.
A: A numeric vector of discrete-valued treatment assignments.
W: A data.frame of named covariates.
DeltaA: A numeric vector indicating whether the treatment was observed
(assumed to equal 0 if missing and 1 if observed).
DeltaY: A numeric vector indicating whether the outcome was observed
(assumed to equal 0 if missing and 1 if observed).
a_0: A numeric vector of fixed treatment values at which to
return marginal mean estimates.
family: A family object equal to either binomial() or
gaussian(), to be passed to the SuperLearner or glm
function.
stratify: A boolean indicating whether to estimate the outcome
regression separately for different values of A (if TRUE) or
to pool across A (if FALSE).
SL_Q: A vector of characters or a list describing the Super Learner
library to be used for the outcome regression. See
SuperLearner for details.
SL_g: A vector of characters describing the Super Learner library to be
used for each of the propensity score regressions (DeltaA, A,
and DeltaY). To use the same library for each of the regressions (or
if there are no missing data in A or Y), a single library may
be input. See SuperLearner for details on how
Super Learner libraries can be specified.
SL_Qr: A vector of characters or a list describing the Super Learner library to be used for the reduced-dimension outcome regression.
SL_gr: A vector of characters or a list describing the Super Learner library to be used for the reduced-dimension propensity score.
n_SL: Number of repeated Super Learners to run (default 1) for each nuisance parameter. Repeating the Super Learner more times can give more stable inference.
avg_over: If multiple Super Learners are run, the scale on which the
results should be aggregated. Options include: "SL" =
repeated nuisance parameter estimates are averaged before subsequently
generating a single vector of point estimates based on the averaged models;
"drtmle" = repeated vectors of point estimates are generated and
averaged. Both can be specified, though this adds considerable
computational expense. In this case, the final estimates are the average
of n_SL point estimates where each is built by averaging n_SL
fits. If NULL, no averaging is performed (in which case n_SL
should be set equal to 1).
se_cv: Should cross-validated nuisance parameter estimates be used
for computing standard errors?
Options are "none" = no cross-validation is performed; "partial" =
only applicable if Super Learner is used for nuisance parameter estimates;
"full" = full cross-validation is performed. See vignette for further
details. Ignored if cvFolds > 1, since then
cross-validated nuisance parameter estimates are used by default and it is
assumed that you want full cross-validated standard errors.
se_cvFolds: If cross-validated nuisance parameter estimates are used
to compute standard errors, the number of folds to use in this computation.
If se_cv = "partial", then this option sets the number of folds used
by the SuperLearner fitting procedure.
targeted_se: A boolean indicating whether the targeted nuisance
parameters (if TRUE) or the initial estimators (if FALSE) should be used
in standard error computation. If se_cv is not set to "none", this option is
ignored and standard errors are computed based on non-targeted, cross-validated
nuisance parameter fits.
glm_Q: A character describing a formula to be used in the call to
glm for the outcome regression. Ignored if SL_Q is not NULL.
glm_g: A list of characters describing the formulas to be used
for each of the propensity score regressions (DeltaA, A, and
DeltaY). To use the same formula for each of the regressions (or if
there are no missing data in A or Y), a single character
formula may be input. In general the formulas can reference any variable in
colnames(W), unless adapt_g = TRUE, in which case the formulas
should reference variables QaW where a takes values in a_0.
glm_Qr: A character describing a formula to be used in the call to
glm for the reduced-dimension outcome regression. Ignored if
SL_Qr is not NULL. The formula should use the variable name 'gn'.
glm_gr: A character describing a formula to be used in the call to
glm for the reduced-dimension propensity score. Ignored if
SL_gr is not NULL. The formula should use the variable names 'Qn' and
'gn' if reduction = 'bivariate' and 'Qn' otherwise.
adapt_g: A boolean indicating whether the propensity score should be
outcome adaptive. If TRUE, then the propensity score is estimated as the
regression of A onto covariates QaW for each value of a
contained in a_0. See vignette for more details.
guard: A character vector indicating what pattern of misspecifications
to guard against. If guard contains "Q", then the TMLE guards
against misspecification of the outcome regression by estimating the
reduced-dimension outcome regression specified by glm_Qr or
SL_Qr. If guard contains "g", then the TMLE
(additionally) guards against misspecification of the propensity score by
estimating the reduced-dimension propensity score specified by glm_gr
or SL_gr. If guard is set to NULL, then only standard TMLE
and one-step estimators are computed.
reduction: A character equal to "univariate" for a univariate
misspecification correction (default) or "bivariate" for the
bivariate version.
returnModels: A boolean indicating whether to return model fits for the outcome regression, propensity score, and reduced-dimension regressions.
returnNuisance: A boolean indicating whether to return the estimated
nuisance regressions evaluated on the observed data. Defaults to TRUE.
If n_SL is large and "drtmle" is in avg_over, then
consider setting this to FALSE to reduce the size of the resulting object.
cvFolds: A numeric equal to the number of folds to be used in
cross-validated fitting of nuisance parameters. If cvFolds = 1, no
cross-validation is used. Alternatively, cvFolds may be entered as a
vector of fold assignments for observations, in which case it should
have the same length as Y.
maxIter: A numeric that sets the maximum number of iterations the TMLE can perform in its fluctuation step.
tolIC: A numeric that defines the stopping criterion based on the empirical mean of the influence function.
tolg: A numeric indicating the minimum value for estimates of the propensity score.
verbose: A boolean indicating whether to print status updates.
Qsteps: A numeric equal to 1 or 2 indicating whether the fluctuation
submodel for the outcome regression should be fit using a single
minimization (Qsteps = 1) or a backfitting-type minimization
(Qsteps = 2). The latter was found to be more stable in simulations and
is the default.
Qn: An optional list of outcome regression estimates. If specified, the
function will ignore the nuisance parameter estimation specified by
SL_Q and glm_Q. The entries in the list should correspond to
the outcome regression evaluated at each treatment level and the observed
values of W, with order determined by the input to a_0 (e.g., if
a_0 = c(0, 1) then Qn[[1]] should be the outcome regression at
A = 0 and Qn[[2]] should be the outcome regression at
A = 1).
gn: An optional list of propensity score estimates. If specified, the
function will ignore the nuisance parameter estimation specified by
SL_g and glm_g. The entries in the list should correspond to
the propensity for the observed values of W, with order determined by
the input to a_0 (e.g., if a_0 = c(0, 1) then gn[[1]]
should be the propensity of A = 0 and gn[[2]] should be the propensity
of A = 1).
use_future: A boolean indicating whether to use future_lapply or
to use lapply instead. The latter can make errors easier to track down.
...: Other options (not currently used).
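The ordering requirement for user-supplied nuisance estimates (Qn and gn) can be illustrated with base R alone. The sketch below assumes a binary treatment and uses simple logistic regressions as stand-ins for whatever estimators a user might prefer. The simulated data, the model formulas, and the object names Q_fit, g_fit and g1 are all choices made for this illustration.

```r
# Sketch: build Qn and gn lists whose entries follow the order of a_0.
set.seed(1)  # arbitrary seed for this illustration
n <- 200
W <- data.frame(W1 = runif(n), W2 = rnorm(n))
A <- rbinom(n, 1, plogis(W$W1 - W$W2))
Y <- rbinom(n, 1, plogis(W$W1 * W$W2 * A))
a_0 <- c(0, 1)

# Outcome regression: fit once, then predict with A fixed at each level of a_0.
Q_fit <- glm(Y ~ A + W1 + W2, family = binomial(),
             data = data.frame(Y = Y, A = A, W))
Qn <- lapply(a_0, function(a) {
  as.numeric(predict(Q_fit, newdata = data.frame(A = a, W), type = "response"))
})

# Propensity score: estimate P(A = 1 | W), then derive P(A = a | W) per level.
g_fit <- glm(A ~ W1 + W2, family = binomial(),
             data = data.frame(A = A, W))
g1 <- as.numeric(predict(g_fit, newdata = W, type = "response"))
gn <- lapply(a_0, function(a) if (a == 1) g1 else 1 - g1)

# Qn[[1]] and gn[[1]] correspond to A = 0; Qn[[2]] and gn[[2]] to A = 1.
str(Qn)
str(gn)
```

Lists built this way could then be supplied through the Qn and gn arguments. Because the entries are matched to a_0 by position, the same a_0 vector should be used when building the lists and when calling the function.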
# load super learner and drtmle
library(SuperLearner)
library(drtmle)
# simulate data
set.seed(123456)
n <- 100
W <- data.frame(W1 = runif(n), W2 = rnorm(n))
A <- rbinom(n, 1, plogis(W$W1 - W$W2))
Y <- rbinom(n, 1, plogis(W$W1 * W$W2 * A))
# A quick example of drtmle:
# More flexible super learner libraries are available, and we
# recommend using more flexible libraries for SL_Qr and SL_gr
# in general use.
fit1 <- drtmle(
  W = W, A = A, Y = Y, a_0 = c(1, 0),
  family = binomial(),
  stratify = FALSE,
  SL_Q = c("SL.glm", "SL.mean", "SL.glm.interaction"),
  SL_g = c("SL.glm", "SL.mean", "SL.glm.interaction"),
  SL_Qr = "SL.glm",
  SL_gr = "SL.glm", maxIter = 1
)
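To judge estimates from an example like this one, it helps to know the target implied by the simulation design. The following Monte Carlo sketch approximates the marginal means under A = 1 and A = 0 using the same data-generating model as the example. The seed, the sample size N, and the object names are arbitrary choices for this illustration.

```r
# Sketch: approximate the true marginal means under the example's design,
# where P(Y = 1 | A = a, W) = plogis(W1 * W2 * a).
set.seed(2024)  # arbitrary seed for this sketch
N <- 1e5        # large sample to approximate population expectations
W1 <- runif(N)
W2 <- rnorm(N)

mean_Y1 <- mean(plogis(W1 * W2 * 1))  # marginal mean with A set to 1
mean_Y0 <- mean(plogis(W1 * W2 * 0))  # plogis(0) = 0.5 for every draw
true_ate <- mean_Y1 - mean_Y0

print(c(mean_Y1 = mean_Y1, mean_Y0 = mean_Y0, true_ate = true_ate))
```

Because W2 is symmetric about zero and plogis(x) + plogis(-x) = 1, the mean under A = 1 is also exactly 0.5 in the population. The two marginal means therefore coincide and the average treatment effect in this design is zero, so the Monte Carlo difference should be close to zero.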