Doubly-robust and efficient estimator for the ATE in each internal target population using multi-source data.
ATE_internal(
  X,
  Y,
  S,
  A,
  cross_fitting = FALSE,
  replications = 10L,
  source_model = "MN.glmnet",
  source_model_args = list(),
  treatment_model_type = "separate",
  treatment_model_args = list(),
  outcome_model_args = list(),
  show_progress = TRUE
)
An object of class "ATE_internal". This object is a list with the following elements:
A data frame containing the treatment effect (mean difference) estimates for the internal populations.
A data frame containing the potential outcome mean estimates under A = 0 for the internal populations.
A data frame containing the potential outcome mean estimates under A = 1 for the internal populations.
Fitted outcome model.
Fitted source model.
Fitted treatment model(s).
X: Data frame (or matrix) containing the covariate data in the multi-source data. It should have \(n\) rows and \(p\) columns. Character variables will be converted to factors.
Y: Vector of length \(n\) containing the outcome.
S: Vector of length \(n\) containing the source indicator. If S is a factor, it will maintain its level order; otherwise it will be converted to a factor with the default level order. The order will be carried over to the outputs and plots.
A: Vector of length \(n\) containing the binary treatment (1 for treated and 0 for untreated).
cross_fitting: Logical specifying whether sample splitting and cross-fitting should be used.
replications: Integer specifying the number of sample splitting and cross-fitting replications to perform, if cross_fitting = TRUE. The default is 10L.
source_model: Character string specifying the (penalized) multinomial logistic regression for estimating the source model. It has two options: "MN.glmnet" (default) and "MN.nnet", which use glmnet and nnet respectively.
source_model_args: List specifying the arguments for the source model (in glmnet or nnet).
treatment_model_type: Character string specifying how the treatment model is estimated. Options include "separate" (default) and "joint". If "separate", the treatment model (i.e., \(P(A=1|X, S=s)\)) is estimated by regressing \(A\) on \(X\) within each specific internal population \(S=s\). If "joint", the treatment model is estimated by regressing \(A\) on \(X\) and \(S\) using the multi-source population.
treatment_model_args: List specifying the arguments for the treatment model (in SuperLearner).
outcome_model_args: List specifying the arguments for the outcome model (in SuperLearner).
show_progress: Logical specifying whether to print a progress bar for the cross-fit replicates completed, if cross_fitting = TRUE.
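The level-order rule for the source indicator can be previewed with base R alone. The sketch below assumes the conversion of a non-factor S behaves like a plain call to factor(); the source labels are made up for illustration.

```r
# Hypothetical source labels, for illustration only
S_chr <- c("trial_B", "trial_A", "registry", "trial_A")

# A non-factor S would get the default (sorted) level order
default_levels <- levels(factor(S_chr))

# Passing a factor keeps the level order chosen by the user
S_fct <- factor(S_chr, levels = c("trial_B", "trial_A", "registry"))
custom_levels <- levels(S_fct)

print(default_levels)
print(custom_levels)
```

Setting the levels explicitly before calling the function is a simple way to control the order in which the internal populations appear in the outputs and plots.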
Data structure:
The multi-source dataset consists of the outcome Y, source S, treatment A, and covariates X (\(n \times p\)) in the internal populations. The data sources can be trials, observational studies, or a combination of both.
Estimation of nuisance parameters:
The following models are fit:
Propensity score model: \(\eta_a(X)=P(A=a|X)\). We perform the decomposition \(P(A=a|X)=\sum_{s} P(A=a|X, S=s)P(S=s|X)\) and estimate \(P(A=1|X, S=s)\) (i.e., the treatment model) and \(q_s(X)=P(S=s|X)\) (i.e., the source model).
Outcome model: \(\mu_a(X)=E(Y|X, A=a)\)
The models are estimated by SuperLearner, with the exception of the source model, which is estimated by glmnet or nnet.
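The propensity score decomposition above can be sketched in base R. This is only an illustration on simulated data with two sources: it uses glm in place of SuperLearner, glmnet, and nnet, and all variable names are assumptions rather than package internals.

```r
set.seed(1)
n <- 500
x <- rnorm(n)
s <- rbinom(n, 1, plogis(0.5 * x)) + 1L            # source label: 1 or 2
a <- rbinom(n, 1, plogis(ifelse(s == 1, 0.3, -0.3) * x))
dat <- data.frame(x = x, s = s, a = a)
dat$s2 <- as.integer(dat$s == 2)                   # indicator of source 2

# Source model q_s(X) = P(S = s | X); with two sources a logistic fit suffices
fit_s <- glm(s2 ~ x, family = binomial(), data = dat)
q2 <- predict(fit_s, newdata = dat, type = "response")
q <- cbind(1 - q2, q2)                             # columns: q_1(X), q_2(X)

# "separate" treatment models: regress A on X within each source,
# then predict P(A = 1 | X, S = s) for every observation
p_a1 <- sapply(1:2, function(k) {
  fit_k <- glm(a ~ x, family = binomial(), data = dat[dat$s == k, ])
  predict(fit_k, newdata = dat, type = "response")
})

# Propensity score eta_1(X) = sum_s P(A = 1 | X, S = s) * P(S = s | X)
eta1 <- rowSums(p_a1 * q)
eta0 <- 1 - eta1
```

With more than two sources, the logistic source model would be replaced by a multinomial one, which is the role glmnet or nnet plays in the function.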
ATE estimation:
For each internal population \(s\), the potential outcome mean under \(A = a\) is estimated by $$ \dfrac{\widehat \kappa}{n}\sum\limits_{i=1}^{n} \Bigg[ I(S_i = s) \widehat \mu_a(X_i) +I(A_i = a) \dfrac{\widehat q_{s}(X_i)}{\widehat \eta_a(X_i)} \Big\{ Y_i - \widehat \mu_a(X_i) \Big\} \Bigg], $$ where \(\widehat \kappa=\{n^{-1} \sum_{i=1}^n I(S_i=s)\}^{-1}\). The ATE estimate is the difference between the estimates for \(a = 1\) and \(a = 0\). The estimator is doubly robust and non-parametrically efficient.
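To make the formula concrete, the sketch below evaluates it on a four-observation toy dataset for one treatment level and one internal population. The helper dr_mean is hypothetical (it is not a package function), and the nuisance estimates are fixed, made-up constants rather than fitted values.

```r
# Hypothetical helper: evaluates the displayed formula for treatment level `a`
# and internal population `s`, given nuisance estimates at each observation
dr_mean <- function(Y, A, S, a, s, mu_a, q_s, eta_a) {
  kappa <- 1 / mean(S == s)
  kappa * mean((S == s) * mu_a + (A == a) * q_s / eta_a * (Y - mu_a))
}

# Toy data with made-up nuisance estimates
Y <- c(1, 2, 3, 4)
A <- c(1, 0, 1, 0)
S <- c(1, 1, 2, 2)
q_1   <- rep(0.5, 4)   # estimated P(S = 1 | X)
eta_1 <- rep(0.5, 4)   # estimated P(A = 1 | X)
eta_0 <- 1 - eta_1     # estimated P(A = 0 | X)
mu_1  <- rep(2, 4)     # estimated E(Y | X, A = 1)
mu_0  <- rep(3, 4)     # estimated E(Y | X, A = 0)

m1 <- dr_mean(Y, A, S, a = 1, s = 1, mu_a = mu_1, q_s = q_1, eta_a = eta_1)  # 2
m0 <- dr_mean(Y, A, S, a = 0, s = 1, mu_a = mu_0, q_s = q_1, eta_a = eta_0)  # 3
ate_s1 <- m1 - m0                                                            # -1
```

Note that observations from other sources (here S = 2) still contribute through the weighted residual term, which is how the estimator borrows information across sources.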
Non-parametric efficiency and asymptotic normality require that \(||\widehat \mu_a(X) -\mu_a(X)||\big\{||\widehat \eta_a(X) -\eta_a(X)||+||\widehat q_s(X) -q_s(X)||\big\}=o_p(n^{-1/2})\). In addition, sample splitting and cross-fitting can be performed to avoid the Donsker class assumption.
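The idea of sample splitting and cross-fitting can be illustrated with a generic two-fold scheme: each nuisance model is fit on one part of the data and used to predict on the other. The sketch below is an assumption-laden illustration (simulated data, lm for the outcome model, two folds); it does not reproduce the splitting scheme the function actually uses.

```r
set.seed(2)
n <- 200
x <- rnorm(n)
a <- rbinom(n, 1, 0.5)
y <- 1 + x + a + rnorm(n)
dat <- data.frame(x = x, a = a, y = y)

# Randomly assign each observation to one of two equally sized folds
fold <- sample(rep(1:2, length.out = n))

# Out-of-fold predictions of mu_1(X) = E(Y | X, A = 1):
# fit on treated observations outside fold k, predict inside fold k
mu1_hat <- numeric(n)
for (k in 1:2) {
  train <- dat[fold != k & dat$a == 1, ]
  fit <- lm(y ~ x, data = train)
  mu1_hat[fold == k] <- predict(fit, newdata = dat[fold == k, ])
}
```

Because a single random split adds noise to the estimate, repeating the split several times and combining the results is a common remedy, which is the purpose of the replications argument.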
When a data source is a randomized trial, it is still recommended to estimate the propensity score for optimal efficiency.
Robertson, S.E., Steingrimsson, J.A., Joyce, N.R., Stuart, E.A. and Dahabreh, I.J. (2021). Center-specific causal inference with multicenter trials: Reinterpreting trial evidence in the context of each participating center. arXiv preprint arXiv:2104.05905.
Wang, G., McGrath, S., Lian, Y. and Dahabreh, I.J. (2024). CausalMetaR: An R package for performing causally interpretable meta-analyses. arXiv preprint arXiv:2402.04341.
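# Assumes the CausalMetaR package, which provides ATE_internal() and the
# example dataset dat_multisource, is installed
library(CausalMetaR)

ai <- ATE_internal(
  X = dat_multisource[, 1:10],   # covariates
  Y = dat_multisource$Y,         # outcome
  S = dat_multisource$S,         # source indicator
  A = dat_multisource$A,         # binary treatment
  source_model = "MN.glmnet",
  source_model_args = list(),
  treatment_model_type = "separate",
  # Arguments passed to SuperLearner for the treatment model
  treatment_model_args = list(
    family = binomial(),
    SL.library = c("SL.glmnet", "SL.nnet", "SL.glm"),
    cvControl = list(V = 5L)
  ),
  # Arguments passed to SuperLearner for the outcome model
  outcome_model_args = list(
    family = gaussian(),
    SL.library = c("SL.glmnet", "SL.nnet", "SL.glm"),
    cvControl = list(V = 5L)
  )
)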