CausalMetaR (version 0.1.2)

STE_external: Estimating the Subgroup Treatment Effect (STE) in an external target population using multi-source data

Description

Doubly-robust and efficient estimator for the STE in an external target population using multi-source data.

Usage

STE_external(
  X,
  X_external,
  EM,
  EM_external,
  Y,
  S,
  A,
  cross_fitting = FALSE,
  replications = 10L,
  source_model = "MN.glmnet",
  source_model_args = list(),
  treatment_model_type = "separate",
  treatment_model_args = list(),
  external_model_args = list(),
  outcome_model_args = list(),
  show_progress = TRUE
)

Value

An object of class "STE_external". This object is a list with the following elements:

df_dif

A data frame containing the subgroup treatment effect (mean difference) estimates for the external data.

df_A0

A data frame containing the subgroup potential outcome mean estimates under A = 0 for the external data.

df_A1

A data frame containing the subgroup potential outcome mean estimates under A = 1 for the external data.

fit_outcome

Fitted outcome model.

fit_source

Fitted source model.

fit_treatment

Fitted treatment model(s).

fit_external

Fitted external model.

Arguments

X

Data frame (or matrix) containing the covariate data in the multi-source data. It should have \(n\) rows and \(p\) columns. Character variables will be converted to factors.

X_external

Data frame (or matrix) containing the covariate data in the external target population. It should have \(n_0\) rows and \(p\) columns. This is the external data counterpart to the X argument.

EM

Vector of length \(n\) containing the effect modifier in the multi-source data. If EM is a factor, it will maintain its subgroup level order; otherwise it will be converted to a factor with default level order.

EM_external

Vector of length \(n_0\) containing the effect modifier in the external data. This is the external data counterpart to the EM argument.

Y

Vector of length \(n\) containing the outcome.

S

Vector of length \(n\) containing the source indicator. If S is a factor, it will maintain its level order; otherwise it will be converted to a factor with the default level order. The order will be carried over to the outputs and plots.
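The level-order rule for S (and likewise for EM) can be checked in base R. The following is a minimal sketch with made-up source labels; none of the names come from the package data.

```r
# Hypothetical source labels, for illustration only.
s_chr <- c("trial_B", "trial_A", "cohort_C", "trial_A")

# A character vector converted with factor() gets the default (sorted) level order.
s_default <- factor(s_chr)
levels(s_default)

# Supplying levels explicitly fixes the order; a factor built this way
# keeps that order when it is passed on.
s_ordered <- factor(s_chr, levels = c("trial_B", "trial_A", "cohort_C"))
levels(s_ordered)
```

Passing a factor such as `s_ordered` is therefore the way to control the order in which sources appear in outputs and plots.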

A

Vector of length \(n\) containing the binary treatment (1 for treated and 0 for untreated).

cross_fitting

Logical specifying whether sample splitting and cross fitting should be used.

replications

Integer specifying the number of sample splitting and cross fitting replications to perform, if cross_fitting = TRUE. The default is 10L.

source_model

Character string specifying the (penalized) multinomial logistic regression for estimating the source model. It has two options: "MN.glmnet" (default) and "MN.nnet", which use glmnet and nnet respectively.

source_model_args

List specifying the arguments for the source model (in glmnet or nnet).

treatment_model_type

Character string specifying how the treatment model is estimated. Options include "separate" (default) and "joint". If "separate", the treatment model (i.e., \(P(A=1|X, S=s)\)) is estimated by regressing \(A\) on \(X\) within each specific internal population \(S=s\). If "joint", the treatment model is estimated by regressing \(A\) on \(X\) and \(S\) using the multi-source population.
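The difference between the two options can be sketched with plain logistic regressions. This is an illustration on simulated data only: it uses glm in place of SuperLearner, and all variable names and data-generating values are assumptions made for the example.

```r
set.seed(1)
n <- 600
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
S <- factor(sample(c("s1", "s2", "s3"), n, replace = TRUE))
A <- rbinom(n, 1, plogis(0.3 * X$x1 - 0.2 * X$x2 + 0.4 * (S == "s2")))
dat <- cbind(X, S = S, A = A)

# "separate": one regression of A on X within each source S = s.
fits_separate <- lapply(split(dat, dat$S), function(d) {
  glm(A ~ x1 + x2, family = binomial(), data = d)
})
# Predicted P(A = 1 | X, S = s) for every subject under each source's model
# (an n x 3 matrix with one column per source).
p_separate <- sapply(fits_separate, predict, newdata = dat, type = "response")

# "joint": a single regression of A on X and S using the pooled data.
fit_joint <- glm(A ~ x1 + x2 + S, family = binomial(), data = dat)
# Predicted P(A = 1 | X, S = s), setting S to each level in turn.
p_joint <- sapply(levels(dat$S), function(s) {
  nd <- dat
  nd$S <- factor(rep(s, nrow(nd)), levels = levels(dat$S))
  predict(fit_joint, newdata = nd, type = "response")
})
```

In this sketch the separate fits let every coefficient vary by source, while the joint fit shares the covariate coefficients and lets only the intercept shift with S.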

treatment_model_args

List specifying the arguments for the treatment model (in SuperLearner).

external_model_args

List specifying the arguments for the external model (in SuperLearner).

outcome_model_args

List specifying the arguments for the outcome model (in SuperLearner).

show_progress

Logical specifying whether to print a progress bar for the cross-fit replicates completed, if cross_fitting = TRUE.

Details

Data structure:

The multi-source dataset consists of the outcome Y, source S, treatment A, covariates X (\(n \times p\)), and effect modifier EM in the internal populations. The data sources can be trials, observational studies, or a combination of both.

The external dataset contains only covariates X_external (\(n_0 \times p\)) and the effect modifier EM_external.
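The shapes described above can be illustrated with a toy dataset. This is a sketch on simulated values; the sanity checks at the end are hypothetical and are not the package's own input validation.

```r
set.seed(2)
n <- 8; n0 <- 5; p <- 3
col_names <- paste0("x", seq_len(p))

# Internal (multi-source) data: covariates, effect modifier, source, treatment, outcome.
X  <- as.data.frame(matrix(rnorm(n * p), nrow = n, dimnames = list(NULL, col_names)))
EM <- factor(sample(c("low", "high"), n, replace = TRUE), levels = c("low", "high"))
S  <- factor(sample(c("s1", "s2"), n, replace = TRUE))
A  <- rbinom(n, 1, 0.5)
Y  <- rnorm(n)

# External data: covariates and effect modifier only (no S, A, or Y).
X_external  <- as.data.frame(matrix(rnorm(n0 * p), nrow = n0, dimnames = list(NULL, col_names)))
EM_external <- factor(sample(c("low", "high"), n0, replace = TRUE), levels = c("low", "high"))

# Hypothetical sanity checks on the dimensions described in the text.
stopifnot(
  nrow(X) == n, ncol(X) == p,
  nrow(X_external) == n0,
  identical(names(X), names(X_external)),
  length(EM) == n, length(Y) == n, length(S) == n, length(A) == n,
  length(EM_external) == n0,
  all(A %in% c(0, 1))
)
```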

Estimation of nuisance parameters:

The following models are fit:

  • External model: \(q(X)=P(R=1|X)\), where \(R\) takes value 1 if the subject belongs to any of the internal datasets and 0 if the subject belongs to the external dataset.

  • Propensity score model: \(\eta_a(X)=P(A=a|X)\). We perform the decomposition \(P(A=a|X)=\sum_{s} P(A=a|X, S=s)P(S=s|X)\) and estimate \(P(A=1|X, S=s)\) (i.e., the treatment model) and \(P(S=s|X)\) (i.e., the source model).

  • Outcome model: \(\mu_a(X)=E(Y|X, A=a)\)

The models are estimated by SuperLearner, with the exception of the source model, which is estimated by glmnet or nnet.
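The external model and the propensity score decomposition can be sketched as follows. This is an illustration under simplifying assumptions: the data are simulated, there are only two sources (so the source model reduces to a single logistic regression), and glm stands in for SuperLearner, glmnet, and nnet.

```r
set.seed(3)
n <- 500; n0 <- 300
X <- data.frame(x1 = rnorm(n))
S <- factor(ifelse(runif(n) < plogis(0.5 * X$x1), "s1", "s2"))
# Source s1 behaves like a trial (treatment probability 0.5); s2 is observational.
A <- rbinom(n, 1, ifelse(S == "s1", 0.5, plogis(0.8 * X$x1)))
X_external <- data.frame(x1 = rnorm(n0, mean = 0.3))

# External model q(X) = P(R = 1 | X): stack internal (R = 1) and external (R = 0) rows.
stacked <- rbind(cbind(X, R = 1), cbind(X_external, R = 0))
fit_q <- glm(R ~ x1, family = binomial(), data = stacked)
q_hat <- predict(fit_q, newdata = stacked, type = "response")

# Source model P(S = s | X), fit on the internal data only.
dat_int <- cbind(X, S = S, A = A)
fit_s <- glm(as.numeric(S == "s1") ~ x1, family = binomial(), data = dat_int)
p_s1 <- predict(fit_s, newdata = stacked, type = "response")
p_source <- cbind(s1 = p_s1, s2 = 1 - p_s1)

# Treatment model P(A = 1 | X, S = s), fit separately within each source.
p_treat <- sapply(levels(S), function(s) {
  fit <- glm(A ~ x1, family = binomial(), data = dat_int[dat_int$S == s, ])
  predict(fit, newdata = stacked, type = "response")
})

# Decomposition: eta_1(X) = sum_s P(A = 1 | X, S = s) * P(S = s | X).
eta1_hat <- rowSums(p_treat * p_source)
eta0_hat <- 1 - eta1_hat
```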

STE estimation:

The estimator is $$ \dfrac{\widehat \kappa}{N}\sum\limits_{i=1}^{N} \Bigg[ I(R_i = 0) \widehat \mu_a(X_i) +I(A_i = a, R_i=1) \dfrac{1-\widehat q(X_i)}{\widehat \eta_a(X_i)\widehat q(X_i)} \Big\{ Y_i - \widehat \mu_a(X_i) \Big\} \Bigg], $$ where \(N=n+n_0\), \(\widehat \kappa=\{N^{-1} \sum_{i=1}^N I(R_i=0)\}^{-1}\), and \(\widetilde X\) denotes the effect modifier.
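Given nuisance estimates, the displayed expression is a simple weighted sum. The sketch below evaluates it exactly as printed, which does not show where the effect modifier \(\widetilde X\) enters; the function name `dr_external_mean` and the toy numbers are hypothetical and are not part of the package.

```r
# Plug-in evaluation of the displayed formula for treatment level a.
# R: 1 = internal row, 0 = external row; A and Y may be NA on external rows.
dr_external_mean <- function(a, R, A, Y, mu_hat, eta_hat, q_hat) {
  N <- length(R)
  kappa_hat <- 1 / mean(R == 0)
  # Indicator I(A_i = a, R_i = 1); missing treatments count as FALSE.
  in_arm <- (R == 1) & !is.na(A) & (A == a)
  weight <- (1 - q_hat) / (eta_hat * q_hat)
  # The residual term contributes only where the indicator is 1.
  augmentation <- ifelse(in_arm, weight * (Y - mu_hat), 0)
  kappa_hat / N * sum((R == 0) * mu_hat + augmentation)
}

# Toy inputs: three internal rows followed by two external rows.
R       <- c(1, 1, 1, 0, 0)
A       <- c(1, 0, 1, NA, NA)
Y       <- c(2, 1, 4, NA, NA)
mu_hat  <- c(2, 1.5, 3, 2.5, 3.5)  # assumed estimates of mu_1(X_i)
eta_hat <- rep(0.5, 5)             # assumed estimates of eta_1(X_i)
q_hat   <- rep(0.6, 5)             # assumed estimates of q(X_i)

est_a1 <- dr_external_mean(1, R, A, Y, mu_hat, eta_hat, q_hat)
```

Since \(\widehat \kappa / N = 1/n_0\), the first term is the average of the outcome-model predictions over the external sample, and the second term corrects it using weighted residuals from the internal data.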

The estimator is doubly robust and non-parametrically efficient. To achieve non-parametric efficiency and asymptotic normality, it requires that \(||\widehat \mu_a(X) -\mu_a(X)||\big\{||\widehat \eta_a(X) -\eta_a(X)||+||\widehat q(X) -q(X)||\big\}=o_p(n^{-1/2})\). In addition, sample splitting and cross-fitting can be performed to avoid the Donsker class assumption.
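The idea behind sample splitting and cross-fitting is that each subject's nuisance prediction comes from a model that was fit without that subject. The following is a generic sketch with a linear outcome model on simulated data; the number of folds and the splitting scheme are assumptions for illustration and may differ from what the package does internally.

```r
set.seed(4)
n <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)

K <- 2  # hypothetical number of folds
fold <- sample(rep(seq_len(K), length.out = n))

# Out-of-fold predictions: fit on the other folds, predict on the held-out fold.
mu_hat <- numeric(n)
for (k in seq_len(K)) {
  held_out <- fold == k
  fit <- lm(y ~ x, data = dat[!held_out, ])
  mu_hat[held_out] <- predict(fit, newdata = dat[held_out, ])
}
```

Repeating the procedure over several random splits, as the replications argument allows, reduces the dependence of the result on any single split.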

When a data source is a randomized trial, it is still recommended to estimate the propensity score for optimal efficiency.

References

Wang, G., Levis, A., Steingrimsson, J. and Dahabreh, I. (2024) Efficient estimation of subgroup treatment effects using multi-source data, arXiv preprint arXiv:2402.02684.

Wang, G., McGrath, S., Lian, Y. and Dahabreh, I. (2024) CausalMetaR: An R package for performing causally interpretable meta-analyses, arXiv preprint arXiv:2402.04341.

Examples

# \donttest{
library(CausalMetaR)  # provides STE_external() and the example datasets used below
se <- STE_external(
  X = dat_multisource[, 2:10],
  Y = dat_multisource$Y,
  EM = dat_multisource$EM,
  S = dat_multisource$S,
  A = dat_multisource$A,
  X_external = dat_external[, 2:10],
  EM_external = dat_external$EM,
  cross_fitting = FALSE,
  source_model = "MN.nnet",
  source_model_args = list(),
  treatment_model_type = "separate",
  treatment_model_args = list(
    family = binomial(),
    SL.library = c("SL.glmnet", "SL.nnet", "SL.glm"),
    cvControl = list(V = 5L)
  ),
  external_model_args = list(
    family = binomial(),
    SL.library = c("SL.glmnet", "SL.nnet", "SL.glm"),
    cvControl = list(V = 5L)
  ),
  outcome_model_args = list(
    family = gaussian(),
    SL.library = c("SL.glmnet", "SL.nnet", "SL.glm"),
    cvControl = list(V = 5L)
  )
)
# }
