xfr_surrogate: A function for estimating the proportion of treatment effect explained using repeated cross-fitting.

Description

A function for estimating the proportion of treatment effect explained using repeated cross-fitting.
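The base-R sketch below illustrates repeated cross-fitting. It randomly assigns the observations to K folds, fits a model on K - 1 folds, and predicts on the held-out fold. It then repeats the whole procedure over several random splits. This is a hypothetical illustration, not the package's implementation. lm() stands in for the outcome learners, and the helpers make_folds() and cross_fit_predictions() are made up for this example.

```r
# Hypothetical sketch of repeated K-fold cross-fitting (not the package's code).
# A plain lm() stands in for the learners selected via `mthd`.

make_folds <- function(n, K) {
  # Randomly assign each of n observations to one of K (near-)equal folds
  sample(rep(seq_len(K), length.out = n))
}

cross_fit_predictions <- function(df, formula, K) {
  folds <- make_folds(nrow(df), K)
  preds <- numeric(nrow(df))
  for (k in seq_len(K)) {
    held_out <- folds == k
    # Fit on the other K - 1 folds, predict only on the held-out fold
    fit <- lm(formula, data = df[!held_out, , drop = FALSE])
    preds[held_out] <- predict(fit, newdata = df[held_out, , drop = FALSE])
  }
  preds
}

set.seed(1)
n <- 200
df <- data.frame(x = rnorm(n))
df$y <- 1 + 2 * df$x + rnorm(n)

splits <- 5  # number of repeated random splits
K <- 5       # number of folds per split

# One column of out-of-fold predictions per random split
pred_matrix <- sapply(seq_len(splits), function(i) cross_fit_predictions(df, y ~ x, K))
dim(pred_matrix)  # 200 rows (observations) by 5 columns (splits)
```

Each observation's prediction comes from a model that never saw it. Repeating the split reduces the dependence of the final estimate on any single random fold assignment.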

Usage

xfr_surrogate(
  ds,
  x = NULL,
  s,
  y,
  a,
  splits = 50,
  K = 5,
  outcome_learners = NULL,
  ps_learners = NULL,
  interaction_model = TRUE,
  trim_at = 0.05,
  outcome_family = gaussian(),
  mthd = "superlearner",
  n_ptb = 0,
  ...
)

Value

A tibble with the following columns:

Rm: estimate of the proportion of treatment effect explained (PTE) by the surrogates, computed as the median over the repeated splits.
R_se0: standard error for the PTE, accounting for the variability due to splitting.
R_cil0: lower bound of the confidence interval for the PTE.
R_cih0: upper bound of the confidence interval for the PTE.
Dm: estimate of the overall treatment effect, computed as the median over the repeated splits.
D_se0: standard error for the overall treatment effect, accounting for the variability due to splitting.
D_cil0: lower bound of the confidence interval for the overall treatment effect.
D_cih0: upper bound of the confidence interval for the overall treatment effect.
Dsm: estimate of the residual treatment effect (the treatment effect remaining after accounting for the surrogates), computed as the median over the repeated splits.
Ds_se0: standard error for the residual treatment effect, accounting for the variability due to splitting.
Ds_cil0: lower bound of the confidence interval for the residual treatment effect.
Ds_cih0: upper bound of the confidence interval for the residual treatment effect.
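The following is a minimal sketch of how per-split estimates could be combined into a median estimate with a split-adjusted standard error. It makes three assumptions. First, it uses the median-based adjustment proposed for double/debiased machine learning by Chernozhukov et al. (2018). Second, it uses normal-approximation 95% intervals. Third, it uses the usual definition PTE = 1 - (residual effect / overall effect). The package's exact formulas and confidence level may differ. The helper aggregate_splits() and all per-split numbers are hypothetical.

```r
# Hypothetical helper (not part of the package): combine per-split estimates
# using the median and a median-based, split-adjusted standard error.
aggregate_splits <- function(est, se, level = 0.95) {
  med <- median(est)
  # Inflate each split's variance by its squared deviation from the median
  # estimate, then take the median (assumed adjustment, see lead-in)
  se_adj <- sqrt(median(se^2 + (est - med)^2))
  z <- qnorm(1 - (1 - level) / 2)  # normal quantile, about 1.96 for level = 0.95
  c(est = med, se = se_adj, ci_low = med - z * se_adj, ci_high = med + z * se_adj)
}

# Made-up per-split results for 5 splits (illustration only)
D_split  <- c(1.00, 1.10, 0.95, 1.05, 0.98)  # overall treatment effect
Ds_split <- c(0.40, 0.45, 0.35, 0.42, 0.38)  # residual treatment effect
R_se     <- c(0.08, 0.09, 0.07, 0.08, 0.08)  # per-split SEs of the PTE

# Assumed PTE definition: 1 - residual effect / overall effect, within each split
R_split <- 1 - Ds_split / D_split
aggregate_splits(R_split, R_se)
```

In this sketch, the PTE is computed within each split before the median is taken. The median PTE therefore need not equal 1 - Dsm / Dm exactly.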

Arguments

ds: a data.frame.
x: names of all covariates in ds that should be included to control for confounding (e.g., age, sex). Default is NULL.
s: names of surrogates in ds.
y: name of the outcome in ds.
a: name of the treatment variable in ds (e.g., group assignment). Expected to be binary, coded as 1 (treated) and 0 (control).
splits: number of repeated data splits to perform. Default is 50.
K: number of folds for cross-fitting. Default is 5.
outcome_learners: string vector indicating learners to be used for estimation of the outcome function (e.g., "SL.ridge"). See the SuperLearner package for details.
ps_learners: string vector indicating learners to be used for estimation of the propensity score function (e.g., "SL.ridge"). See the SuperLearner package for details.
interaction_model: logical indicating whether outcome functions for treated and control should be estimated separately. Default is TRUE.
trim_at: threshold at which to trim propensity scores. Default is 0.05.
outcome_family: family of the outcome. Default is gaussian() for continuous outcomes; the other choice is binomial() for binary outcomes.
mthd: regression method used for estimation. Default is 'superlearner', which uses the SuperLearner package. Other choices include 'lasso' (glmnet package), 'sis' (SIS package), and 'cal' (RCAL package).
n_ptb: number of perturbations. Default is 0, which means asymptotic standard errors are used.
...: additional arguments (in particular for super_learner).
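The sketch below illustrates two of the arguments above under simplifying assumptions. trim_at is treated as clipping estimated propensity scores to [trim_at, 1 - trim_at], although the package might instead discard observations. interaction_model is illustrated by contrasting separate per-arm outcome regressions with a single pooled regression that includes treatment as a covariate. lm() and glm() stand in for the learners chosen through mthd. trim_ps() and fit_outcome() are hypothetical helpers, not package functions.

```r
# Hypothetical helper: clip propensity scores to [trim_at, 1 - trim_at]
# (assumed meaning of trimming; the package may handle it differently)
trim_ps <- function(ps, trim_at = 0.05) {
  pmin(pmax(ps, trim_at), 1 - trim_at)
}

# Hypothetical helper contrasting the two outcome-model choices:
#   interaction_model = TRUE  -> separate regressions for treated and control
#   interaction_model = FALSE -> one regression with treatment `a` as a covariate
fit_outcome <- function(df, interaction_model = TRUE) {
  if (interaction_model) {
    fit1 <- lm(y ~ s, data = df[df$a == 1, ])
    fit0 <- lm(y ~ s, data = df[df$a == 0, ])
    list(mu1 = predict(fit1, newdata = df), mu0 = predict(fit0, newdata = df))
  } else {
    fit <- lm(y ~ a + s, data = df)
    # Predict everyone's outcome under treatment (a = 1) and control (a = 0)
    list(mu1 = predict(fit, newdata = transform(df, a = 1)),
         mu0 = predict(fit, newdata = transform(df, a = 0)))
  }
}

# Simulated data in which the effect of the surrogate s differs by arm
set.seed(2)
n <- 300
df <- data.frame(a = rbinom(n, 1, 0.5), s = rnorm(n))
df$y <- 0.5 * df$a + df$s + 0.8 * df$a * df$s + rnorm(n)

# Logistic regression stands in for the propensity score learners
ps_hat <- glm(a ~ s, family = binomial(), data = df)$fitted.values
ps_trim <- trim_ps(ps_hat, trim_at = 0.05)
range(ps_trim)

mu_sep    <- fit_outcome(df, interaction_model = TRUE)
mu_pooled <- fit_outcome(df, interaction_model = FALSE)
```

With the pooled additive model, mu1 - mu0 is the same for every observation (it equals the coefficient on a). The separate per-arm fits, by contrast, allow the surrogate's association with the outcome to differ between treated and control.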

Examples


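# Simulate a data set with n observations and p surrogates
n <- 100
p <- 20
q <- 2  # number of covariates used to control for confounding
wds <- sim_data(n = n, p = p)

if (interactive()) {
  # Lasso-based estimation with 2 repeated splits and 2-fold cross-fitting
  lasso_est <- xfr_surrogate(ds = wds,
    x = paste0('x.', 1:q),
    s = paste0('s.', 1:p),
    a = 'a',
    y = 'y',
    splits = 2,
    K = 2,
    trim_at = 0.01,
    mthd = 'lasso',
    ncores = 1)  # passed on through `...`
}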
