A function for estimating the proportion of treatment effect explained using repeated cross-fitting.
xfr_surrogate(
ds,
x = NULL,
s,
y,
a,
splits = 50,
K = 5,
outcome_learners = NULL,
ps_learners = NULL,
interaction_model = TRUE,
trim_at = 0.05,
outcome_family = gaussian(),
mthd = "superlearner",
n_ptb = 0,
...
)
A tibble with columns:
Rm: estimate of the proportion of treatment effect explained (PTE), computed as the median over the repeated splits.
R_se0: standard error for the PTE, accounting for the variability due to splitting.
R_cil0: lower limit of the confidence interval for the PTE.
R_cih0: upper limit of the confidence interval for the PTE.
Dm: estimate of the overall treatment effect, computed as the median over the repeated splits.
D_se0: standard error for the overall treatment effect, accounting for the variability due to splitting.
D_cil0: lower limit of the confidence interval for the overall treatment effect.
D_cih0: upper limit of the confidence interval for the overall treatment effect.
Dsm: estimate of the residual treatment effect, computed as the median over the repeated splits.
Ds_se0: standard error for the residual treatment effect, accounting for the variability due to splitting.
Ds_cil0: lower limit of the confidence interval for the residual treatment effect.
Ds_cih0: upper limit of the confidence interval for the residual treatment effect.
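The median-over-splits aggregation described in these columns can be sketched in base R. The per-split numbers below are made-up placeholders, and the rule used to fold split-to-split variability into the standard error is the median-adjusted convention common in the repeated cross-fitting literature. That rule is an assumption here, not a confirmed description of what xfr_surrogate does internally.

```r
# Hypothetical per-split estimates and asymptotic standard errors
# (placeholder values for illustration only).
est_by_split <- c(0.62, 0.58, 0.65, 0.60, 0.61)
se_by_split  <- c(0.10, 0.11, 0.09, 0.10, 0.12)

# Point estimate: median over the repeated splits.
est_median <- median(est_by_split)

# Assumed rule: add each split's squared deviation from the median to
# its variance, then take the median, so that variability across
# splits inflates the reported standard error.
se_adjusted <- sqrt(median(se_by_split^2 + (est_by_split - est_median)^2))

# Wald-type 95% confidence interval around the median estimate.
ci <- est_median + c(-1, 1) * qnorm(0.975) * se_adjusted
```

Taking the median rather than the mean keeps a single unlucky split from dominating the reported estimate.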
ds: a data.frame.
x: names of all covariates in ds that should be included to control for confounding (e.g., age, sex). Default is NULL.
s: names of the surrogates in ds.
y: name of the outcome in ds.
a: name of the treatment variable (e.g., groups). Expected to be a binary variable coded as 1s and 0s.
splits: number of data splits to perform. Default is 50.
K: number of folds for cross-fitting. Default is 5.
outcome_learners: string vector indicating the learners to be used for estimation of the outcome function (e.g., "SL.ridge"). See the SuperLearner package for details.
ps_learners: string vector indicating the learners to be used for estimation of the propensity score function (e.g., "SL.ridge"). See the SuperLearner package for details.
interaction_model: logical indicating whether outcome functions for treated and control should be estimated separately. Default is TRUE.
trim_at: threshold at which to trim propensity scores. Default is 0.05.
outcome_family: outcome family. Default is 'gaussian' for continuous outcomes; the other choice is 'binomial' for binary outcomes.
mthd: selected regression method. Default is 'superlearner', which uses the SuperLearner package for estimation. Other choices are 'lasso' (which uses glmnet), 'sis' (which uses SIS), and 'cal' (which uses RCAL).
n_ptb: number of perturbations. Default is 0, which means asymptotic standard errors are used.
...: additional parameters (in particular for super_learner).
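How splits, K, and trim_at might interact can be illustrated with a small base R sketch. It assumes that each split is an independent random partition of the rows into K folds and that trimming means clipping propensity scores to the interval [trim_at, 1 - trim_at]. Both are assumptions for illustration; the package may implement these steps differently (for example, by dropping rather than clipping observations). The helper names make_folds and trim_ps are hypothetical and are not part of the package.

```r
set.seed(1)
n <- 100
splits <- 3
K <- 5
trim_at <- 0.05

# Hypothetical helper: randomly assign n rows to K near-equal folds.
make_folds <- function(n, K) sample(rep(seq_len(K), length.out = n))

# One fold vector per repeated split, collected as an n x splits matrix.
fold_ids <- replicate(splits, make_folds(n, K))

# Hypothetical helper: clip propensity scores away from 0 and 1.
trim_ps <- function(ps, trim_at) pmin(pmax(ps, trim_at), 1 - trim_at)

ps_raw <- c(0.01, 0.30, 0.97)  # placeholder propensity scores
ps_trimmed <- trim_ps(ps_raw, trim_at)
```

In a cross-fitting scheme of this kind, the nuisance models for the rows in fold k are fitted on the other K - 1 folds, and the whole procedure is repeated once per split.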
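# Simulate a small data set: n observations and p surrogates (s.1, ..., s.p)
n <- 100
p <- 20
q <- 2  # number of covariates (x.1, ..., x.q) used to control for confounding
wds <- sim_data(n = n, p = p)

# Run only in interactive sessions; splits and K are kept small for speed
if (interactive()) {
  lasso_est <- xfr_surrogate(
    ds = wds,
    x = paste0('x.', 1:q),
    s = paste0('s.', 1:p),
    a = 'a',
    y = 'y',
    splits = 2,
    K = 2,
    trim_at = 0.01,
    mthd = 'lasso',
    ncores = 1  # passed on through ...
  )
}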