Model-free selection of covariates in high dimensions under unconfoundedness for situations where the parameter of interest is an average causal effect. This package is based on model-free backward elimination algorithms proposed in de Luna, Waernbaum and Richardson (2011) and VanderWeele and Shpitser (2011). Confounder selection can be performed via Markov/Bayesian networks, random forests, or LASSO.
cov.sel.high(T = NULL, Y = NULL, X = NULL, type = c("mmpc", "mmhc", "rf", "lasso"),
             betahat = TRUE, parallel = FALSE, Simulate = TRUE, N = NULL, Setting = 1,
             rep = 1, Models = c("Linear", "Nonlinear", "Binary"),
             alpha = 0.05, mmhc_score = c("aic", "bic"))
T: A vector, containing 0 and 1, indicating a binary treatment variable.
Y: A vector of observed outcomes.
X: A matrix or data frame containing columns of covariates. The covariates may be a mix of continuous, unordered discrete (to be specified in the data frame using factor), and ordered discrete (to be specified in the data frame using ordered).
type: The type of method used for selection. The network algorithms are "mmpc" for min-max parents and children (Markov network) and "mmhc" for max-min hill climbing (Bayesian network). Other available methods are random forests, "rf", and LASSO, "lasso".
parallel: If parallel=TRUE and there is a registered parallel backend then the computation will be parallelized. Default is parallel=FALSE.
Simulate: If data is to be simulated according to one of the designs in Häggström (2017) then Simulate should be set to TRUE.
N: If Simulate=TRUE, N is the number of observations to be simulated.
Setting: If Simulate=TRUE, Setting is the simulation setting to be used. Unconfoundedness holds given X if Setting=1. M-bias given X if Setting=2.
rep: If Simulate=TRUE, rep is the number of replications to be simulated.
Models: If Simulate=TRUE, Models is the type of outcome models to be used, options are "Linear", "Nonlinear" and "Binary".
alpha: A numeric value, the target nominal type I error rate (tuning parameter) for "mmpc" and "mmhc".
mmhc_score: The score to use for "mmhc".
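The description of the covariate input says that unordered discrete columns should be coded with factor and ordered discrete columns with ordered. The base R sketch below shows one way to assemble such inputs. All variable names and the data-generating choices are hypothetical, and the commented-out call at the end is an assumption about how user-supplied data would be passed (it presumes Simulate=FALSE is needed because the default is Simulate=TRUE); it is not run here.

```r
# Hypothetical toy data; every name and distribution below is illustrative only.
set.seed(1)
n <- 200

covs <- data.frame(
  # continuous covariate
  age = rnorm(n, mean = 50, sd = 10),
  # unordered discrete covariate, coded with factor()
  region = factor(sample(c("north", "south", "west"), n, replace = TRUE)),
  # ordered discrete covariate, coded with ordered()
  educ = ordered(sample(c("low", "mid", "high"), n, replace = TRUE),
                 levels = c("low", "mid", "high"))
)

# Binary treatment coded 0/1; its probability depends on age
treat <- rbinom(n, size = 1, prob = plogis((covs$age - 50) / 10))

# Observed outcome depending on treatment and age
y <- 1 + 2 * treat + 0.05 * covs$age + rnorm(n)

# Inspect the column types before passing the data on
str(covs)

# Assumed call for user-supplied data (requires the package, not run here):
# ans <- cov.sel.high(T = treat, Y = y, X = covs, type = "mmpc", Simulate = FALSE)
```

The treatment and outcome are kept as plain vectors and only the covariates go into the data frame, mirroring the three separate inputs in the usage above.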
cov.sel.high returns a list with the following content:
X.T: The set of covariates targeting the subset containing all causes of T.
Q.0: The set of covariates targeting the subset of X.T which is also associated with Y given T=0, the response in the control group.
Q.1: The set of covariates targeting the subset of X.T which is also associated with Y given T=1, the response in the treatment group.
Union of Q.0 and Q.1.
X.0: The set of covariates targeting the subset containing all causes of Y given T=0.
X.1: The set of covariates targeting the subset containing all causes of Y given T=1.
X.Y: Union of X.0 and X.1.
Z.0: The set of covariates targeting the subset of X.0 which is also associated with T.
Z.1: The set of covariates targeting the subset of X.1 which is also associated with T.
Union of Z.0 and Z.1.
Union of X.T and X.Y, the set of covariates targeting the subset containing all causes of T and Y.
The cardinalities of each selected subset.
The PSM estimate of the average causal effect, for the full covariate vector and each selected subset.
The Abadie-Imbens standard error for the PSM estimate of the average causal effect, for the full covariate vector and each selected subset.
The TMLE estimate of the average causal effect, for the full covariate vector and each selected subset.
The influence-curve based standard error for the TMLE estimate of the average causal effect, for the full covariate vector and each selected subset.
The number of observations.
The Setting used.
The number of replications.
The Models used.
The type used.
The alpha used.
The mmhc score used.
Variable names of the used data.
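The relationships among the returned subsets (which sets are unions of which, and which are nested) can be sketched with base R set operations. The covariate names and subset contents below are made up for illustration, the object names ending in _union are hypothetical, and representing each subset as a character vector of covariate names is an assumption rather than a statement about the actual return format.

```r
# Hypothetical selected subsets, written as character vectors of covariate names.
X.T <- c("x1", "x2", "x3", "x5")  # targets all causes of T
Q.0 <- c("x1", "x2")              # part of X.T also associated with Y given T=0
Q.1 <- c("x2", "x3")              # part of X.T also associated with Y given T=1
X.0 <- c("x1", "x2", "x7")        # targets all causes of Y given T=0
X.1 <- c("x2", "x3", "x8")        # targets all causes of Y given T=1
Z.0 <- c("x1", "x2")              # part of X.0 also associated with T
Z.1 <- c("x2", "x3")              # part of X.1 also associated with T

# The unions described in the list above
Q_union   <- union(Q.0, Q.1)        # union of Q.0 and Q.1
XY_union  <- union(X.0, X.1)        # union of X.0 and X.1 (X.Y)
Z_union   <- union(Z.0, Z.1)        # union of Z.0 and Z.1
XTY_union <- union(X.T, XY_union)   # union of X.T and X.Y

# Cardinality of each subset, as a named vector
sapply(list(X.T = X.T, Q = Q_union, X.Y = XY_union,
            Z = Z_union, X.TY = XTY_union), length)
```

By construction the Q sets are nested in X.T and the Z sets in X.Y, so the union of Q.0 and Q.1 can never be larger than X.T.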
See Häggström (2017).
de Luna, X., I. Waernbaum, and T. S. Richardson (2011). Covariate selection for the nonparametric estimation of an average treatment effect. Biometrika, 98, 861-875.
Häggström, J. (2017). Data-Driven Confounder Selection via Markov and Bayesian Networks. ArXiv e-prints.
Nagarajan, R., M. Scutari and S. Lebre (2013). Bayesian Networks in R with Applications in Systems Biology. Springer, New York. ISBN 978-1461464457.
Scutari, M. (2010). Learning Bayesian Networks with the bnlearn R Package. Journal of Statistical Software, 35, 1-22. URL http://www.jstatsoft.org/v35/i03/.
Sekhon, J.S. (2011). Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching Package for R. Journal of Statistical Software, 42, 1-52. URL http://www.jstatsoft.org/v42/i07/.
## Use simulated data, select subsets using mmpc
ans <- cov.sel.high(type="mmpc", N=1000, rep=2, Models="Linear", betahat=FALSE, mmhc_score="aic")
## Use simulated data, select subsets using mmpc and estimate ACEs, parallel version
#library(doParallel)
#library(doRNG)
#cl <- makeCluster(4)
#registerDoParallel(cl)
#ans<-cov.sel.high(type="mmpc", parallel=TRUE, N=500, rep=10, Models="Linear", mmhc_score="aic")
#stopCluster(cl)