cov.sel.high: Model-Free Covariate Selection in High Dimensions

Description

Model-free selection of covariates in high dimensions under unconfoundedness, for situations where the parameter of interest is an average causal effect. This package is based on the model-free backward elimination algorithms proposed in de Luna, Waernbaum and Richardson (2011) and VanderWeele and Shpitser (2011). Confounder selection can be performed via Markov/Bayesian networks, random forests, or LASSO.

Usage

cov.sel.high(T = NULL, Y = NULL, X = NULL,
             type = c("mmpc", "mmhc", "rf", "lasso"),
             betahat = TRUE, parallel = FALSE, Simulate = TRUE,
             N = NULL, Setting = 1, rep = 1,
             Models = c("Linear", "Nonlinear", "Binary"), ...)

Arguments

T

A vector of 0s and 1s indicating a binary treatment variable.

Y

A vector of observed outcomes.

X

A matrix or data frame containing columns of covariates. The covariates may be a mix of continuous, unordered discrete (to be specified in the data frame using factor), and ordered discrete (to be specified in the data frame using ordered).

type

The type of method used for selection. The network algorithms are "mmpc" for max-min parents and children (Markov network) and "mmhc" for max-min hill climbing (Bayesian network). The other available methods are random forests ("rf") and LASSO ("lasso").

betahat

If betahat=TRUE, the average treatment effect is estimated for each selected subset and for the full covariate vector, using propensity score matching via the function Match (see Sekhon, 2011).

parallel

If parallel=TRUE and a parallel backend is registered, the computation will be parallelized. Default is parallel=FALSE.

Simulate

If data are to be simulated according to one of the designs in Häggström (2016), Simulate should be set to TRUE.

N

If Simulate=TRUE, N is the number of observations to be simulated.

Setting

If Simulate=TRUE, Setting is the simulation setting to be used. If Setting=1, unconfoundedness holds given X. If Setting=2, conditioning on X gives M-bias.

rep

If Simulate=TRUE, rep is the number of replications to be simulated.

Models

If Simulate=TRUE, Models is the type of outcome model to be used. The options are "Linear", "Nonlinear" and "Binary".

...

Additional arguments passed on to mmpc or mmhc.
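
As an illustration of the argument descriptions above, the following sketch builds a binary treatment vector, an outcome vector, and a covariate data frame that mixes continuous, unordered (factor) and ordered (ordered) columns. The variable names and the data-generating process are made up for this example. The final, commented-out call assumes that user-supplied data are analysed by setting Simulate=FALSE; it is not run here and is only a guess at typical usage.

```r
set.seed(1)
n <- 200

# Covariates: one continuous, one unordered factor, one ordered factor
X <- data.frame(
  age    = rnorm(n, mean = 50, sd = 10),
  region = factor(sample(c("north", "south", "west"), n, replace = TRUE)),
  dose   = ordered(sample(c("low", "medium", "high"), n, replace = TRUE),
                   levels = c("low", "medium", "high"))
)

# Binary treatment (0/1) whose probability depends on age (illustrative only)
T <- rbinom(n, 1, plogis((X$age - 50) / 10))

# Outcome depending on treatment and age (illustrative only)
Y <- 1 + 2 * T + 0.05 * X$age + rnorm(n)

str(X)

# Hypothetical call with user-supplied data (not run; assumes Simulate = FALSE
# switches off the built-in simulation designs):
# ans <- cov.sel.high(T = T, Y = Y, X = X, type = "rf", Simulate = FALSE)
```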

Value

X.T: The set of covariates targeting the subset containing all causes of T.
Q.0: The set of covariates targeting the subset of X.T which is also dependent with Y given T=0, the response in the control group.
Q.1: The set of covariates targeting the subset of X.T which is also dependent with Y given T=1, the response in the treatment group.
Q: Union of Q.0 and Q.1.
X.0: The set of covariates targeting the subset containing all causes of Y given T=0.
X.1: The set of covariates targeting the subset containing all causes of Y given T=1.
X.Y: Union of X.0 and X.1.
Z.0: The set of covariates targeting the subset of X.0 which is also dependent with T.
Z.1: The set of covariates targeting the subset of X.1 which is also dependent with T.
Z: Union of Z.0 and Z.1.
X.TY: Union of X.T and X.Y, the set of covariates targeting the subset containing all causes of T and Y.
cardinalities: The cardinalities of each selected subset.
est: The estimated average causal effect for the full covariate vector and each selected subset.
se: The Abadie-Imbens standard error for the estimated average causal effect for the full covariate vector and each selected subset.
N: The number of observations.
Setting: The Setting used.
rep: The number of replications.
Models: The outcome models used.
type: The selection method used.
varnames: Variable names of the used data.
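
The relationships between the returned subsets, and one way the est and se components might be combined, can be sketched in base R. Everything below is hypothetical: the covariate names, the estimate and the standard error are invented, and the sketch does not assume anything about how the returned object stores its subsets (for example as names or as column indices). The confidence interval is an ordinary normal-approximation interval and is not something the function itself is documented to return.

```r
# Hypothetical selected subsets (covariate names are illustrative only)
X.T <- c("x1", "x2", "x5")   # targets causes of T
X.0 <- c("x2", "x3")         # targets causes of Y given T = 0
X.1 <- c("x3", "x7")         # targets causes of Y given T = 1

# Unions as described in the Value section
X.Y  <- union(X.0, X.1)      # union of X.0 and X.1
X.TY <- union(X.T, X.Y)      # union of X.T and X.Y

# Cardinality of each subset
cardinalities <- c(X.T = length(X.T), X.Y = length(X.Y), X.TY = length(X.TY))
print(cardinalities)

# Hypothetical estimate and standard error for one subset
est <- 2.1
se  <- 0.4

# Approximate 95% confidence interval under asymptotic normality
ci <- est + c(-1, 1) * qnorm(0.975) * se
print(ci)
```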

Details

See Häggström (2016).
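
To give some intuition for the M-bias situation mentioned under Setting=2, the following base R sketch simulates a generic M-structure. This is not the simulation design of Häggström (2016): the variables U1 and U2, the coefficients, and the use of linear regression instead of matching are all assumptions made for illustration. U1 affects T and X, U2 affects Y and X, and X affects neither T nor Y, so adjusting for X opens a non-causal path between T and Y.

```r
set.seed(1)
n <- 1e5

# Unobserved variables
U1 <- rnorm(n)
U2 <- rnorm(n)

# X is a common effect (collider) of U1 and U2
X <- U1 + U2 + rnorm(n)

# Treatment depends on U1 only; outcome depends on T and U2 only
T <- rbinom(n, 1, plogis(U1))
Y <- 2 * T + U2 + rnorm(n)   # true treatment effect is 2

# Not adjusting for X recovers the true effect (up to sampling error)
unadjusted <- unname(coef(lm(Y ~ T))["T"])

# Adjusting for X induces bias in this structure
adjusted <- unname(coef(lm(Y ~ T + X))["T"])

print(c(unadjusted = unadjusted, adjusted = adjusted))
```

In this structure the unadjusted coefficient should be close to the true effect, while the coefficient adjusted for X is pulled away from it, which is why a covariate selection method should avoid including such a covariate.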

References

de Luna, X., I. Waernbaum, and T. S. Richardson (2011). Covariate selection for the nonparametric estimation of an average treatment effect. Biometrika, 98, 861-875.

Häggström, J. (2016). Data-Driven Confounder Selection via Markov and Bayesian Networks. ArXiv e-prints.

Nagarajan, R., M. Scutari, and S. Lebre (2013). Bayesian Networks in R with Applications in Systems Biology. Springer, New York. ISBN 978-1461464457.

Scutari, M. (2010). Learning Bayesian Networks with the bnlearn R Package. Journal of Statistical Software, 35, 1-22. URL http://www.jstatsoft.org/v35/i03/.

Sekhon, J.S. (2011). Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching Package for R. Journal of Statistical Software, 42, 1-52. URL http://www.jstatsoft.org/v42/i07/.

Examples

## Use simulated data, select subsets using mmpc, and estimate ACEs
ans <- cov.sel.high(type = "mmpc", N = 500, rep = 2, Models = "Linear")


## Use simulated data, select subsets using mmpc, and estimate ACEs (parallel version)
# library(doParallel)
# cl <- makeCluster(4)
# registerDoParallel(cl)
# ans <- cov.sel.high(type = "mmpc", parallel = TRUE, N = 500, rep = 10, Models = "Linear")
# stopCluster(cl)