BartMixVs

Overview

This R package is built upon CRAN R package 'BART' version 2.7 (https://github.com/cran/BART) and implements the existing BART-based variable selection approaches discussed in the paper: Luo, C. and Daniels, M. J. (2021). "Variable selection using Bayesian additive regression trees." arXiv preprint arXiv:2112.13998, https://doi.org/10.48550/arXiv.2112.13998.

Background

Bayesian Additive Regression Trees (BART) is a nonparametric regression model that is flexible enough to capture both interactions between predictors and nonlinear relationships with the response. BART can be used not only for estimation but also for variable selection. Existing variable selection approaches for BART include:

  1. the permutation-based variable selection approach using BART variable inclusion proportions (VIP) as the variable importance, proposed in Bleich, Justin et al. (2014). "Variable selection for BART: an application to gene regulation." Ann. Appl. Stat. 8.3, pp. 1750--1781;
  2. the median probability model from DART, a variant of BART proposed in Linero, A. R. (2018). "Bayesian regression trees for high-dimensional prediction and variable selection." J. Amer. Statist. Assoc. 113, pp. 626--636;
  3. the variable selection approach using ABC Bayesian forests proposed in Liu, Yi, Veronika Rockova, and Yuexi Wang (2021). "Variable selection with ABC Bayesian forests." J. R. Stat. Soc. Ser. B. Stat. Methodol. 83.3, pp. 453--481.
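The median probability model in item 2 can be illustrated with a minimal base R sketch. It assumes the usual convention that a predictor is kept when its marginal posterior inclusion probability is at least 0.5, and it uses a small made-up matrix of usage indicators rather than output from a real DART fit. The names `used`, `mpvip` and `selected` are hypothetical and not part of the package.

```r
# Toy posterior draws (made up for illustration, not sampler output):
# rows are MCMC iterations, columns are predictors. TRUE means the
# predictor appears in at least one splitting rule of that draw.
used <- rbind(
  c(TRUE,  TRUE,  FALSE, FALSE),
  c(TRUE,  FALSE, FALSE, TRUE),
  c(TRUE,  TRUE,  FALSE, FALSE),
  c(TRUE,  TRUE,  TRUE,  FALSE)
)
colnames(used) <- paste0("x", 1:4)

# Marginal posterior inclusion probability: share of draws using each predictor.
mpvip <- colMeans(used)

# Median probability model: keep predictors with probability of at least 0.5.
selected <- names(mpvip)[mpvip >= 0.5]

print(mpvip)
print(selected)
```

In the package itself this selection is handled by medianInclusion.vs(); the sketch only shows the thresholding idea.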

Luo and Daniels (2021) review these methods with an emphasis on their ability to identify relevant predictors. Furthermore, to account for mixed-type predictors and to allow more relevant predictors into the model, Luo and Daniels (2021) propose three new methods for BART:

  1. the permutation-based variable selection approach using BART within-type VIP as the variable importance;
  2. the permutation-based variable selection approach using BART Metropolis importance (MI) as the variable importance;
  3. the backward selection with two filters.

See Luo and Daniels (2021) for more details.
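The permutation-based approaches share one skeleton and differ only in the importance measure that is plugged in. The base R sketch below shows that skeleton under simplifying assumptions: the importance measure is a stand-in (absolute correlation with the response) instead of a BART-based measure, each predictor is compared with the 1 - alpha quantile of its own permutation null distribution, and no model is refit. The names `toy_importance` and `permute_select` are hypothetical and do not belong to the package.

```r
# Stand-in importance measure (an assumption for this sketch): absolute
# correlation with the response. The package uses BART-based measures.
toy_importance <- function(x, y) abs(drop(cor(x, y)))

permute_select <- function(x, y, importance = toy_importance,
                           npermute = 200, alpha = 0.05) {
  observed <- importance(x, y)
  # Null distribution: recompute the importance after shuffling the response,
  # which breaks any link between the predictors and the response.
  null <- replicate(npermute, importance(x, sample(y)))
  # Each predictor gets its own cutoff: the (1 - alpha) quantile of its null.
  cutoff <- apply(null, 1, quantile, probs = 1 - alpha)
  which(observed > cutoff)
}

set.seed(1)
n <- 200
x <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("x", 1:5)))
y <- 3 * x[, 1] - 2 * x[, 2] + rnorm(n)  # only x1 and x2 are relevant

sel <- permute_select(x, y)
print(sel)
```

Swapping `toy_importance` for a function that fits BART and returns VIP, within-type VIP or MI would give the corresponding approach; permute.vs() is the package function that does this.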

Philosophy of BartMixVs

The 'BartMixVs' R package provides data sampling functions to generate the simulation data used in Luo and Daniels (2021), inherits estimation functions from the 'BART' package, and implements the six BART-based variable selection approaches described above (three existing and three new).

  1. Data sampling functions include friedman(), checkerboard(), mixone() and mixtwo() corresponding to Scenario C.C.1 (or B.C.1), Scenario C.C.2 (or B.C.2), Scenario C.M.1 (or B.M.1) and Scenario C.M.2 (or B.M.2) in Luo and Daniels (2021), respectively.
  2. BART estimation and prediction functions inherited from the 'BART' package include wbart() (mc.wbart()), pbart() (mc.pbart()) and pwbart() (mc.pwbart()). Note that while most of the original features of the 'BART' functions are kept, two modifications are made:
    • The first modification is that 'BartMixVs' provides two types of split probability for the tree prior in BART:
      • One type is the split probability used in 'BART' and proposed in Chipman, H. A., George, E. I. and McCulloch, R. E. (2010). "BART: Bayesian additive regression trees." Ann. Appl. Stat. 4 266--298: p(d) = γ(1 + d)^(-β), where d is the depth of the node, γ ∈ (0, 1) and β ≥ 0.
      • The other type is the split probability proposed in Rockova V, Saha E (2019). “On theory for BART.” In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 2839–2848). PMLR, which is proved to achieve the optimal posterior contraction: p(d) = γ^d, where 1/n ≤ γ < 1/2 and n is the sample size.
    • The second modification is that, in addition to variable inclusion proportions, marginal posterior variable inclusion probabilities and posterior split probabilities, the estimation functions in 'BartMixVs' also provide two new variable importance measures: within-type variable inclusion proportions and Metropolis importance.
  3. 'BartMixVs' provides four main functions for the existing BART-based variable selection approaches:
    • The function permute.vs() (or mc.permute.vs() with parallel computation) implements the permutation-based variable selection approach with a choice of three variable importance measures: BART variable inclusion proportions (Bleich et al. (2014)), BART within-type variable inclusion proportions and BART Metropolis importance (Luo and Daniels (2021)).
    • The function mc.backward.vs() implements the backward selection proposed in Luo and Daniels (2021).
    • The function medianInclusion.vs() implements the DART variable selection approach proposed in Linero (2018).
    • The function abc.vs() (or mc.abc.vs() with parallel computation) implements the ABC Bayesian forests variable selection approach proposed in Liu et al. (2021).
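To compare the two split probabilities for the tree prior mentioned in item 2, here is a minimal base R sketch. It assumes the polynomial form gamma * (1 + d)^(-beta) from Chipman et al. (2010), with their default values 0.95 and 2, and the exponential form gamma^d from Rockova and Saha (2019). The value gamma = 0.45 and the function names are illustrative choices, not package defaults or package functions.

```r
# Chipman et al. (2010): p(d) = gamma * (1 + d)^(-beta),
# with gamma in (0, 1) and beta >= 0; d is the depth of the node.
split_prob_polynomial <- function(d, gamma = 0.95, beta = 2) {
  gamma * (1 + d)^(-beta)
}

# Rockova and Saha (2019): p(d) = gamma^d, with 1/n <= gamma < 1/2.
# gamma = 0.45 is an arbitrary illustrative value.
split_prob_exponential <- function(d, gamma = 0.45) {
  gamma^d
}

depth <- 0:4
probs <- data.frame(
  depth = depth,
  polynomial = split_prob_polynomial(depth),
  exponential = split_prob_exponential(depth)
)
print(probs)
```

Both priors make deeper nodes less likely to split; the exponential form decays faster at large depths, which penalizes deep trees more strongly.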

Install

install.packages('BartMixVs')

Version

1.0.0

License

GPL (>= 2)

Maintainer

Chuji Luo

Last Published

May 5th, 2022

Functions in BartMixVs (1.0.0)

  • mc.cores.openmp: Detecting OpenMP
  • checkerboard: Generate data for an example of Zhu, Zeng and Kosorok (2015)
  • BartMixVs-package: Variable Selection Using Bayesian Additive Regression Trees
  • mc.pbart: Probit BART for binary responses with parallel computation
  • friedman: Generate data for an example of Friedman (1991)
  • bartModelMatrix: Create a matrix out of a vector or data frame
  • mc.abc.vs: Variable selection with ABC Bayesian forest (using parallel computation)
  • mc.backward.vs: Backward selection with two filters (using parallel computation)
  • abc.vs: Variable selection with ABC Bayesian forest
  • mc.permute.vs: Permutation-based variable selection approach with parallel computation
  • medianInclusion.vs: Variable selection with DART
  • mixtwo: Generate data with correlated and mixed-type predictors
  • predict.wbart: Predict new observations with a fitted BART model
  • permute.vs: Permutation-based variable selection approach
  • wbart: BART for continuous responses
  • mixone: Generate data with independent and mixed-type predictors
  • mc.pwbart: Predicting new observations based on a previously fitted BART model with parallel computation
  • mc.wbart: BART for continuous responses with parallel computation
  • predict.pbart: Predict new observations with a fitted BART model
  • pbart: Probit BART for binary responses with Normal latents
  • pwbart: Predicting new observations with a previously fitted BART model