BartMixVs (version 1.0.0)

mc.backward.vs: Backward selection with two filters (using parallel computation)

Description

This function implements the backward variable selection approach for BART (see Algorithm 2 in Luo and Daniels (2021) for details). Parallel computation is used within each step of the backward selection approach.

Usage

mc.backward.vs(
  x,
  y,
  split.ratio = 0.8,
  probit = FALSE,
  true.idx = NULL,
  xinfo = matrix(0, 0, 0),
  numcut = 100L,
  usequants = FALSE,
  cont = FALSE,
  rm.const = TRUE,
  k = 2,
  power = 2,
  base = 0.95,
  split.prob = "polynomial",
  ntree = 50L,
  ndpost = 1000,
  nskip = 1000,
  keepevery = 1L,
  printevery = 100L,
  verbose = FALSE,
  mc.cores = 2L,
  nice = 19L,
  seed = 99L
)

Arguments

x

A matrix or a data frame of predictor values with each row corresponding to an observation and each column corresponding to a predictor. If a predictor is a factor with \(q\) levels in a data frame, it is replaced with \(q\) dummy variables.

y

A vector of response (continuous or binary) values.

split.ratio

A number between \(0\) and \(1\); the data set (x, y) is split into a training set and a testing set according to split.ratio.

probit

A Boolean argument indicating whether the response variable is binary or continuous; probit=FALSE (by default) means that the response variable is continuous.

true.idx

(Optional) A vector of indices of the true relevant predictors; if provided, metrics including precision, recall and F1 score are returned.

xinfo

A matrix of cut-points with each row corresponding to a predictor and each column corresponding to a cut-point. xinfo=matrix(0.0,0,0) indicates the cut-points are specified by BART.

numcut

The number of possible cut-points; if a single number is given, it is used for all predictors; otherwise, a vector of length ncol(x) is required, where the \(i\)-th element gives the number of cut-points for the \(i\)-th predictor in x. If usequants=FALSE, numcut equally spaced cut-points are used to cover the range of values in the corresponding column of x. If usequants=TRUE, then min(numcut, the number of unique values in the corresponding column of x - 1) cut-point values are used (see the sketch following this argument list).

usequants

A Boolean argument indicating how the cut-points in xinfo are generated; if usequants=TRUE, uniform quantiles are used for the cut-points; otherwise, the cut-points are equally spaced across the range of the corresponding column of x.

cont

A Boolean argument indicating whether to assume all predictors are continuous.

rm.const

A Boolean argument indicating whether to remove constant predictors.

k

The number of prior standard deviations that \(E(Y|x) = f(x)\) is away from \(\pm 0.5\). The response (y) is internally scaled to the range \([-0.5, 0.5]\). The bigger k is, the more conservative the fitting will be.

power

The power parameter of the polynomial splitting probability for the tree prior. Only used if split.prob="polynomial".

base

The base parameter of the polynomial splitting probability for the tree prior if split.prob="polynomial"; if split.prob="exponential", the probability of splitting a node at depth \(d\) is base\(^d\).

split.prob

A string indicating which splitting probability is used for the tree prior. If split.prob="polynomial", the splitting probability in Chipman et al. (2010) is used; if split.prob="exponential", the splitting probability in Rockova and Saha (2019) is used. Both forms are written out in the sketch following this argument list.

ntree

The number of trees in the ensemble.

ndpost

The number of posterior samples returned.

nskip

The number of posterior samples burned in.

keepevery

Every keepevery posterior sample is kept to be returned to the user.

printevery

As the MCMC runs, a message is printed every printevery iterations.

verbose

A Boolean argument indicating whether any messages are printed out.

mc.cores

The number of cores to employ in parallel.

nice

Set the job niceness. The default niceness is \(19\); niceness goes from \(0\) (highest priority) to \(19\) (lowest).

seed

Seed required for reproducible MCMC.
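
As a reference for the numcut, usequants and split.prob arguments above, the following sketch writes out their numeric rules directly. It is for illustration only, not package code; in particular, the depth indexing of the exponential prior is an assumption.

## cut-point count for one predictor, per the numcut/usequants rule
n.cutpoints <- function(xcol, numcut = 100L, usequants = FALSE) {
  if (usequants) min(numcut, length(unique(xcol)) - 1L) else numcut
}

## tree prior: probability that a node at depth d splits
poly.split <- function(d, base = 0.95, power = 2) base * (1 + d)^(-power)  # Chipman et al. (2010)
exp.split  <- function(d, base = 0.95) base^d                              # Rockova and Saha (2019)
poly.split(0:3)  # ~0.95, 0.2375, 0.1056, 0.0594
exp.split(0:3)   # 1, 0.95, 0.9025, 0.857375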

Value

The function mc.backward.vs() returns a list with the following components.

best.model.names

The vector of column names of the predictors selected by the backward selection approach.

best.model.cols

The vector of column indices of the predictors selected by the backward selection approach.

best.model.order

The step where the best model is located.

models

The list of winner models from each step of the backward selection procedure; length equals ncol(x).

model.errors

The vector of MSEs (or MLLs if the response variable is binary) for the ncol(x) winner models.

elpd.loos

The vector of LOO scores for the ncol(x) winner models.

all.models

The list of all the evaluated models.

all.model.errors

The vector of MSEs (or MLLs if the response variable is binary) for all the evaluated models.

precision

The precision score for the backward selection approach; only returned if true.idx is provided.

recall

The recall score for the backward selection approach; only returned if true.idx is provided.

f1

The F1 score for the backward selection approach; only returned if true.idx is provided.

all.models.idx

The vector of Boolean arguments indicating whether the corresponding model in all.models is acceptable or not; a model containing all the relevant predictors is an acceptable model; only returned if true.idx is provided.

Details

The backward selection starts with the full model containing all the predictors, then compares the deletion of each predictor using the mean squared error (MSE) if the response variable is continuous (or the mean log loss (MLL) if the response variable is binary), deleting the predictor whose removal gives the smallest MSE (or MLL). This process is repeated until only one predictor remains in the model, ultimately returning ncol(x) "winner" models with model sizes ranging from \(1\) to ncol(x). Among the ncol(x) "winner" models, the one with the largest expected log pointwise predictive density based on leave-one-out (LOO) cross-validation is the best model. See Section 3.3 in Luo and Daniels (2021) for details. If true.idx is provided, the precision, recall and F1 scores are returned.
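
To make the control flow concrete, below is a minimal sketch of the backward loop, with ordinary least-squares fits and training-set MSE standing in for the BART fits, test-set evaluation, and within-step parallelism that mc.backward.vs() actually uses; the final LOO comparison step is omitted. Illustration only, not package code.

backward.sketch <- function(x, y) {
  active <- seq_len(ncol(x))
  models <- list()
  model.errors <- numeric(0)
  while (length(active) >= 1) {
    ## record the current "winner" model and its error
    fit <- lm(y ~ ., data = data.frame(x[, active, drop = FALSE], y = y))
    models[[length(models) + 1]] <- active
    model.errors <- c(model.errors, mean(resid(fit)^2))
    if (length(active) == 1) break
    ## compare the deletion of each remaining predictor ...
    del.mse <- sapply(active, function(j) {
      keep <- setdiff(active, j)
      f <- lm(y ~ ., data = data.frame(x[, keep, drop = FALSE], y = y))
      mean(resid(f)^2)
    })
    ## ... and delete the one whose removal gives the smallest MSE
    active <- setdiff(active, active[which.min(del.mse)])
  }
  list(models = models, model.errors = model.errors)  # ncol(x) models, sizes ncol(x) down to 1
}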

References

Chipman, H. A., George, E. I. and McCulloch, R. E. (2010). "BART: Bayesian additive regression trees." Ann. Appl. Stat. 4 266--298.

Luo, C. and Daniels, M. J. (2021). "Variable Selection Using Bayesian Additive Regression Trees." arXiv preprint arXiv:2112.13998.

Rockova, V. and Saha, E. (2019). "On theory for BART." In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 2839--2848). PMLR.

Vehtari, A., Gelman, A. and Gabry, J. (2017). "Erratum to: Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC." Stat. Comput. 27(5), 1433.

See Also

permute.vs, medianInclusion.vs and abc.vs.

Examples

# NOT RUN {
## simulate data (Scenario C.C.1. in Luo and Daniels (2021))
set.seed(123)
data = friedman(100, 5, 1, FALSE)
## parallel::mcparallel/mccollect do not exist on Windows
if(.Platform$OS.type=='unix') {
## test mc.backward.vs() function
  res = mc.backward.vs(data$X, data$Y, split.ratio=0.8, probit=FALSE,
                       true.idx=c(1:5), ntree=10, ndpost=100, nskip=100,
                       mc.cores=2)
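  ## inspect the returned components (documented under 'Value' above)
  print(res$best.model.names)                  # predictors picked by backward selection
  print(c(res$precision, res$recall, res$f1))  # returned because true.idx was given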
}
# }