mc.pbart: Probit BART for binary responses with parallel computation

Description

BART is a Bayesian approach to nonparametric function estimation and inference using a sum of trees. For a binary response $y$ and a $p$-dimensional vector of predictors $x = (x_1, ..., x_p)'$, probit BART models the relationship between $y$ and $x$ as $$P(Y=1|x) = \Phi[f(x)],$$ where $\Phi$ is the CDF of the standard normal distribution and $f$ is a sum of Bayesian regression trees. The function mc.pbart() is inherited from the CRAN R package 'BART' and is a variant of the function pbart() with parallel computation.
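To make the probit link concrete, the following minimal base-R sketch maps values of $f(x)$ to success probabilities and simulates binary responses from them. The function `f` below is a hypothetical stand-in (a simple step function), not a fitted sum of trees.

```r
# Hypothetical stand-in for f: a step function of the first predictor.
# In probit BART, f would instead be a sum of regression trees.
f <- function(x) ifelse(x[, 1] > 0.5, 1, -1)

set.seed(1)
x <- matrix(runif(20), nrow = 10, ncol = 2)  # 10 observations, p = 2

# P(Y = 1 | x) = Phi[f(x)], with Phi the standard normal CDF (pnorm)
prob <- pnorm(f(x))

# Draw binary responses from the implied Bernoulli distribution
y <- rbinom(nrow(x), size = 1, prob = prob)
```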

Usage

mc.pbart(
  x.train,
  y.train,
  x.test = matrix(0, 0L, 0L),
  sparse = FALSE,
  theta = 0,
  omega = 1,
  a = 0.5,
  b = 1,
  augment = FALSE,
  rho = NULL,
  xinfo = matrix(0, 0, 0),
  numcut = 100L,
  usequants = FALSE,
  cont = FALSE,
  rm.const = TRUE,
  k = 2,
  power = 2,
  base = 0.95,
  split.prob = "polynomial",
  binaryOffset = NULL,
  ntree = 50L,
  ndpost = 1000L,
  nskip = 100L,
  keepevery = 1L,
  printevery = 100L,
  keeptrainfits = TRUE,
  transposed = FALSE,
  verbose = FALSE,
  mc.cores = 2L,
  nice = 19L,
  seed = 99L
)

Arguments

x.train

A matrix or a data frame of predictor values (for training) with each row corresponding to an observation and each column corresponding to a predictor. If a predictor in a data frame is a factor with $q$ levels, it is replaced with $q$ dummy variables.

y.train

A vector of binary (0/1) response values for training.

x.test

A matrix or a data frame of predictor values for testing, which has the same structure as x.train.

sparse

A Boolean argument indicating whether to replace the discrete uniform distribution for selecting a split variable with a categorical distribution whose event probabilities follow a Dirichlet distribution (see Linero (2018) for details).

theta

Set theta parameter; zero means random.

omega

Set omega parameter; zero means random.

a

A sparse parameter of the $Beta(a, b)$ hyper-prior, where $0.5 \le a \le 1$; a lower value induces more sparsity.

b

A sparse parameter of the $Beta(a, b)$ hyper-prior; typically, $b=1$.

augment

A Boolean argument indicating whether data augmentation is performed in the variable selection procedure of Linero (2018).

rho

A sparse parameter; typically $\rho = p$ where $p$ is the number of predictors.

xinfo

A matrix of cut-points with each row corresponding to a predictor and each column corresponding to a cut-point. xinfo=matrix(0.0,0,0) indicates the cut-points are specified by BART.

numcut

The number of possible cut-points. If a single number is given, it is used for all predictors; otherwise, a vector with length equal to ncol(x.train) is required, where the $i$-th element gives the number of cut-points for the $i$-th predictor in x.train. If usequants=FALSE, numcut equally spaced cut-points are used to cover the range of values in the corresponding column of x.train. If usequants=TRUE, then min(numcut, the number of unique values in the corresponding column of x.train - 1) cut-point values are used.

usequants

A Boolean argument indicating how the cut-points in xinfo are generated. If usequants=TRUE, uniform quantiles are used for the cut-points; otherwise, the cut-points are equally spaced over the range of each predictor.
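The two cut-point schemes described for numcut and usequants can be sketched in base R as follows. The helper `make_cuts` is hypothetical and only illustrates the idea; the exact placement of cut-points inside the package may differ.

```r
# Hypothetical helper; the package's internal cut-point placement may differ.
make_cuts <- function(x, numcut = 100L, usequants = FALSE) {
  if (usequants) {
    # Quantile-based: at most (number of unique values - 1) cut-points
    n_cuts <- min(numcut, length(unique(x)) - 1L)
    probs <- seq_len(n_cuts) / (n_cuts + 1)
    unname(quantile(x, probs = probs))
  } else {
    # Equally spaced cut-points strictly inside the range of x
    rng <- range(x)
    seq(rng[1], rng[2], length.out = numcut + 2L)[-c(1L, numcut + 2L)]
  }
}

x <- c(0.1, 0.4, 0.4, 0.7, 1.0)
make_cuts(x, numcut = 3L)                      # equally spaced
make_cuts(x, numcut = 100L, usequants = TRUE)  # capped by unique values
```

With only four unique values in `x`, the quantile-based call yields three cut-points even though numcut is 100.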

cont

A Boolean argument indicating whether to assume all predictors are continuous.

rm.const

A Boolean argument indicating whether to remove constant predictors.

k

The number of prior standard deviations that $f(x)$ is away from $+/-3$. The bigger k is, the more conservative the fitting will be.

power

The power parameter of the polynomial splitting probability for the tree prior. Only used if split.prob="polynomial".

base

The base parameter of the polynomial splitting probability for the tree prior if split.prob="polynomial"; if split.prob="exponential", the probability of splitting a node at depth $d$ is base$^d$.

split.prob

A string indicating which splitting probability is used for the tree prior. If split.prob="polynomial", the splitting probability in Chipman et al. (2010) is used; if split.prob="exponential", the splitting probability in Rockova and Saha (2019) is used.

binaryOffset

The binary offset term in the probit BART model, i.e., $P(Y=1|x) = \Phi[f(x)+binaryOffset]$.

ntree

The number of trees in the ensemble.

ndpost

The number of posterior samples returned.

nskip

The number of initial posterior samples discarded as burn-in.

keepevery

Every keepevery-th posterior sample is kept and returned to the user.
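As a rough illustration of how nskip, keepevery and ndpost interact, the sketch below lists which MCMC iterations of a single chain would be kept. It assumes one chain and ignores how the draws are divided among cores under parallel computation; the values chosen are arbitrary.

```r
nskip <- 100L; ndpost <- 1000L; keepevery <- 5L

# Iterations whose draws are kept: after burn-in, every keepevery-th draw
kept <- nskip + seq_len(ndpost) * keepevery

length(kept)   # number of draws returned
max(kept)      # total iterations a single chain would need to run
```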

printevery

As the MCMC runs, a message is printed every printevery iterations.

keeptrainfits

A Boolean argument indicating whether to keep yhat.train or not.

transposed

A Boolean argument indicating whether the matrices x.train and x.test are transposed.

verbose

A Boolean argument indicating whether any messages are printed out.

mc.cores

The number of cores to employ in parallel.

nice

Set the job niceness. The default niceness is $19$ and niceness goes from $0$ (highest) to $19$ (lowest).

seed

Seed required for reproducible MCMC.

Value

The function mc.pbart() returns an object of type pbart, which is essentially a list consisting of the following components.

yhat.train

A matrix with ndpost rows and nrow(x.train) columns with each row corresponding to a draw $f*$ from the posterior of $f$ and each column corresponding to a training data point. The $(i,j)$-th element of the matrix is $f*(x)+binaryOffset$ for the $i$-th kept draw of $f$ and the $j$-th training data point. Burn-in posterior samples are dropped.

prob.train

A matrix with the same structure as yhat.train and the $(i,j)$-th element of prob.train is $\Phi[f*(x)+binaryOffset]$ for the $i$-th kept draw of $f$ and the $j$-th training data point.

prob.train.mean

A vector which is colMeans(prob.train).

yhat.test

A matrix with ndpost rows and nrow(x.test) columns with each row corresponding to a draw $f*$ from the posterior of $f$ and each column corresponding to a test data point. The $(i,j)$-th element of the matrix is $f*(x)+binaryOffset$ for the $i$-th kept draw of $f$ and the $j$-th test data point. Burn-in posterior samples are dropped.

prob.test

A matrix with the same structure as yhat.test and the $(i,j)$-th element of prob.test is $\Phi[f*(x)+binaryOffset]$ for the $i$-th kept draw of $f$ and the $j$-th test data point.

prob.test.mean

A vector which is colMeans(prob.test).
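The relationship between yhat.test, prob.test and prob.test.mean can be sketched in base R. The matrix below is a simulated stand-in for yhat.test, not real BART output, and the credible intervals are an additional summary that is not part of the returned object.

```r
set.seed(1)
ndpost <- 200L; n_test <- 5L

# Simulated stand-in for res$yhat.test (ndpost x n_test), not real BART output
yhat.test <- matrix(rnorm(ndpost * n_test), nrow = ndpost, ncol = n_test)

prob.test <- pnorm(yhat.test)           # Phi applied elementwise
prob.test.mean <- colMeans(prob.test)   # posterior mean per test point

# 95% pointwise credible intervals for P(Y = 1 | x), one column per test point
ci <- apply(prob.test, 2, quantile, probs = c(0.025, 0.975))
```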

varcount

A matrix with ndpost rows and ncol(x.train) columns with each row corresponding to a draw of the ensemble and each column corresponding to a predictor. The $(i,j)$-th element is the number of times that the $j$-th predictor is used as a split variable in the $i$-th posterior sample.

varprob

A matrix with ndpost rows and ncol(x.train) columns with each row corresponding to a draw of the ensemble and each column corresponding to a predictor. The $(i,j)$-th element is the split probability of the $j$-th predictor in the $i$-th posterior sample. Only useful when DART is fit, i.e., sparse=TRUE.

treedraws

A list containing the posterior samples of the ensembles (tree structures, split variables and split values); it can be used for prediction.

proc.time

The process time of running the function mc.pbart().

vip

A vector of variable inclusion proportions (VIP) proposed in Chipman et al. (2010).
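The VIP of Chipman et al. (2010) is commonly computed as the per-draw proportion of splitting rules that use each predictor, averaged over posterior draws. The sketch below applies that definition to a made-up varcount matrix; whether the package computes vip in exactly this way (for example, how draws with no splits are handled) is an assumption.

```r
# Made-up varcount matrix: 4 posterior draws (rows) x 3 predictors (columns)
varcount <- matrix(c(5, 3, 2,
                     4, 4, 2,
                     6, 2, 2,
                     5, 5, 0),
                   nrow = 4, byrow = TRUE)

# Per-draw proportion of splitting rules that use each predictor
prop <- varcount / rowSums(varcount)

# Average the proportions over draws
vip <- colMeans(prop)
```

By construction the entries of `vip` sum to one, so they can be read as relative usage shares across predictors.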

within.type.vip

A vector of within-type VIPs proposed in Luo and Daniels (2021).

pvip

A vector of marginal posterior variable inclusion probabilities (PVIP) proposed in Linero (2018); only useful when DART is fit, i.e., sparse=TRUE.

varprob.mean

A vector of posterior split probabilities (PSP) proposed in Linero (2018); only useful when DART is fit, i.e., sparse=TRUE.

mr.vecs

A list of ncol(x.train) sub-lists, each corresponding to a predictor. Each sub-list contains ndpost vectors, and each vector contains the (birth) Metropolis ratios for splits using the predictor as the split variable in that posterior sample.

mr.mean

A matrix with ndpost rows and ncol(x.train) columns with each row corresponding to a draw of the ensemble and each column corresponding to a predictor. The $(i,j)$-th element is the average Metropolis acceptance ratio per splitting rule using the $j$-th predictor in the $i$-th posterior sample.

mi

A vector of Metropolis importance (MI) proposed in Luo and Daniels (2021).

rm.const

A vector of indicators for the predictors (after dummification) used in BART; a negative indicator means that the corresponding predictor is removed.

binaryOffset

The binary offset term used in BART.

ndpost

The number of posterior samples returned.

Details

This function is inherited from BART::mc.pbart() and is a variant of the function pbart() with parallel computation. While the original features of BART::pbart() are preserved, two modifications are made. The first modification is to provide two types of split probability for BART. One split probability is proposed in Chipman et al. (2010) and defined as $$p(d) = \gamma * (1+d)^{-\beta},$$ where $d$ is the depth of the node, $\gamma \in (0,1)$ and $\beta \in (0,\infty)$; in the arguments above, $\gamma$ corresponds to base and $\beta$ to power. The other split probability is proposed by Rockova and Saha (2019) and defined as $$p(d) = \gamma^d,$$ where $\gamma \in (1/n, 1/2)$ and again corresponds to base. BART with the second split probability has been proved to achieve the optimal posterior contraction. The second modification is to provide five types of variable importance measures (vip, within.type.vip, pvip, varprob.mean and mi) in the returned object, to accommodate mixed-type predictors.
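The two split probabilities can be compared numerically with a short base-R sketch. The helper `split_prob` is hypothetical and simply evaluates the two formulas above, taking base as $\gamma$ and power as $\beta$ as described in the Arguments section; the parameter values are chosen only for illustration.

```r
# Hypothetical helper: splitting probability of a node at depth d
split_prob <- function(d, base, power = 2,
                       type = c("polynomial", "exponential")) {
  type <- match.arg(type)
  if (type == "polynomial") base * (1 + d)^(-power) else base^d
}

d <- 0:4
split_prob(d, base = 0.95, power = 2)             # Chipman et al. (2010)
split_prob(d, base = 0.45, type = "exponential")  # Rockova and Saha (2019)
```

Note that at depth 0 the exponential form gives a splitting probability of 1 regardless of base, whereas the polynomial form gives base itself.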

References

Chipman, H. A., George, E. I. and McCulloch, R. E. (2010). "BART: Bayesian additive regression trees." Ann. Appl. Stat. 4 266--298.

Linero, A. R. (2018). "Bayesian regression trees for high-dimensional prediction and variable selection." J. Amer. Statist. Assoc. 113 626--636.

Luo, C. and Daniels, M. J. (2021). "Variable Selection Using Bayesian Additive Regression Trees." arXiv preprint arXiv:2112.13998.

Rockova, V. and Saha, E. (2019). "On theory for BART." In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 2839--2848). PMLR.

Sparapani, R., Spanbauer, C. and McCulloch, R. (2021). "Nonparametric machine learning and efficient computation with bayesian additive regression trees: the BART R package." J. Stat. Softw. 97 1--66.

Examples


# NOT RUN {
 
## mixone() and mc.pbart() are provided by the BartMixVs package
library(BartMixVs)
## simulate data (Scenario B.M.1. in Luo and Daniels (2021))
set.seed(123)
data = mixone(100, 10, 1, TRUE)
## parallel::mcparallel/mccollect do not exist on Windows
if(.Platform$OS.type=='unix') {
  ## test mc.pbart() function with a small ensemble and a short chain
  res = mc.pbart(data$X, data$Y, ntree=10, nskip=100, ndpost=100, mc.cores=2)
}
# }
