Fits the BART model against varying <code>k</code>, <code>power</code>, <code>base</code>, and <code>n.trees</code> parameters using \(K\)-fold or repeated random subsampling crossvalidation, sharing burn-in between parameter settings. Results are given as an array of evaluations of a loss function on the held-out sets.
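The two resampling schemes can be illustrated with a short Python sketch. This is not dbarts code: the function names are hypothetical, and dealing shuffled indices round-robin into folds is an assumed fold-assignment rule chosen only to show that, under \(K\)-fold, every observation is held out exactly once per replication.

```python
import random


def kfold_splits(n, k, rng):
    """Yield (train, test) index lists for one K-fold replication.

    Hypothetical helper: indices are shuffled and dealt round-robin into
    k folds, so each observation is held out exactly once per replication.
    """
    if not 2 <= k <= n:
        raise ValueError("the number of folds must lie in [2, n]")
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        yield train, sorted(fold)


def random_subsample_split(n, n_test, rng):
    """Return (train, test) for one random-subsample replication (hypothetical)."""
    test = sorted(rng.sample(range(n), n_test))
    held_out = set(test)
    train = [i for i in range(n) if i not in held_out]
    return train, test


rng = random.Random(0)
for train, test in kfold_splits(10, 5, rng):
    print(len(train), len(test))  # 8 2, once per fold
```

Under this scheme one \(K\)-fold replication costs \(K\) fits, while one random-subsample replication costs a single fit.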

xbart

Fits Bayesian additive regression trees (BART; Chipman, George, and McCulloch (2010) <doi:10.1214/09-AOAS285>) while allowing the updating of predictors or response so that BART can be incorporated as a conditional model in a Gibbs/Metropolis-Hastings sampler. Also serves as a drop-in replacement for package 'BayesTree'.

Vincent Dorie Developer

dbarts

Discrete Bayesian Additive Regression Trees Sampler

Vincent Dorie

Hugh Chipman

Robert McCulloch

Armon Dadgar

R Core Team 

Guido U Draheim

Maarten Bosmans

Christophe Tournayre

Michael Petch

Rafael de Lucena Valle

Steven G. Johnson

Matteo Frigo

John Zaitseff

Todd Veldhuizen

Luc Maisonobe

Scott Pakin

Daniel Richard G.

xbart function

Crossvalidation For Bayesian Additive Regression Trees — xbart

<dl>

 <dt>formula</dt>
<dd>An object of class <code><a href='https://rdrr.io/r/stats/formula.html'>formula</a></code> following a model description syntax analogous to that of <code><a href='https://rdrr.io/r/stats/lm.html'>lm</a></code>. For backwards compatibility, can also be the <code>bart</code> matrix <code>x.train</code>. See <code>dbarts</code>.</dd>

 <dt>data</dt>
<dd>An optional data frame, list, or environment containing predictors to be used with the model. For backwards compatibility, can also be the <code>bart</code> vector <code>y.train</code>.</dd>

 <dt>subset</dt>
<dd>An optional vector specifying a subset of observations to be used in the fitting process.</dd>

 <dt>weights</dt>
<dd>An optional vector of weights to be used in the fitting process. When present, BART fits a model with observations \(y \mid x \sim N(f(x), \sigma^2 / w)\), where \(f(x)\) is the unknown function.</dd>

 <dt>offset</dt>
<dd>An optional vector specifying an offset from 0 for the relationship between the underlying function, \(f(x)\), and the response \(y\). Only useful for binary responses, in which case the model fit assumes \(P(Y = 1 \mid X = x) = \Phi(f(x) + \mathrm{offset})\), where \(\Phi\) is the standard normal cumulative distribution function.</dd>

 <dt>verbose</dt>
<dd>A logical determining if additional output is printed to the console.</dd>

 <dt>n.samples</dt>
<dd>A positive integer setting the number of posterior samples drawn for each fit of the training data and used by the loss function.</dd>

 <dt>method</dt>
<dd>Character string, either <code>"k-fold"</code> or <code>"random subsample"</code>.</dd>

 <dt>n.test</dt>
<dd>For each fit, the test sample size or proportion. For method <code>"k-fold"</code>, it is the number of folds and must lie in \([2, n]\). For method <code>"random subsample"</code>, it can be a real number in \((0, 1)\) or a positive integer in \((1, n)\). When given as a proportion, the number of test observations is the proportion times the sample size, rounded to the nearest integer.</dd>

 <dt>n.reps</dt>
<dd>A positive integer setting the number of crossvalidation steps that will be taken. For <code>"k-fold"</code>, each replication corresponds to fitting each of the \(K\) folds in turn, while for <code>"random subsample"</code> a replication is a single fit.</dd>

 <dt>n.burn</dt>
<dd>Between one and three positive integers, specifying 1) the initial burn-in, 2) the burn-in when moving from one parameter setting to another, and 3) the burn-in between each random subsample replication. The third value is also the burn-in when moving between folds in <code>"k-fold"</code> crossvalidation.</dd>

 <dt>loss</dt>
<dd>Either one of the preset loss functions, given as a character string (<code>mcr</code>: misclassification rate for binary responses; <code>rmse</code>: root-mean-squared error for continuous responses; <code>log</code>: negative log-loss for binary responses, a role <code>rmse</code> serves for continuous responses), a function, or a list pairing a function with its evaluation environment. Functions should have prototypes of the form <code>function(y.test, y.test.hat, weights)</code>, where <code>y.test</code> is the held-out test subsample, <code>y.test.hat</code> is a matrix of dimensions <code>length(y.test)</code> \(\times\) <code>n.samples</code>, and <code>weights</code> is an optional vector of user-supplied weights. See examples.</dd>

 <dt>n.threads</dt>
<dd>Across different sets of parameters (<code>k</code> \(\times\) <code>power</code> \(\times\) <code>base</code> \(\times\) <code>n.trees</code>) and <code>n.reps</code>, results are independent. For <code>n.threads &gt; 1</code>, these evaluations are divided into chunks of approximately equal size and executed in parallel. The default uses <code>guessNumCores</code>, which should work across the most common operating system/hardware pairs. A value of <code>NA</code> is interpreted as 1.</dd>

 <dt>n.trees</dt>
<dd>A vector of positive integers setting the BART hyperparameter for the number of trees in the sum-of-trees formulation. See <code>bart</code>.</dd>

 <dt>k</dt>
<dd>A vector of positive real numbers setting the BART hyperparameter for the node-mean prior standard deviation. If <code>NULL</code>, the default of <code>bart2</code> will be used: 2 for continuous responses and a chi hyperprior for binary responses. Crossvalidation over the hyperprior is not possible at this time.</dd>

 <dt>power</dt>
<dd>A vector of real numbers greater than one, setting the BART hyperparameter for the tree prior's growth probability, given by \(\mathrm{base} / (1 + \mathrm{depth})^{\mathrm{power}}\).</dd>

 <dt>base</dt>
<dd>A vector of real numbers in \((0, 1)\), setting the BART hyperparameter for the tree prior's growth probability.</dd>

 <dt>drop</dt>
<dd>Logical, determining if dimensions with a single value are dropped from the result.</dd>

 <dt>resid.prior</dt>
<dd>An expression of the form <code>chisq</code> or <code>chisq(df, quant)</code> that sets the prior used on the residual/error variance.</dd>

 <dt>control</dt>
<dd>An object inheriting from <code>dbartsControl</code>, created by the <code>dbartsControl</code> function.</dd>

 <dt>sigma</dt>
<dd>A positive numeric estimate of the residual standard deviation. If <code>NA</code>, a linear model is used with all of the predictors to obtain one.</dd>

 <dt>seed</dt>
<dd>Optional integer specifying the desired pRNG <a href='https://rdrr.io/r/base/Random.html'>seed</a>. It should not be needed when running single-threaded, as <code><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></code> will suffice; it can be used to obtain reproducible results when multi-threaded. See the Reproducibility section of <code>bart</code>.</dd>

</dl>
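The <code>weights</code> argument changes the likelihood to \(y \mid x \sim N(f(x), \sigma^2 / w)\). The Python sketch below (a hypothetical helper, not part of dbarts) evaluates that log-likelihood for fixed \(f(x)\) and \(\sigma\), showing that a larger weight shrinks an observation's variance.

```python
import math


def weighted_normal_loglik(y, f, sigma, w):
    """Log-likelihood of y under y_i | x_i ~ N(f_i, sigma^2 / w_i).

    A weight w_i > 1 shrinks that observation's variance, so it pulls
    harder on the fit; w_i = 1 for all i recovers the unweighted model.
    """
    total = 0.0
    for yi, fi, wi in zip(y, f, w):
        var = sigma ** 2 / wi
        total += -0.5 * math.log(2 * math.pi * var) - (yi - fi) ** 2 / (2 * var)
    return total


print(weighted_normal_loglik([1.0, 2.0], [1.5, 1.5], 1.0, [1.0, 4.0]))
```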
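For binary responses, the <code>offset</code> argument enters the probit link as \(P(Y = 1 \mid X = x) = \Phi(f(x) + \mathrm{offset})\). A minimal Python sketch of that formula, with hypothetical function names, computes \(\Phi\) from the error function:

```python
import math


def std_normal_cdf(z):
    """Phi(z), the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_y_equals_one(f_x, offset=0.0):
    """P(Y = 1 | X = x) = Phi(f(x) + offset) for a binary response."""
    return std_normal_cdf(f_x + offset)


print(prob_y_equals_one(0.0, offset=1.0))  # about 0.841
```

With \(f(x) = 0\), the probability is \(\Phi(\mathrm{offset})\), so the offset sets a baseline on the probit scale that \(f(x)\) then shifts.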
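The rules for <code>n.test</code> can be written as a small validation function. This Python sketch is hypothetical: the error messages are invented, and because the text does not say how ties are rounded, halves are rounded up here as an assumption.

```python
import math


def resolve_n_test(n, n_test, method):
    """Return the fold count (k-fold) or test-set size (random subsample)."""
    if method == "k-fold":
        k = int(n_test)
        if k != n_test or not 2 <= k <= n:
            raise ValueError("k-fold: n.test must be an integer in [2, n]")
        return k
    if method == "random subsample":
        if 0 < n_test < 1:
            # Proportion: scale by n, round to nearest (ties up: an assumption).
            return int(math.floor(n_test * n + 0.5))
        if n_test == int(n_test) and 1 < n_test < n:
            return int(n_test)
        raise ValueError("random subsample: n.test must be in (0, 1) "
                         "or an integer in (1, n)")
    raise ValueError("method must be 'k-fold' or 'random subsample'")


print(resolve_n_test(100, 0.2, "random subsample"))
```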
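The three <code>n.burn</code> values can be read as a schedule for a chain that warm-starts each fit from the previous one. The Python sketch below assumes a hypothetical sweep order (all fits for one parameter setting, then the next setting) and requires all three values; it does not model how dbarts fills in missing values when fewer than three are given.

```python
def burn_in_schedule(n_settings, n_fits_per_setting, n_burn):
    """Burn-in iterations before each fit in one sequential sweep.

    Hypothetical order: one chain visits the parameter settings in turn
    and, within a setting, runs its fits (replications or folds) back to
    back, so each fit starts from the previous fit's state.
    """
    initial, between_settings, between_fits = n_burn
    schedule = []
    for s in range(n_settings):
        for r in range(n_fits_per_setting):
            if s == 0 and r == 0:
                schedule.append(initial)           # cold start
            elif r == 0:
                schedule.append(between_settings)  # new k/power/base/n.trees
            else:
                schedule.append(between_fits)      # next replication or fold
    return schedule


sched = burn_in_schedule(2, 3, (200, 150, 50))
print(sched, sum(sched))
```

With the illustrative values (200, 150, 50), the six fits need 550 burn-in iterations in total under this schedule, versus 1200 if every fit started cold with 200.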
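A loss function receives the held-out responses, a <code>length(y.test)</code> \(\times\) <code>n.samples</code> matrix of posterior predictions, and optional weights. The Python sketch mirrors that prototype with nested lists. Whether the built-in <code>rmse</code> averages the posterior draws before computing the error is an assumption here, and <code>mad_loss</code> is a hypothetical user-defined loss.

```python
import math


def posterior_mean(row):
    """Average one observation's posterior draws."""
    return sum(row) / len(row)


def rmse_loss(y_test, y_test_hat, weights=None):
    """(Weighted) RMSE of the per-observation posterior mean.

    y_test_hat has one row per held-out observation and one column per
    posterior sample. Averaging draws first is an assumption.
    """
    if weights is None:
        weights = [1.0] * len(y_test)
    sq = [w * (y - posterior_mean(row)) ** 2
          for y, row, w in zip(y_test, y_test_hat, weights)]
    return math.sqrt(sum(sq) / sum(weights))


def mad_loss(y_test, y_test_hat, weights=None):
    """Hypothetical custom loss: mean absolute error; weights are ignored."""
    errs = [abs(y - posterior_mean(row)) for y, row in zip(y_test, y_test_hat)]
    return sum(errs) / len(errs)


y_hat = [[1.0, 1.0], [3.0, 3.0]]
print(rmse_loss([0.0, 0.0], y_hat), mad_loss([0.0, 0.0], y_hat))
```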
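Because results for each (parameter setting, replication) pair are independent, they can be split into chunks for <code>n.threads</code> workers. The Python sketch below enumerates the grid and divides it into contiguous chunks whose sizes differ by at most one; the contiguous split is an assumption, and <code>None</code> stands in for R's <code>NA</code>.

```python
from itertools import product


def make_chunks(n_trees, k, power, base, n_reps, n_threads):
    """Split all (setting, replication) tasks into near-equal chunks."""
    if n_threads is None:  # stand-in for NA, interpreted as 1
        n_threads = 1
    tasks = [(setting, rep)
             for setting in product(k, power, base, n_trees)
             for rep in range(n_reps)]
    n_threads = max(1, min(n_threads, len(tasks)))
    size, extra = divmod(len(tasks), n_threads)
    chunks, start = [], 0
    for t in range(n_threads):
        end = start + size + (1 if t < extra else 0)
        chunks.append(tasks[start:end])
        start = end
    return chunks


chunks = make_chunks([50, 200], [2.0], [2.0], [0.95], n_reps=5, n_threads=3)
print([len(c) for c in chunks])
```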
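The <code>power</code> and <code>base</code> hyperparameters define the prior probability that a node at a given depth splits, \(\mathrm{base} / (1 + \mathrm{depth})^{\mathrm{power}}\). A one-line Python sketch, taking the root as depth 0 and using the common BART defaults \(\mathrm{base} = 0.95\), \(\mathrm{power} = 2\) as illustrative values, shows how quickly deeper splits become unlikely:

```python
def split_probability(depth, base=0.95, power=2.0):
    """Prior probability that a node at `depth` (root = 0) is split."""
    return base / (1.0 + depth) ** power


for d in range(4):
    print(d, split_probability(d))
```

Raising <code>power</code> or lowering <code>base</code> favors shallower trees.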
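When <code>sigma</code> is <code>NA</code>, an estimate comes from a linear model on all predictors. The NumPy sketch below assumes that estimate is the usual residual standard error of an ordinary least-squares fit with an intercept, \(\hat\sigma = \sqrt{\mathrm{RSS} / (n - p)}\); the helper name is hypothetical, and any fallback dbarts uses when \(p \ge n\) is not modeled.

```python
import numpy as np


def lm_sigma_estimate(X, y):
    """Residual standard error from an OLS fit with intercept (hypothetical)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    n, p = design.shape
    if n <= p:
        raise ValueError("need more observations than coefficients")
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(np.sqrt(resid @ resid / (n - p)))


print(lm_sigma_estimate([[0.0], [1.0], [2.0], [3.0]], [0.0, 1.0, 1.0, 2.0]))
```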

Arguments

Vincent Dorie: <a href='mailto:vdorie@gmail.com'>vdorie@gmail.com</a>

Author
