Creates a sampler object for a given problem which fits a Bayesian Additive Regression Trees model. The object stores its state internally and is mutable.
dbarts(
    formula, data, test, subset, weights, offset, offset.test = offset,
    verbose = FALSE, n.samples = 800L,
    tree.prior = cgm, node.prior = normal, resid.prior = chisq,
    proposal.probs = c(
        birth_death = 0.5, swap = 0.1, change = 0.4, birth = 0.5),
    control = dbarts::dbartsControl(), sigma = NA_real_)
A reference object of class dbartsSampler.
An object of class formula that follows a model description syntax analogous to that of lm. For backwards compatibility, can also be the bart matrix x.train.
An optional data frame, list, or environment containing predictors to be used with the model. For backwards compatibility, can also be the bart vector y.train.
An optional matrix or data frame with the same number of predictors as data, or as formula in backwards compatibility mode. If column names are present, a matching algorithm is used.
An optional vector specifying a subset of observations to be used in the fitting process.
An optional vector of weights to be used in the fitting process. When present, BART fits a model with observations \(y \mid x \sim N(f(x), \sigma^2 / w)\), where \(f(x)\) is the unknown function.
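As a rough illustration of what the weighted model implies, the following base R sketch simulates data from \(y \mid x \sim N(f(x), \sigma^2 / w)\). The function f, the value of sigma, and the weights are all made up for the example; nothing here calls dbarts itself.

```r
set.seed(1)
n <- 200
x <- runif(n)
f <- function(x) sin(2 * pi * x)          # hypothetical "unknown" function
sigma <- 0.5                              # hypothetical residual standard deviation
w <- sample(c(1, 4), n, replace = TRUE)   # hypothetical observation weights

# y | x ~ N(f(x), sigma^2 / w), so the standard deviation is sigma / sqrt(w)
y <- rnorm(n, mean = f(x), sd = sigma / sqrt(w))

# Residual spread by weight: observations with larger weights are less noisy
resid_sd <- tapply(y - f(x), w, sd)
```

Under these assumptions, an observation with weight 4 has half the residual standard deviation of an observation with weight 1, so it carries more information about \(f(x)\).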
An optional vector specifying an offset from 0 for the relationship between the underlying function, \(f(x)\), and the response \(y\). It is only useful for binary responses, in which case the fitted model assumes \(P(Y = 1 \mid X = x) = \Phi(f(x) + \mathrm{offset})\), where \(\Phi\) is the standard normal cumulative distribution function.
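The effect of an offset on the binary response model can be sketched with base R alone, since \(\Phi\) is pnorm. The values of \(f(x)\) and the offset below are hypothetical and chosen only to show the direction of the shift.

```r
f_x <- c(-1, 0, 1)   # hypothetical values of the underlying function f(x)
offset <- 0.5        # hypothetical offset

# P(Y = 1 | X = x) = Phi(f(x) + offset), with Phi the standard normal CDF
p <- pnorm(f_x + offset)

# The same probabilities without an offset, for comparison
p_no_offset <- pnorm(f_x)
```

A positive offset raises every success probability, and a negative one lowers them, without changing \(f(x)\) itself.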
The equivalent of offset for test observations. Will attempt to use offset when applicable.
A logical determining whether additional output is printed to the console. See dbartsControl.
A positive integer setting the default number of posterior samples to be returned for each run of the sampler. Can be overridden at run-time. See dbartsControl.
An expression of the form cgm or cgm(power, base, split.probs) setting the tree prior used in fitting.
An expression of the form normal or normal(k) that sets the prior used on the averages within nodes.
An expression of the form chisq or chisq(df, quant) that sets the prior used on the residual/error variance.
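The three prior arguments can be passed either as bare names or as calls with explicit hyperparameters. The sketch below shows the second form on simulated data. The data, the column names, and the particular hyperparameter values are assumptions made for illustration, and the example requires the dbarts package to be installed.

```r
library(dbarts)

set.seed(3)
n <- 100
data <- data.frame(x1 = runif(n), x2 = runif(n))   # made-up predictors
data$y <- sin(pi * data$x1) + data$x2 + rnorm(n, sd = 0.2)

# Priors given as calls rather than bare names; values chosen for illustration
sampler <- dbarts(
    y ~ x1 + x2, data,
    tree.prior  = cgm(power = 2, base = 0.95),
    node.prior  = normal(k = 2),
    resid.prior = chisq(df = 3, quant = 0.9))
```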
Named numeric vector or NULL, optionally specifying the proposal rules and their probabilities. Elements should be "birth_death", "change", and "swap" to control tree change proposals, and "birth" to give the relative frequency of birth/death in the "birth_death" step.
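One way to read these elements, using the default values from the usage above: the three step probabilities choose the kind of proposal, and "birth" then splits the "birth_death" step. The sketch below is plain arithmetic in base R and assumes that interpretation; it does not inspect the sampler's internals.

```r
proposal.probs <- c(birth_death = 0.5, swap = 0.1, change = 0.4, birth = 0.5)

# Probabilities of the three kinds of tree change proposal
step_probs <- proposal.probs[c("birth_death", "swap", "change")]
total <- sum(step_probs)

# Assumed reading: "birth" is the share of birth_death steps that are births
p_birth <- proposal.probs[["birth_death"]] * proposal.probs[["birth"]]
p_death <- proposal.probs[["birth_death"]] * (1 - proposal.probs[["birth"]])
```

With the defaults, the step probabilities sum to 1, and births and deaths are each proposed a quarter of the time under this reading.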
An object inheriting from dbartsControl, created by the dbartsControl function.
A positive numeric estimate of the residual standard deviation. If NA, a linear model fit to all of the predictors is used to obtain one.
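The kind of estimate described here can be approximated by hand in base R. The sketch below assumes that the estimate is the residual standard error of an ordinary least squares fit on all predictors; the text above does not state the exact computation, and the data are simulated for illustration.

```r
set.seed(2)
n <- 100
data <- data.frame(x1 = runif(n), x2 = runif(n))              # made-up predictors
data$y <- 1 + 2 * data$x1 - data$x2 + rnorm(n, sd = 0.3)      # true sd is 0.3

# Linear model using all of the predictors
fit <- lm(y ~ ., data = data)

# Residual standard error, usable as a starting estimate for sigma
sigma_hat <- summary(fit)$sigma
```

A value obtained this way could be passed as sigma to supply the estimate explicitly.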
“Discrete sampler” refers to the fact that dbarts is implemented using ReferenceClasses, so that there exists a mutable object constructed in C++ that is largely obscured from R. The dbarts function is the primary way of creating a dbartsSampler, for which a variety of methods exist.
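A minimal sketch of creating a sampler and drawing from it follows. The simulated data and the small sample counts are assumptions chosen to keep the run short, and the example requires the dbarts package. It uses the sampler's run method with its numBurnIn and numSamples arguments, which also shows n.samples being overridden at run-time.

```r
library(dbarts)

set.seed(99)
n <- 100
x <- matrix(runif(n * 3), n, 3)
y <- rnorm(n, 10 * sin(pi * x[, 1] * x[, 2]) + 5 * x[, 3], 1)   # made-up response
data <- data.frame(y, x)   # columns are named y, X1, X2, X3

# Create the mutable sampler; 50 is the default number of samples per run
sampler <- dbarts(y ~ ., data, n.samples = 50L)

# Run the sampler, requesting 20 samples instead of the default of 50
samples <- sampler$run(numBurnIn = 50L, numSamples = 20L)

# Posterior draws of the residual standard deviation
sigma_draws <- samples$sigma
```

Because the sampler is a reference object, calling run again continues from the state that the previous call left behind rather than starting over.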