Learn R Programming

dbarts (version 0.9-33)

xbart: Crossvalidation For Bayesian Additive Regression Trees

Description

Fits the BART model against varying k, power, base, and ntree parameters using \(K\)-fold or repeated random subsampling crossvalidation, sharing burn-in between parameter settings. Results are given an array of evalulations of a loss functions on the held-out sets.

Usage

xbart(
    formula, data, subset, weights, offset, verbose = FALSE, n.samples = 200L,
    method = c("k-fold", "random subsample"), n.test = c(5, 0.2),
    n.reps = 40L, n.burn = c(200L, 150L, 50L),
    loss = c("rmse", "log", "mcr"), n.threads = dbarts::guessNumCores(), n.trees = 75L,
    k = NULL, power = 2, base = 0.95, drop = TRUE,
    resid.prior = chisq, control = dbarts::dbartsControl(), sigma = NA_real_,
    seed = NA_integer_)

Arguments

Value

An array of dimensions n.reps

\(\times\)

length(n.trees)

\(\times\)

length(k)

\(\times\)

length(power)

\(\times\)

length(base). If drop is TRUE, dimensions of length 1 are omitted. If all hyperparameters are of length 1, then the result will be a vector of length n.reps. When the result is an array, the dimnames of the result shall be set to the corresponding hyperparameters.

For method "k-fold", each element is an average across the \(K\) fits. For "random subsample", each element represents a single fit.

Details

Crossvalidates n.reps replications against the crossproduct of given hyperparameter vectors n.trees \(\times\) k \(\times\) power \(\times\) base. For each fit, either one fold is withheld as test data and n.test - 1 folds are used as training data or n * n.test observations are withheld as test data and n * (1 - n.test) used as training. A replication corresponds to fitting all \(K\) folds in "k-fold" crossvalidation or a single fit with "random subsample". The training data is used to fit a model and make predictions on the test data which are used together with the test data itself to evaluate the loss function.

loss functions are either the default of average negative log-loss for binary outcomes and root-mean-squared error for continuous outcomes, missclassification rates for binary outcomes, or a function with arguments y.test and y.test.hat. y.test.hat is of dimensions equal to length(y.test) \(\times\) n.samples. A third option is to pass a list of list(function, evaluationEnvironment), so as to provide default bindings. RMSE is a monotonic transformation of the average log-loss for continuous outcomes, so specifying log-loss in that case calculates RMSE instead.

See Also

bart, dbarts

Examples

Run this code
f <- function(x) {
    10 * sin(pi * x[,1] * x[,2]) + 20 * (x[,3] - 0.5)^2 +
        10 * x[,4] + 5 * x[,5]
}

set.seed(99)
sigma <- 1.0
n     <- 100

x  <- matrix(runif(n * 10), n, 10)
Ey <- f(x)
y  <- rnorm(n, Ey, sigma)

mad <- function(y.train, y.train.hat, weights) {
    # note, weights are ignored
    mean(abs(y.train - apply(y.train.hat, 1L, mean)))
}



## low iteration numbers to to run quickly
xval <- xbart(x, y, n.samples = 15L, n.reps = 4L, n.burn = c(10L, 3L, 1L),
              n.trees = c(5L, 7L),
              k = c(1, 2, 4),
              power = c(1.5, 2),
              base = c(0.75, 0.8, 0.95), n.threads = 1L,
              loss = mad)

Run the code above in your browser using DataLab