bartMachine(X = NULL, y = NULL, Xy = NULL,
num_trees = 50,
num_burn_in = 250,
num_iterations_after_burn_in = 1000,
alpha = 0.95, beta = 2, k = 2, q = 0.9, nu = 3,
prob_rule_class = 0.5,
mh_prob_steps = c(2.5, 2.5, 4)/9,
debug_log = FALSE,
run_in_sample = TRUE,
s_sq_y = "mse",
sig_sq_est = NULL,
cov_prior_vec = NULL,
use_missing_data = FALSE,
covariates_to_permute = NULL,
num_rand_samps_in_library = 10000,
use_missing_data_dummies_as_covars = FALSE,
replace_missing_data_with_x_j_bar = FALSE,
impute_missingness_with_rf_impute = FALSE,
impute_missingness_with_x_j_bar_for_lm = TRUE,
mem_cache_for_speed = TRUE,
serialize = FALSE,
seed = NULL,
verbose = TRUE)
build_bart_machine(X = NULL, y = NULL, Xy = NULL,
num_trees = 50,
num_burn_in = 250,
num_iterations_after_burn_in = 1000,
alpha = 0.95, beta = 2, k = 2, q = 0.9, nu = 3,
prob_rule_class = 0.5,
mh_prob_steps = c(2.5, 2.5, 4)/9,
debug_log = FALSE,
run_in_sample = TRUE,
s_sq_y = "mse",
sig_sq_est = NULL,
cov_prior_vec = NULL,
use_missing_data = FALSE,
covariates_to_permute = NULL,
num_rand_samps_in_library = 10000,
use_missing_data_dummies_as_covars = FALSE,
replace_missing_data_with_x_j_bar = FALSE,
impute_missingness_with_rf_impute = FALSE,
impute_missingness_with_x_j_bar_for_lm = TRUE,
mem_cache_for_speed = TRUE,
serialize = FALSE,
seed = NULL,
verbose = TRUE)
Arguments

y
If y is numeric or integer, a BART model for regression is built. If y is a factor with two levels, a BART model for classification is built.
k
For regression, k determines the prior probability that $E(Y|X)$ is contained in the interval $(y_{min}, y_{max})$, based on a normal distribution. For example, when $k = 2$, the prior probability is 95%. For classification, k determines the prior probability that $E(Y|X)$ is between $(-3, 3)$. Note that a larger value of k results in more shrinkage and a more conservative fit.
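The 95% figure for $k = 2$ follows from the normal prior: the interval endpoints sit k standard deviations from the prior mean, so the probability contained is $2\Phi(k) - 1$. A quick check in R:

```r
##prior probability that E(Y|X) lies within k standard
##deviations of the prior mean under a normal prior
k = 2
2 * pnorm(k) - 1  ##0.9544997, i.e., about 95%
```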
q
Quantile of the prior on the error variance at which the data-based estimate is placed. The larger the value of q, the more aggressive the fit, as you are placing more prior weight on values lower than the data-based estimate. Not used for classification.
prob_rule_class
Threshold of classification: an observation whose predicted probability exceeds prob_rule_class is assigned the ``positive'' outcome. Note that the first level of the response is treated as the ``negative'' outcome and the second is treated as the ``positive'' outcome.
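As an illustrative sketch (not part of the package API), the rule amounts to thresholding a vector of predicted probabilities, with the two response levels ordered negative-first as described above:

```r
##hypothetical helper: apply the classification rule by hand.
##p_hat holds predicted probabilities of the "positive" outcome;
##lvls holds the two response levels (first = negative, second = positive)
classify = function(p_hat, lvls, prob_rule_class = 0.5) {
  factor(ifelse(p_hat > prob_rule_class, lvls[2], lvls[1]), levels = lvls)
}
classify(c(0.2, 0.7), c("neg", "pos"))  ##first obs "neg", second "pos"
```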
cov_prior_vec
Vector of relative weights specifying how often each predictor should be proposed as a candidate for a splitting rule. To see the design matrix after dummification, use dummify_data. See Bleich et al. (2013) for more details on when this feature is most appropriate.
covariates_to_permute
Private argument for cov_importance_test. Not needed by the user.
replace_missing_data_with_x_j_bar
If TRUE, missing entries in X are imputed with the column's average value or modal category.

impute_missingness_with_rf_impute
If TRUE, missing entries are imputed using the rfImpute function from the randomForest library.
serialize
Setting this option to TRUE allows serialization of bartMachine objects, which provides persistence between R sessions if the object is saved and reloaded. Note that serialized objects can take up a large amount of memory; thus, the default is FALSE.
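A minimal sketch of the save-and-reload workflow this option enables (the file name is illustrative):

```r
##build with serialize = TRUE so the underlying Java object
##survives saving to disk and reloading in a fresh R session
bart_machine = bartMachine(X, y, serialize = TRUE)
save(bart_machine, file = "bart_machine.RData")  ##illustrative path

##...later, in a new R session:
load("bart_machine.RData")
predict(bart_machine, X)  ##the model is still usable
```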
seed
Optional: sets the random seed in both R and Java. The default is NULL, which does not set the seed in either R or Java.
Value

Returns an object of class ``bartMachine''. Selected components of the list:

training_data_features_with_missing_features
If use_missing_data_dummies_as_covars = TRUE, this also includes dummies for any predictors that contain at least one missing entry (named ``M_<feature>'').

y_hat_train
The model's in-sample fitted values. Only returned if run_in_sample = TRUE.

residuals
The in-sample residuals, y - y_hat_train. Only returned if run_in_sample = TRUE.

Further in-sample fit metrics (and, for classification, the in-sample confusion matrix) are likewise only returned if run_in_sample = TRUE. Additionally, the parameters passed to the function bartMachine are also components of the list.
References

A Kapelner and J Bleich. Prediction with Missing Data via Bayesian Additive Regression Trees. Canadian Journal of Statistics, 43(2): 224-239, 2015.

J Bleich, A Kapelner, ST Jensen, and EI George. Variable Selection Inference for Bayesian Additive Regression Trees. ArXiv e-prints, 2013.
See Also

bartMachineCV
##regression example
##generate Friedman data
set.seed(11)
n = 200
p = 5
X = data.frame(matrix(runif(n * p), ncol = p))
y = 10 * sin(pi * X[, 1] * X[, 2]) + 20 * (X[, 3] - 0.5)^2 + 10 * X[, 4] + 5 * X[, 5] + rnorm(n)
##build BART regression model
bart_machine = bartMachine(X, y)
summary(bart_machine)
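Predictions for new data can then be obtained with the package's predict method; here the training design X is reused purely for illustration:

```r
##predict (reusing X in place of genuinely new data)
y_hat = predict(bart_machine, X)
```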
## Not run:
# ##Build another BART regression model
# bart_machine = bartMachine(X, y, num_trees = 200, num_burn_in = 500,
# num_iterations_after_burn_in = 1000)
#
# ##Classification example
#
# #get data and only use 2 factors
# data(iris)
# iris2 = iris[51:150,]
# iris2$Species = factor(iris2$Species)
#
# #build BART classification model
# bart_machine = build_bart_machine(iris2[ ,1:4], iris2$Species)
#
# ##get estimated probabilities
# phat = bart_machine$p_hat_train
# ##look at in-sample confusion matrix
# bart_machine$confusion_matrix
# ## End(Not run)