bartMachine (version 1.2.6)

var_selection_by_permute: Perform Variable Selection using Three Threshold-based Procedures

Description

Performs variable selection using the three thresholding methods introduced in Bleich et al. (2013).

Usage

var_selection_by_permute(bart_machine, 
num_reps_for_avg = 10, num_permute_samples = 100, 
num_trees_for_permute = 20, alpha = 0.05, 
plot = TRUE, num_var_plot = Inf, bottom_margin = 10)

Value

Invisibly, returns a list with the following components:

important_vars_local_names

Names of the variables chosen by the Local procedure.

important_vars_global_max_names

Names of the variables chosen by the Global Max procedure.

important_vars_global_se_names

Names of the variables chosen by the Global SE procedure.

important_vars_local_col_nums

Column numbers of the variables chosen by the Local procedure.

important_vars_global_max_col_nums

Column numbers of the variables chosen by the Global Max procedure.

important_vars_global_se_col_nums

Column numbers of the variables chosen by the Global SE procedure.

var_true_props_avg

The variable inclusion proportions for the actual data.

permute_mat

The permutation distribution generated by permuting the response vector.

Arguments

bart_machine

An object of class "bartMachine".

num_reps_for_avg

Number of replicates to average over when computing the BART model's variable inclusion proportions.

num_permute_samples

Number of permutations of the response to be made to generate the "null" permutation distribution.

num_trees_for_permute

Number of trees to use in the variable selection procedure. As with investigate_var_importance, a small number of trees should be used to force variables to compete for entry into the model. Note that this number is used to estimate both the "true" and "null" variable inclusion proportions.

alpha

Cut-off level for the thresholds.

plot

If TRUE, a plot showing which variables are selected by each of the procedures is generated.

num_var_plot

Number of variables (in order of decreasing variable inclusion proportion) to be plotted.

bottom_margin

A display parameter that adjusts the bottom margin of the graph if labels are clipped. The scale of this parameter is the same as that set with par(mar = c(...)) in R. Higher values allow more space if the covariate names are long. Note that making this parameter too large will prevent plotting, and the plot function in R will throw an error.

Author

Adam Kapelner and Justin Bleich

Details

See Bleich et al. (2013) for a complete description of the procedures outlined above as well as the corresponding vignette for a brief summary with examples.
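
As a rough illustration of how the three thresholds relate to the returned var_true_props_avg and permute_mat components, the base R sketch below recomputes them on simulated stand-in data. It reflects one reading of the procedures in Bleich et al. (2013) and is not the package's internal code: the simulated inputs, the assumed layout of permute_mat (one row per permutation, one column per predictor), the quantile shortcut used for the Global SE multiplier, and all variable names other than var_true_props_avg and permute_mat are assumptions made for illustration.

```r
# Sketch only: simulated stand-ins for two components of the returned list.
set.seed(1)
p      <- 6     # number of predictors (hypothetical)
n_perm <- 200   # number of permutation samples (hypothetical)
alpha  <- 0.05  # cut-off level

# Assumed layout: one row per permutation, one column per predictor.
# Each row is rescaled to sum to 1 so it resembles inclusion proportions.
permute_mat <- matrix(rgamma(n_perm * p, shape = 20), nrow = n_perm, ncol = p)
permute_mat <- permute_mat / rowSums(permute_mat)
colnames(permute_mat) <- paste0("X", 1:p)

# Made-up "true" inclusion proportions in which X1 and X2 dominate.
var_true_props_avg <- c(X1 = 0.45, X2 = 0.35, X3 = 0.05,
                        X4 = 0.05, X5 = 0.05, X6 = 0.05)

# Local: each predictor is compared with the (1 - alpha) quantile
# of its own permutation distribution.
local_cut <- apply(permute_mat, 2, quantile, probs = 1 - alpha)

# Global Max: one shared cut-off, the (1 - alpha) quantile of the
# largest inclusion proportion observed in each permutation.
global_max_cut <- unname(quantile(apply(permute_mat, 1, max), probs = 1 - alpha))

# Global SE: per-predictor cut-off m_k + c_star * s_k, where c_star is chosen
# so that about (1 - alpha) of the permutations fall below the cut-off for
# every predictor at once (here via a quantile of the row-wise max z-score).
m <- colMeans(permute_mat)
s <- apply(permute_mat, 2, sd)
z <- sweep(sweep(permute_mat, 2, m, "-"), 2, s, "/")
c_star <- unname(quantile(apply(z, 1, max), probs = 1 - alpha))
global_se_cut <- m + c_star * s

selected <- list(
  local      = names(which(var_true_props_avg > local_cut)),
  global_max = names(which(var_true_props_avg > global_max_cut)),
  global_se  = names(which(var_true_props_avg > global_se_cut))
)
print(selected)
```

On real output, the same comparisons could be applied to var_sel$var_true_props_avg and var_sel$permute_mat, provided the layout assumption holds. According to Bleich et al., the Local procedure is the least stringent of the three and Global Max the most stringent, with Global SE in between.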

References

J Bleich, A Kapelner, ST Jensen, and EI George. Variable Selection Inference for Bayesian Additive Regression Trees. ArXiv e-prints, 2013.

Adam Kapelner, Justin Bleich (2016). bartMachine: Machine Learning with Bayesian Additive Regression Trees. Journal of Statistical Software, 70(4), 1-40. doi:10.18637/jss.v070.i04

See Also

var_selection_by_permute, investigate_var_importance

Examples

if (FALSE) {
library(bartMachine)

# generate Friedman data
set.seed(11)
n = 300
p = 20 # only the first 5 predictors matter; the other 15 are useless
X = data.frame(matrix(runif(n * p), ncol = p))
y = 10 * sin(pi * X[, 1] * X[, 2]) + 20 * (X[, 3] - 0.5)^2 + 10 * X[, 4] + 5 * X[, 5] + rnorm(n)

# build BART regression model (not actually used in variable selection)
bart_machine = bartMachine(X, y)

# variable selection
var_sel = var_selection_by_permute(bart_machine)
print(var_sel$important_vars_local_names)
print(var_sel$important_vars_global_max_names)
print(var_sel$important_vars_global_se_names)
}