Performs variable selection using the three thresholding methods introduced in Bleich et al. (2013).

```
var_selection_by_permute(bart_machine,
num_reps_for_avg = 10, num_permute_samples = 100,
num_trees_for_permute = 20, alpha = 0.05,
plot = TRUE, num_var_plot = Inf, bottom_margin = 10)
```

bart_machine

An object of class "bartMachine".

num_reps_for_avg

Number of replicates to average over for the BART model's variable inclusion proportions.

num_permute_samples

Number of permutations of the response to be made to generate the "null" permutation distribution.

num_trees_for_permute

Number of trees to use in the variable selection procedure. As with `investigate_var_importance`, a small number of trees should be used to force variables to compete for entry into the model. Note that this number is used to estimate both the "true" and "null" variable inclusion proportions.

alpha

Cut-off level for the thresholds.

plot

If TRUE, a plot showing which variables are selected by each of the procedures is generated.

num_var_plot

Number of variables (in order of decreasing variable inclusion proportion) to be plotted.

bottom_margin

A display parameter that adjusts the bottom margin of the graph if labels are clipped. The scale of this parameter is the same as set with `par(mar = c(....))` in R. Higher values allow for more space if the crossed covariate names are long. Note that making this parameter too large will prevent plotting and the plot function in R will throw an error.
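As a rough illustration of the scale involved, the base-R sketch below sets the bottom margin through `par(mar = ...)`, whose four entries are ordered bottom, left, top, right and are measured in lines of text. It does not call bartMachine; the other three margin values, the `las = 2` setting, and the covariate names are assumptions chosen only for the demonstration.

```r
# Null PDF device so the sketch runs without opening a window or writing a file.
pdf(NULL)

bottom_margin <- 10  # same scale as the first entry of par(mar = ...)

# mar = c(bottom, left, top, right); the last three values are arbitrary here.
old_par <- par(mar = c(bottom_margin, 4, 4, 2), las = 2)
mar_used <- par("mar")

# Hypothetical inclusion proportions with long labels (illustrative only).
props <- c(a_rather_long_covariate_name = 0.40,
           another_long_covariate_name  = 0.35,
           short                        = 0.25)
barplot(props, ylab = "inclusion proportion")

par(old_par)
invisible(dev.off())
```

If the margin is made much larger than the device can hold, base graphics stops with a "figure margins too large" error, which is the failure mode the note above warns about.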

Invisibly returns a list with the following components:

important_vars_local_names

Names of the variables chosen by the Local procedure.

important_vars_global_max_names

Names of the variables chosen by the Global Max procedure.

important_vars_global_se_names

Names of the variables chosen by the Global SE procedure.

important_vars_local_col_nums

Column numbers of the variables chosen by the Local procedure.

important_vars_global_max_col_nums

Column numbers of the variables chosen by the Global Max procedure.

important_vars_global_se_col_nums

Column numbers of the variables chosen by the Global SE procedure.

var_true_props_avg

The variable inclusion proportions for the actual data.

permute_mat

The permutation distribution generated by permuting the response vector.
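To make the shape of the last two components concrete, the following base-R sketch builds a permutation distribution by shuffling the response and recomputing a per-variable score. It does not call bartMachine: the `importance` helper is a hypothetical stand-in (normalized absolute correlations) for BART's variable inclusion proportions, and the names `true_props` and `null_mat` are illustrative only.

```r
set.seed(1)
n <- 200
p <- 5
X <- matrix(runif(n * p), ncol = p)
y <- 3 * X[, 1] + rnorm(n)  # only the first column carries signal

# Hypothetical stand-in for variable inclusion proportions:
# absolute correlations with the response, scaled to sum to one.
importance <- function(X, y) {
  s <- abs(cor(X, y))[, 1]
  s / sum(s)
}

num_permute_samples <- 100

# Scores on the actual data (one value per variable).
true_props <- importance(X, y)

# Scores after permuting the response: one row per permutation,
# one column per variable.
null_mat <- t(replicate(num_permute_samples, importance(X, sample(y))))

dim(null_mat)
round(true_props, 3)
round(colMeans(null_mat), 3)
```

Permuting `y` breaks any link between the response and the predictors, so each row of `null_mat` shows what the scores look like when no variable is truly relevant.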

See Bleich et al. (2013) for a complete description of the procedures outlined above as well as the corresponding vignette for a brief summary with examples.
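The three thresholds can be sketched in base R as follows. This is a reading of the procedures as described in Bleich et al. (2013), not the package's internal code, and it runs on simulated inputs: `null_mat` (rows are permutations, columns are variables) and `true_props` are made-up values, and all object names are assumptions.

```r
set.seed(42)
p <- 6
num_permute_samples <- 100
alpha <- 0.05

# Simulated null inclusion proportions; each row sums to one.
null_mat <- matrix(rgamma(num_permute_samples * p, shape = 20), ncol = p)
null_mat <- null_mat / rowSums(null_mat)
colnames(null_mat) <- paste0("X", seq_len(p))

# Made-up inclusion proportions for the unpermuted data.
true_props <- c(0.40, 0.30, 0.10, 0.08, 0.07, 0.05)
names(true_props) <- colnames(null_mat)

# Local: each variable is compared with the 1 - alpha quantile
# of its own null distribution.
local_cut <- apply(null_mat, 2, quantile, probs = 1 - alpha, names = FALSE)

# Global Max: one cut-off for all variables, the 1 - alpha quantile
# of the largest null proportion within each permutation.
global_max_cut <- quantile(apply(null_mat, 1, max), probs = 1 - alpha,
                           names = FALSE)

# Global SE: per-variable cut-off m_k + C * s_k, where C is the smallest
# multiplier that covers all variables at once in a 1 - alpha share
# of the permutations.
m <- colMeans(null_mat)
s <- apply(null_mat, 2, sd)
C_per_perm <- apply(null_mat, 1, function(row) max((row - m) / s))
C_star <- quantile(C_per_perm, probs = 1 - alpha, names = FALSE)
global_se_cut <- m + C_star * s

selected <- list(
  local      = names(true_props)[true_props > local_cut],
  global_max = names(true_props)[true_props > global_max_cut],
  global_se  = names(true_props)[true_props > global_se_cut]
)
selected
```

Because the Global Max cut-off is never below any of the Local cut-offs, every variable it selects is also selected by the Local procedure; Global SE sits between the two in spirit by adjusting for each variable's own null mean and spread.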

J Bleich, A Kapelner, ST Jensen, and EI George. Variable Selection Inference for Bayesian Additive Regression Trees. ArXiv e-prints, 2013.

Adam Kapelner, Justin Bleich (2016). bartMachine: Machine Learning with Bayesian Additive Regression Trees. Journal of Statistical Software, 70(4), 1-40. doi:10.18637/jss.v070.i04

```
## Not run:
library(bartMachine)

# generate Friedman data
set.seed(11)
n = 300
p = 20  # 15 useless predictors
X = data.frame(matrix(runif(n * p), ncol = p))
y = 10 * sin(pi * X[, 1] * X[, 2]) + 20 * (X[, 3] - 0.5)^2 + 10 * X[, 4] + 5 * X[, 5] + rnorm(n)

# build BART regression model (not actually used in variable selection)
bart_machine = bartMachine(X, y)

# variable selection
var_sel = var_selection_by_permute(bart_machine)
print(var_sel$important_vars_local_names)
print(var_sel$important_vars_global_max_names)
## End(Not run)
```