vi_permute: Permutation-Based Variable Importance

Description

Compute permutation-based variable importance scores for the predictors in a model. (This function is meant for internal use only.)

Usage

vi_permute(object, ...)
# S3 method for default
vi_permute(object, train, target, metric = "auto",
  smaller_is_better = NULL, reference_class = NULL, pred_fun = NULL,
  verbose = FALSE, progress = "none", parallel = FALSE,
  paropts = NULL, ...)

Arguments

object

A fitted model object (e.g., a "randomForest" object).

...

Additional optional arguments. (Currently ignored.)

train

A matrix-like R object (e.g., a data frame or matrix) containing the training data.

target

Either a character string giving the name (or position) of the target column in train or, if train only contains feature columns, a vector containing the target values used to train object.

metric

Either a function or character string specifying the performance metric to use in computing model performance (e.g., RMSE for regression or accuracy for binary classification). If metric is a function, then it requires two arguments, actual and predicted, and should return a single, numeric value. Ideally, this should be the same metric that was used to train object.
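
As a minimal sketch of a function-valued metric, the following defines a root mean squared error helper with the required actual and predicted arguments. The name rmse is hypothetical and chosen only for illustration; any function with this signature that returns a single number should work in the same way.

```r
# Hypothetical user-defined metric: root mean squared error (RMSE).
# The two arguments are named `actual` and `predicted`, as required.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Quick sanity check on toy values
rmse(actual = c(1, 2, 3), predicted = c(1, 2, 5))
```

Because a smaller RMSE indicates a better fit, a metric like this would be paired with smaller_is_better = TRUE.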

smaller_is_better

Logical indicating whether or not a smaller value of metric is better. Default is NULL. Must be supplied if metric is a user-supplied function.

reference_class

Character string specifying which response category represents the "reference" class (i.e., the class to which the predicted class probabilities correspond). Only needed for binary classification problems.

pred_fun

Optional prediction function that requires two arguments, object and newdata. Default is NULL. Must be supplied whenever metric is a custom function.
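
A prediction wrapper of this form might look like the sketch below. The model (a logistic regression on the built-in mtcars data) and the name pfun are assumptions made purely for illustration; the only requirement taken from the text above is the two-argument signature, object and newdata.

```r
# Toy binary classifier fit with base R (illustration only)
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Hypothetical wrapper: takes `object` and `newdata` and returns a plain
# numeric vector of predicted probabilities
pfun <- function(object, newdata) {
  as.numeric(predict(object, newdata = newdata, type = "response"))
}

head(pfun(fit, newdata = mtcars))
```

Wrapping predict() this way makes the prediction type explicit, which matters when the default output of predict() is not what the chosen metric expects.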

verbose

Logical indicating whether or not to print information during the construction of variable importance scores. Default is FALSE.

progress

Character string giving the name of the progress bar to use. See create_progress_bar for details. Default is "none".

parallel

Logical indicating whether or not to run vi_permute() in parallel (using a backend provided by the foreach package). Default is FALSE. If TRUE, a foreach-compatible parallel backend must be registered beforehand.

paropts

List containing additional options to be passed onto foreach when parallel = TRUE.
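
One possible way to make a backend available before setting parallel = TRUE is sketched below. It assumes the doParallel package is installed, which this page does not require; any other foreach-compatible backend could be substituted. The worker count is an arbitrary choice for illustration.

```r
n_workers <- 2  # arbitrary number of worker processes for this sketch

# Assumes the doParallel package is installed; skipped otherwise
if (requireNamespace("doParallel", quietly = TRUE)) {
  cl <- parallel::makeCluster(n_workers)  # start the worker processes
  doParallel::registerDoParallel(cl)      # register the backend with foreach

  # A call such as vi_permute(..., parallel = TRUE) would go here

  parallel::stopCluster(cl)               # release the workers when done
  foreach::registerDoSEQ()                # fall back to sequential execution
}
```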

Value

A tidy data frame (i.e., a "tibble" object) with two columns: Variable and Importance. For "glm"-like objects, an additional column, called Sign, is also included, which gives the sign (i.e., POS/NEG) of the original coefficient.

Details

Coming soon!
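
As a rough illustration of the general permutation approach (not the code used inside vi_permute()), the sketch below measures how much a performance metric degrades when each feature is shuffled in turn. The toy linear model, the mtcars data, and all object names are assumptions made for this example.

```r
# Minimal base-R sketch of permutation importance (illustration only)
set.seed(101)  # for reproducibility

# Toy model and the features to score
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
features <- c("wt", "hp", "qsec")

# Metric with the (actual, predicted) signature; smaller is better
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Baseline performance on the unpermuted data
baseline <- rmse(mtcars$mpg, predict(fit, newdata = mtcars))

# Permute one feature at a time and record the increase in RMSE
importance <- sapply(features, function(x) {
  permuted <- mtcars
  permuted[[x]] <- sample(permuted[[x]])  # break the feature-target link
  rmse(mtcars$mpg, predict(fit, newdata = permuted)) - baseline
})

sort(importance, decreasing = TRUE)
```

A larger increase in error after shuffling suggests that the model relies more heavily on that feature. Results depend on the random permutation, so setting a seed helps keep the scores reproducible.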

Examples


# NOT RUN {
# Load required packages
library(ggplot2)    # for ggtitle() function
library(gridExtra)  # for grid.arrange() function
library(mlbench)    # for ML benchmark data sets
library(nnet)       # for fitting neural networks
library(vip)        # for vip() function

# Simulate training data
set.seed(101)  # for reproducibility
trn <- as.data.frame(mlbench.friedman1(500))  # ?mlbench.friedman1

# Inspect data
tibble::as_tibble(trn)

# Fit PPR and NN models (hyperparameters were chosen using the caret package
# with 5 repeats of 5-fold cross-validation)
pp <- ppr(y ~ ., data = trn, nterms = 11)
set.seed(0803) # for reproducibility
nn <- nnet(y ~ ., data = trn, size = 7, decay = 0.1, linout = TRUE,
           maxit = 500)

# Plot VI scores
set.seed(2021)  # for reproducibility
p1 <- vip(pp, method = "permute", target = "y", metric = "rsquared",
          pred_fun = predict) + ggtitle("PPR")
p2 <- vip(nn, method = "permute", target = "y", metric = "rsquared",
          pred_fun = predict) + ggtitle("NN")
grid.arrange(p1, p2, ncol = 2)

# Mean absolute error
mae <- function(actual, predicted) {
  mean(abs(actual - predicted))
}

# Permutation-based VIP with user-defined MAE metric
set.seed(1101)  # for reproducibility
vip(pp, method = "permute", target = "y", metric = mae,
    smaller_is_better = TRUE,
    pred_fun = function(object, newdata) predict(object, newdata)  # wrapper
) + ggtitle("PPR")
# }
