
vip (version 0.1.3)

vi_pdp: PDP-Based Variable Importance

Description

Compute PDP-based VI scores for the predictors in a model. See Details below.

Usage

vi_pdp(object, ...)

# S3 method for default
vi_pdp(object, feature_names, FUN = NULL, var_fun = NULL, ...)

Arguments

object

A fitted model object (e.g., a "randomForest" object).

...

Additional optional arguments to be passed on to pdp::partial.

feature_names

Character vector giving the names of the predictor variables (i.e., features) of interest.

FUN

Deprecated. Use var_fun instead.

var_fun

A list with two components, "cat" and "con", containing the functions used to quantify the variability of the feature effects (e.g., partial dependence values) for categorical and continuous features, respectively. If NULL, the standard deviation is used for continuous features and the range statistic (i.e., (max - min) / 4) is used for categorical features. Only used when method = "pdp" or method = "ice".
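A minimal sketch of a `var_fun` list, assuming only base R. The name `my_var_fun` is hypothetical; its two functions mirror the defaults described above, and either one could be replaced by any function that maps a numeric vector of feature effects to a single number. The final call is left commented out because it needs vip and a fitted model; `fit`, "x1", and "x2" are placeholders.

```r
# var_fun: "con" is applied to continuous features, "cat" to categorical ones.
# These two functions reproduce the defaults used when var_fun = NULL.
my_var_fun <- list(
  con = stats::sd,                       # standard deviation
  cat = function(x) diff(range(x)) / 4   # range statistic: (max - min) / 4
)

my_var_fun$cat(c(2, 10, 6))  # returns 2, i.e., (10 - 2) / 4

# Hypothetical call (requires vip; `fit`, "x1", and "x2` are placeholders):
# vi_pdp(fit, feature_names = c("x1", "x2"), var_fun = my_var_fun)
```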

Value

A tidy data frame (i.e., a "tibble" object) with two columns, Variable and Importance, containing the variable name and its associated importance score, respectively.

Details

This approach computes VI scores by quantifying the relative "flatness" of each feature's partial dependence plot (PDP): the flatter the PDP, the less the feature affects the model's predictions. It is model-agnostic and can be applied to any supervised learning algorithm. By default, flatness is measured by the standard deviation of the PDP's y-axis values (the partial dependence values) for numeric features and by their range divided by 4 for categorical features. This can be changed via the `var_fun` argument. See Greenwell et al. (2018) for details and additional examples.
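The base-R sketch below illustrates the flatness idea on a toy model. It does not call vip or pdp: the partial dependence values are computed by hand with hypothetical helpers `pd_values()` and `flatness()`, a grid of 20 points is assumed for continuous features, and an `lm()` fit on `mtcars` (with `cyl` recoded as a factor) stands in for an arbitrary fitted model.

```r
# Minimal sketch of PDP-based ("flatness") variable importance in base R.
# Assumptions: pd_values() and flatness() are hypothetical helpers (not part
# of vip or pdp); an lm() fit on mtcars stands in for any fitted model.

# Partial dependence: fix `feature` at each grid value for every row,
# predict, and average the predictions over the training data.
pd_values <- function(fit, data, feature, grid_size = 20) {
  x <- data[[feature]]
  grid <- if (is.numeric(x)) {
    seq(min(x), max(x), length.out = grid_size)  # assumed grid resolution
  } else {
    unique(x)  # every observed level of a categorical feature
  }
  vapply(seq_along(grid), function(i) {
    tmp <- data
    tmp[[feature]] <- grid[i]  # recycled to every row
    mean(predict(fit, newdata = tmp))
  }, numeric(1))
}

# Default flatness measures: standard deviation for continuous features,
# (max - min) / 4 for categorical features.
flatness <- function(pd, categorical) {
  if (categorical) diff(range(pd)) / 4 else sd(pd)
}

d <- transform(mtcars, cyl = factor(cyl))
fit <- lm(mpg ~ wt + hp + cyl, data = d)

features <- c("wt", "hp", "cyl")
importance <- vapply(features, function(f) {
  flatness(pd_values(fit, d, f), categorical = is.factor(d[[f]]))
}, numeric(1))

# Tidy result with the same two columns described under Value.
vi <- data.frame(Variable = features, Importance = unname(importance))
vi <- vi[order(vi$Importance, decreasing = TRUE), ]
print(vi)
```

Because this toy model is linear, each continuous feature's PDP is a straight line, so its score reduces to the absolute coefficient times the standard deviation of the grid; nonlinear models produce curved PDPs, where the flatness measure is more informative.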

References

Greenwell, B. M., Boehmke, B. C., and McCarthy, A. J. A Simple and Effective Model-Based Variable Importance Measure. arXiv preprint arXiv:1805.04755 (2018).