Two algorithms to calculate variable importance are available: (a) permutation importance and (b) SHAP importance. Algorithm (a) measures the importance of a variable v as the drop in performance after randomly permuting the values of v; see Fisher et al. (2018), referenced below. Algorithm (b) measures the importance of a variable as its average absolute SHAP value.
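To make algorithm (a) concrete, here is a minimal base R sketch of the idea. It is not the package's implementation: the helper names `rmse` and `permutation_importance`, the choice of RMSE as metric, and the use of the `iris` data are assumptions made for illustration only.

```r
# Minimal sketch of permutation importance in base R (illustration only).
set.seed(1)  # shuffling is random; fix the seed for reproducibility

# Hypothetical helper: root mean squared error
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

fit <- lm(Sepal.Length ~ Petal.Length + Sepal.Width, data = iris)
baseline <- rmse(iris$Sepal.Length, predict(fit, iris))

# Importance of v = increase in RMSE after shuffling column v
permutation_importance <- function(v) {
  shuffled <- iris
  shuffled[[v]] <- sample(shuffled[[v]])
  rmse(shuffled$Sepal.Length, predict(fit, shuffled)) - baseline
}

importance <- sapply(c("Petal.Length", "Sepal.Width"), permutation_importance)
print(importance)
```

Because RMSE is a "lower is better" metric, a larger increase after shuffling indicates a more important variable.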
light_importance(x, ...)

# S3 method for default
light_importance(x, ...)

# S3 method for flashlight
light_importance(
  x,
  data = x$data,
  by = x$by,
  type = c("permutation", "shap"),
  v = NULL,
  n_max = Inf,
  seed = NULL,
  m_repetitions = 1,
  metric = x$metrics[1],
  lower_is_better = TRUE,
  use_linkinv = FALSE,
  metric_name = "metric",
  value_name = "value",
  error_name = "error",
  label_name = "label",
  variable_name = "variable",
  ...
)

# S3 method for multiflashlight
light_importance(x, ...)
x
An object of class flashlight or multiflashlight.
...
Further arguments passed to light_performance. Not used for type = "shap".
data
An optional data.frame. Not used for type = "shap".
by
An optional vector of column names used to additionally group the results.
type
Type of importance: "permutation" (default) or "shap". "shap" is only available if a "shap" object is contained in x.
v
Vector of variables to assess importance for. Defaults to all variables in data except "by" and "y".
n_max
Maximum number of rows to consider. Not used for type = "shap".
seed
An integer random seed used to select and shuffle rows. Not used for type = "shap".
m_repetitions
Number of permutations. Defaults to 1. A value above 1 provides more stable estimates of variable importance and allows the calculation of standard errors measuring the uncertainty from permuting. Not used for type = "shap".
metric
An optional named list of length one with a metric as element. Defaults to the first metric in the flashlight. The metric needs to be a function with at least four arguments: actual, predicted, case weights w, and the dots argument (...). Irrelevant for type = "shap".
lower_is_better
Logical flag indicating whether lower values of the metric are better. If set to FALSE, the increase in metric is multiplied by -1. Not used for type = "shap".
use_linkinv
Should the retransformation function be applied? Default is FALSE. Not used for type = "shap".
metric_name
Name of the resulting column containing the name of the metric. Defaults to "metric". Irrelevant for type = "shap".
value_name
Column name in resulting data containing the variable importance. Defaults to "value".
error_name
Column name in resulting data containing the standard error of permutation importance. Defaults to "error".
label_name
Column name in resulting data containing the label of the flashlight. Defaults to "label".
variable_name
Column name in resulting data containing the variable names. Defaults to "variable".
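The metric argument expects a function taking actual, predicted, case weights w and further arguments. As an illustration of that shape, the sketch below defines a weighted mean absolute error. The names `mae_metric` and `my_metric` are hypothetical and not part of the package.

```r
# Hypothetical custom metric with the four-argument shape described above:
# weighted mean absolute error.
mae_metric <- function(actual, predicted, w = NULL, ...) {
  if (is.null(w)) w <- rep(1, length(actual))  # no weights: treat all rows equally
  stats::weighted.mean(abs(actual - predicted), w = w)
}

# Wrapped as a named list of length one, the form described for `metric`
my_metric <- list(mae = mae_metric)

unweighted <- mae_metric(c(1, 2, 3), c(1, 2, 5))
weighted <- mae_metric(c(1, 2, 3), c(1, 2, 5), w = c(1, 1, 2))
print(c(unweighted = unweighted, weighted = weighted))
```

Given the argument description, such a list would be supplied as metric = my_metric. Since lower values of this metric are better, the default lower_is_better = TRUE would apply.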
An object of class light_importance, light (and a list) with the following elements.
data
A tibble with results. Can be used to build fully customized visualizations.
by
Same as input by.
type
Same as input type. For information only.
metric_name
Column name representing the name of the metric. For information only.
value_name
Same as input value_name.
error_name
Same as input error_name.
label_name
Same as input label_name.
variable_name
Same as input variable_name.
default: Default method not implemented yet.
flashlight: Variable importance for a flashlight.
multiflashlight: Variable importance for a multiflashlight.
For algorithm (a), the minimum required elements in the (multi-)flashlight are "y", "predict_function", "model", "data" and "metrics". For algorithm (b), the only required element is "shap". Call add_shap once to add such an object.
Note: The values of the permutation algorithm (a) are on the scale of the selected metric. For the SHAP algorithm (b), the values are on the scale of absolute values of the predictions.
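The scale remark for algorithm (b) can be illustrated without the package. The sketch below uses the centered term contributions of a linear model as a stand-in for SHAP values. This is an assumption made for illustration: for a linear model with independent features, SHAP values are commonly taken to be coefficient * (x - mean(x)), which is what predict(fit, type = "terms") returns. It does not show how add_shap computes its values.

```r
# Illustration only: centered linear-model contributions as a stand-in for SHAP values.
fit <- lm(Sepal.Length ~ Petal.Length + Sepal.Width, data = iris)

# One row per observation, one column per variable; each column has mean zero
contrib <- predict(fit, type = "terms")

# Contributions plus the constant reproduce the predictions,
# so the contributions live on the scale of the predictions
reconstructed <- rowSums(contrib) + attr(contrib, "constant")

# Importance of a variable = average absolute contribution
shap_importance <- colMeans(abs(contrib))
print(shap_importance)
```

Unlike permutation importance, these values do not depend on any performance metric, which is why the metric-related arguments are irrelevant for type = "shap".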
Fisher A., Rudin C., Dominici F. (2018). All Models are Wrong but many are Useful: Variable Importance for Black-Box, Proprietary, or Misspecified Prediction Models, using Model Class Reliance. arXiv.
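library(flashlight)

# Fit a simple linear model and wrap it in a flashlight
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
fl <- flashlight(model = fit, label = "full", data = iris, y = "Sepal.Length")

# Permutation importance (the default type) based on the flashlight's first metric
light_importance(fl)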
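The m_repetitions argument repeats the permutation step and reports a standard error. The base R sketch below shows one plausible way to do this. The formula sd / sqrt(m_repetitions) and the helper names `rmse` and `one_run` are assumptions; the package's exact computation is not stated here.

```r
# Sketch: repeated permutations with a standard error (illustration only).
set.seed(1)

# Hypothetical helper: root mean squared error
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
baseline <- rmse(iris$Sepal.Length, predict(fit, iris))

m_repetitions <- 5

# One permutation run: increase in RMSE after shuffling Petal.Length
one_run <- function() {
  shuffled <- iris
  shuffled$Petal.Length <- sample(shuffled$Petal.Length)
  rmse(shuffled$Sepal.Length, predict(fit, shuffled)) - baseline
}

runs <- replicate(m_repetitions, one_run())

value <- mean(runs)                      # averaged importance
error <- sd(runs) / sqrt(m_repetitions)  # assumed standard error across repetitions
print(c(value = value, error = error))
```

With a single repetition there is nothing to compute a spread from, which matches the description that standard errors require a value above 1.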