Compute model-specific variable importance scores for the predictors in a model.
vi_model(object, ...)

# S3 method for default
vi_model(object, ...)
# S3 method for C5.0
vi_model(object, type = c("usage", "splits"), ...)
# S3 method for train
vi_model(object, ...)
# S3 method for cubist
vi_model(object, ...)
# S3 method for earth
vi_model(object, type = c("nsubsets", "rss", "gcv"), ...)
# S3 method for gbm
vi_model(object, type = c("relative.influence", "permutation"), ...)
# S3 method for glmnet
vi_model(object, lambda = NULL, ...)
# S3 method for cv.glmnet
vi_model(object, lambda = NULL, ...)
# S3 method for H2OBinomialModel
vi_model(object, ...)
# S3 method for H2OMultinomialModel
vi_model(object, ...)
# S3 method for H2ORegressionModel
vi_model(object, ...)
# S3 method for WrappedModel
vi_model(object, ...)
# S3 method for Learner
vi_model(object, ...)
# S3 method for nn
vi_model(object, type = c("olden", "garson"), ...)
# S3 method for nnet
vi_model(object, type = c("olden", "garson"), ...)
# S3 method for model_fit
vi_model(object, ...)
# S3 method for RandomForest
vi_model(object, type = c("accuracy", "auc"), ...)
# S3 method for constparty
vi_model(object, ...)
# S3 method for cforest
vi_model(object, ...)
# S3 method for mvr
vi_model(object, ...)
# S3 method for randomForest
vi_model(object, ...)
# S3 method for ranger
vi_model(object, ...)
# S3 method for rpart
vi_model(object, ...)
# S3 method for mlp
vi_model(object, type = c("olden", "garson"), ...)
# S3 method for ml_model_decision_tree_regression
vi_model(object, ...)
# S3 method for ml_model_decision_tree_classification
vi_model(object, ...)
# S3 method for ml_model_gbt_regression
vi_model(object, ...)
# S3 method for ml_model_gbt_classification
vi_model(object, ...)
# S3 method for ml_model_generalized_linear_regression
vi_model(object, ...)
# S3 method for ml_model_linear_regression
vi_model(object, ...)
# S3 method for ml_model_random_forest_regression
vi_model(object, ...)
# S3 method for ml_model_random_forest_classification
vi_model(object, ...)
# S3 method for lm
vi_model(object, type = c("stat", "raw"), ...)
# S3 method for xgb.Booster
vi_model(object, type = c("gain", "cover", "frequency"), ...)
A tidy data frame (i.e., a "tibble" object) with two columns: Variable and Importance. For "lm"/"glm"-like objects, an additional column, called Sign, is also included, giving the sign (i.e., POS/NEG) of the original coefficient.
object: A fitted model object (e.g., a "randomForest" object).

...: Additional optional arguments to be passed on to other methods.

type: Character string specifying the type of variable importance to return (only used for some models). See the details below for which methods this argument applies to.

lambda: Numeric value for the penalty parameter of a glmnet model (this is equivalent to the s argument in coef.glmnet). See the section on glmnet in the details below.
Computes model-specific variable importance scores depending on the class of object:
C5.0

Variable importance is measured by determining the percentage of training set samples that fall into all the terminal nodes after the split. For example, the predictor in the first split automatically has an importance measurement of 100 percent since all samples are affected by this split. Other predictors may be used frequently in splits, but if the terminal nodes cover only a handful of training set samples, the importance scores may be close to zero. The same strategy is applied to rule-based models and boosted versions of the model. This percentage-based measure corresponds to the default, type = "usage"; the underlying function can also return the number of times each predictor was involved in a split via type = "splits". See C5imp for details.
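For illustration, a minimal sketch, assuming the C50 package is installed (the number of boosting trials is arbitrary):

library(vip)
fit <- C50::C5.0(Species ~ ., data = iris, trials = 10)
vi_model(fit)                   # default: type = "usage" (percentage of training samples covered)
vi_model(fit, type = "splits")  # number of times each predictor appears in a split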
cubist

The Cubist output contains variable usage statistics. It gives the percentage of times each variable was used in a condition and/or a linear model. Note that this output will probably be inconsistent with the rules shown in the output from summary.cubist. At each split of the tree, Cubist saves a linear model (after feature selection) that is allowed to have terms for each variable used in the current split or any split above it. Quinlan (1992) discusses a smoothing algorithm where each model prediction is a linear combination of the parent and child model along the tree. As such, the final prediction is a function of all the linear models from the initial node to the terminal node. The percentages shown in the Cubist output reflect all the models involved in prediction (as opposed to the terminal models shown in the output). The variable importance used here is a linear combination of the usage in the rule conditions and the model. See summary.cubist and varImp.cubist for details.
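A minimal sketch, assuming the Cubist package is installed (the number of committees is arbitrary):

library(vip)
fit <- Cubist::cubist(x = mtcars[, -1], y = mtcars$mpg, committees = 3)
vi_model(fit)  # combines usage in the rule conditions and in the linear models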
glmnet

Similar to (generalized) linear models, the absolute values of the coefficients are returned for a specific model. It is important that the features (and hence, the estimated coefficients) be standardized prior to fitting the model. You can specify which coefficients to return by passing the specific value of the penalty parameter via the lambda argument (this is equivalent to the s argument in coef.glmnet). By default, lambda = NULL and the coefficients corresponding to the final penalty value in the sequence are returned; in other words, you should ALWAYS SPECIFY lambda! For "cv.glmnet" objects, the largest value of lambda such that the error is within one standard error of the minimum is used by default. For "multnet" objects, the coefficients corresponding to the first class are used; that is, the first component of coef.glmnet.
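A minimal sketch, assuming the glmnet package is installed; the lambda value passed to vi_model is an arbitrary choice for illustration:

library(vip)
x <- model.matrix(mpg ~ . - 1, data = mtcars)  # predictors as a numeric matrix
y <- mtcars$mpg
fit <- glmnet::glmnet(x, y)                 # standardize = TRUE is the glmnet default
vi_model(fit, lambda = 0.1)                 # always specify lambda explicitly
cvfit <- glmnet::cv.glmnet(x, y)
vi_model(cvfit, lambda = cvfit$lambda.min)  # e.g., coefficients at the CV-optimal penalty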
cforest

Variable importance is measured in a way similar to that computed by importance. Besides the standard version, a conditional version is available that adjusts for correlations between predictor variables. If conditional = TRUE, the importance of each variable is computed by permuting within a grid defined by the predictors that are associated (with 1 - p-value greater than threshold) to the variable of interest. The resulting variable importance score is conditional in the sense of beta coefficients in regression models, but represents the effect of a variable in both main effects and interactions. See Strobl et al. (2008) for details. Note, however, that all random forest results are subject to random variation. Thus, before interpreting the importance ranking, check whether the same ranking is achieved with a different random seed, or otherwise increase the number of trees (ntree in ctree_control). Note that in the presence of missing values in the predictor variables, the procedure described in Hapfelmeier et al. (2012) is performed. See varimp for details.
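A minimal sketch using partykit::cforest; the conditional = TRUE call assumes that arguments supplied via ... are forwarded to varimp:

library(vip)
set.seed(101)  # results are subject to random variation
fit <- partykit::cforest(mpg ~ ., data = mtcars, ntree = 50)
vi_model(fit)                      # standard permutation importance
vi_model(fit, conditional = TRUE)  # conditional version (assumed pass-through to varimp)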
earth

The earth package uses three criteria for estimating the variable importance in a MARS model (see evimp for details); a short sketch follows this list.

The nsubsets criterion (type = "nsubsets") counts the number of model subsets that include each feature. Variables that are included in more subsets are considered more important. This is the criterion used by summary.earth to print variable importance. By "subsets" we mean the subsets of terms generated by earth()'s backward pass. There is one subset for each model size (from one to the size of the selected model) and the subset is the best set of terms for that model size. (These subsets are specified in the $prune.terms component of earth()'s return value.) Only subsets that are smaller than or equal in size to the final model are used for estimating variable importance. This is the default method used by vip.

The rss criterion (type = "rss") first calculates the decrease in the RSS for each subset relative to the previous subset during earth()'s backward pass. (For multiple response models, RSSs are calculated over all responses.) Then for each variable it sums these decreases over all subsets that include the variable. Finally, for ease of interpretation the summed decreases are scaled so the largest summed decrease is 100. Variables which cause larger net decreases in the RSS are considered more important.

The gcv criterion (type = "gcv") is similar to the rss approach, but uses the GCV statistic instead of the RSS. Note that adding a variable can sometimes increase the GCV. (Adding the variable has a deleterious effect on the model, as measured in terms of its estimated predictive power on unseen data.) If that happens often enough, the variable can have a negative total importance, and thus appear less important than unused variables.
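A minimal sketch, assuming the earth package is installed:

library(vip)
fit <- earth::earth(mpg ~ ., data = mtcars, degree = 2)
vi_model(fit)                # default: type = "nsubsets"
vi_model(fit, type = "rss")  # scaled decrease in RSS
vi_model(fit, type = "gcv")  # scaled decrease in the GCV statistic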
gbm

Variable importance is computed using one of two approaches (see summary.gbm for details); a short sketch follows this list.

The standard approach (type = "relative.influence") is described in Friedman (2001). When distribution = "gaussian" this returns the reduction of squared error attributable to each variable. For other loss functions this returns the reduction attributable to each variable in the sum of squared error in predicting the gradient on each iteration. It describes the relative influence of each variable in reducing the loss function. This is the default method used by vip.

An experimental permutation-based approach (type = "permutation"). This method randomly permutes each predictor variable, one at a time, and computes the associated reduction in predictive performance. This is similar to the variable importance measure Leo Breiman uses for random forests, but gbm currently computes it using the entire training dataset (not the out-of-bag observations).
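A minimal sketch, assuming the gbm package is installed (the tuning parameters are arbitrary):

library(vip)
set.seed(102)
fit <- gbm::gbm(mpg ~ ., data = mtcars, distribution = "gaussian",
                n.trees = 100, interaction.depth = 2, shrinkage = 0.1)
vi_model(fit)                        # default: type = "relative.influence"
vi_model(fit, type = "permutation")  # experimental permutation-based measure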
H2OModel

See h2o.varimp or visit http://docs.h2o.ai/h2o/latest-stable/h2o-docs/variable-importance.html for details.
nnet

Two popular methods for constructing variable importance scores with neural networks are the Garson algorithm (Garson 1991), later modified by Goh (1995), and the Olden algorithm (Olden et al. 2004). For both algorithms, the basis of these importance scores is the network's connection weights. The Garson algorithm determines variable importance by identifying all weighted connections between the nodes of interest. Olden's algorithm, on the other hand, uses the product of the raw connection weights between each input and output neuron and sums the product across all hidden neurons. This has been shown to outperform the Garson method in various simulations. For DNNs, a similar method due to Gedeon (1997) considers the weights connecting the input features to the first two hidden layers (for simplicity and speed); but this method can be slow for large networks. To implement the Olden and Garson algorithms, use type = "olden" and type = "garson", respectively. See garson and olden for details.
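A minimal sketch, assuming the nnet package (and the NeuralNetTools package, which provides garson and olden) is installed; the network size and weight decay are arbitrary:

library(vip)
set.seed(103)
fit <- nnet::nnet(mpg ~ ., data = mtcars, size = 5, linout = TRUE,
                  decay = 0.01, maxit = 500, trace = FALSE)
vi_model(fit, type = "olden")   # signed products of connection weights
vi_model(fit, type = "garson")  # Garson's algorithm (absolute importance only)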
lm

In (generalized) linear models, variable importance is typically based on the absolute value of the corresponding t-statistics. For such models, the sign of the original coefficient is also returned. By default, type = "stat" is used; however, if the inputs have been appropriately standardized then the raw coefficients can be used with type = "raw".
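A minimal sketch using a plain linear model:

library(vip)
fit <- lm(mpg ~ ., data = mtcars)
vi_model(fit)  # default: type = "stat" (|t-statistic|); output includes a Sign column
fit2 <- lm(mpg ~ ., data = as.data.frame(scale(mtcars)))  # standardized inputs
vi_model(fit2, type = "raw")  # absolute value of the raw coefficients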
ml_feature_importances

The Spark ML library provides standard variable importance for tree-based methods (e.g., random forests). See ml_feature_importances for details.
randomForest

Random forests typically provide two measures of variable importance. The first measure is computed from permuting out-of-bag (OOB) data: for each tree, the prediction error on the OOB portion of the data is recorded (error rate for classification and MSE for regression). Then the same is done after permuting each predictor variable. The differences between the two are then averaged over all trees in the forest and normalized by the standard deviation of the differences. If the standard deviation of the differences is equal to 0 for a variable, the division is not done (but the average is almost always equal to 0 in that case). See importance for details, including additional arguments that can be passed via the ... argument.

The second measure is the total decrease in node impurities from splitting on the variable, averaged over all trees. For classification, the node impurity is measured by the Gini index. For regression, it is measured by the residual sum of squares. See importance for details.
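A minimal sketch, assuming the randomForest package is installed; the second call assumes the type argument is forwarded to randomForest::importance() via ...:

library(vip)
set.seed(104)
fit <- randomForest::randomForest(mpg ~ ., data = mtcars, importance = TRUE)
vi_model(fit)            # importance scores from randomForest::importance()
vi_model(fit, type = 1)  # assumed pass-through: type = 1 (permutation/OOB), type = 2 (node impurity)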
cforest

Same approach described in cforest above. See varimp and varimpAUC (if type = "auc") for details.
ranger

Variable importance for ranger objects is computed in the usual way for random forests. The approach used depends on the importance argument provided in the initial call to ranger. See importance for details.
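A minimal sketch, assuming the ranger package is installed; note that an importance measure must be requested when the forest is fit:

library(vip)
set.seed(105)
fit <- ranger::ranger(mpg ~ ., data = mtcars, importance = "permutation")
vi_model(fit)  # returns the measure requested via ranger's importance argument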
rpart

As stated in one of the rpart vignettes, a variable may appear in the tree many times, either as a primary or a surrogate variable. An overall measure of variable importance is the sum of the goodness of split measures for each split for which it was the primary variable, plus "goodness" * (adjusted agreement) for all splits in which it was a surrogate. Imagine two variables which were essentially duplicates of each other; if we did not count surrogates, they would split the importance with neither showing up as strongly as it should. See rpart for details.
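A minimal sketch, assuming the rpart package is installed:

library(vip)
fit <- rpart::rpart(mpg ~ ., data = mtcars)
vi_model(fit)  # sums goodness-of-split measures over primary and surrogate splits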
train

Various model-specific and model-agnostic approaches that depend on the learning algorithm employed in the original call to train. See varImp for details.
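A minimal sketch, assuming the caret package is installed (the modeling method is arbitrary):

library(vip)
set.seed(106)
fit <- caret::train(mpg ~ ., data = mtcars, method = "lm")
vi_model(fit)  # see caret::varImp() for the measure used for each method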
xgboost

For linear models, the variable importance is the absolute magnitude of the estimated coefficients. For that reason, in order to obtain a meaningful ranking by importance for a linear model, the features need to be on the same scale (which you also would want to do when using either L1 or L2 regularization). Otherwise, the approach described in Friedman (2001) for gbms is used. See xgb.importance for details. For tree models, you can obtain three different types of variable importance (a short sketch follows this list):

Using type = "gain" (the default) gives the fractional contribution of each feature to the model based on the total gain of the corresponding feature's splits.

Using type = "cover" gives the number of observations related to each feature.

Using type = "frequency" gives the percentages representing the relative number of times each feature has been used throughout each tree in the ensemble.
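A minimal sketch for a tree booster, assuming the xgboost package is installed (the number of boosting rounds is arbitrary):

library(vip)
X <- data.matrix(subset(mtcars, select = -mpg))
dtrain <- xgboost::xgb.DMatrix(X, label = mtcars$mpg)
fit <- xgboost::xgb.train(params = list(objective = "reg:squarederror"),
                          data = dtrain, nrounds = 50)
vi_model(fit)                      # default: type = "gain"
vi_model(fit, type = "cover")
vi_model(fit, type = "frequency")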