Quantify the strength of two-way interaction effects using a simple feature importance ranking measure (FIRM) approach. For details, see Greenwell et al. (2018).
vint(
  object,
  feature_names,
  progress = "none",
  parallel = FALSE,
  paropts = NULL,
  ...
)
object: A fitted model object (e.g., a "randomForest" object).
feature_names: Character vector giving the names of the two features of interest.
progress: Character string giving the name of the progress bar to use while constructing the interaction statistics. See create_progress_bar for details. Default is "none".
parallel: Logical indicating whether or not to run partial in parallel using a backend provided by the foreach package. Default is FALSE.
paropts: List containing additional options to be passed on to foreach when parallel = TRUE.
...: Additional optional arguments to be passed on to partial.
This function quantifies the strength of interaction between features $X_1$ and $X_2$ by measuring the change in variance along slices of the partial dependence of $X_1$ and $X_2$ on the target $Y$. See Greenwell et al. (2018) for details and examples.
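As a rough illustration of the computation described above, the base-R sketch below applies the statistic to a two-way partial dependence grid. It follows our reading of Greenwell et al. (2018): for each value of $X_1$, take the standard deviation of the partial dependence $\bar{f}(x_1, x_2)$ across $X_2$; take the standard deviation of those values across $X_1$; repeat with the roles swapped and average, i.e. $I(X_1, X_2) = \frac{1}{2}\left[\mathrm{sd}_{x_1}\{\mathrm{sd}_{x_2}\,\bar{f}(x_1, x_2)\} + \mathrm{sd}_{x_2}\{\mathrm{sd}_{x_1}\,\bar{f}(x_1, x_2)\}\right]$. It assumes the grid is a data frame with one column per feature plus a column of averaged predictions named yhat (the layout returned by pdp::partial). The helpers firm_interaction() and pd_grid() are hypothetical, are not exported by vip, and use a known closed-form function in place of a fitted model.

```r
# Hypothetical helper (NOT part of vip): FIRM-style interaction statistic
# computed from a two-way partial dependence grid.
#   pd     : data frame with the two feature columns plus a "yhat" column
#   x1, x2 : names of the two feature columns in pd
firm_interaction <- function(pd, x1, x2) {
  # Importance of x2 within each slice x1 = c: sd of yhat across the x2 grid
  imp_x2_given_x1 <- tapply(pd$yhat, INDEX = pd[[x1]], FUN = stats::sd)
  # Importance of x1 within each slice x2 = c: sd of yhat across the x1 grid
  imp_x1_given_x2 <- tapply(pd$yhat, INDEX = pd[[x2]], FUN = stats::sd)
  # Interaction strength: how much those conditional importances vary,
  # averaged over both directions
  mean(c(stats::sd(imp_x2_given_x1), stats::sd(imp_x1_given_x2)))
}

# Hypothetical helper: evaluate a known function f(x1, x2) on a regular
# n-by-n grid over [0, 1]^2; here f plays the role of the partial dependence.
pd_grid <- function(f, n = 20) {
  grid <- expand.grid(x1 = seq(0, 1, length.out = n),
                      x2 = seq(0, 1, length.out = n))
  grid$yhat <- f(grid$x1, grid$x2)
  grid
}

additive    <- pd_grid(function(a, b) a + b)  # no interaction
interacting <- pd_grid(function(a, b) a * b)  # genuine interaction

print(firm_interaction(additive, "x1", "x2"))     # essentially 0 (rounding noise)
print(firm_interaction(interacting, "x1", "x2"))  # about 0.097 for n = 20
```

With the additive function every slice has the same spread, so the statistic is essentially zero; with the product x1 * x2 the spread of each slice grows with the other feature, which yields a clearly positive value.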
Greenwell, B. M., Boehmke, B. C., and McCarthy, A. J.: A Simple and Effective Model-Based Variable Importance Measure. arXiv preprint arXiv:1805.04755 (2018).
if (FALSE) {
#
# The Friedman 1 benchmark problem
#
# Load required packages
library(gbm)
library(ggplot2)
library(vip)  # provides gen_friedman() and vint()
# Simulate training data
trn <- gen_friedman(500, seed = 101) # ?vip::gen_friedman
#
# NOTE: The only interaction that actually occurs in the model from which
# these data are generated is between x1 and x2!
#
# Fit a GBM to the training data
set.seed(102) # for reproducibility
fit <- gbm(y ~ ., data = trn, distribution = "gaussian", n.trees = 1000,
interaction.depth = 2, shrinkage = 0.01, bag.fraction = 0.8,
cv.folds = 5)
best_iter <- gbm.perf(fit, plot.it = FALSE, method = "cv")
# Quantify relative interaction strength for every pair of features
all_pairs <- combn(paste0("x", 1:10), m = 2)  # one pair per column
res <- NULL
for (i in seq_len(ncol(all_pairs))) {
  interact <- vint(fit, feature_names = all_pairs[, i], n.trees = best_iter)
  res <- rbind(res, interact)
}
# Sort from strongest to weakest interaction, then plot the top 20
res <- res[order(res$Interaction, decreasing = TRUE), ]
top_20 <- res[1L:20L, ]
ggplot(top_20, aes(x = reorder(Variables, Interaction), y = Interaction)) +
geom_col() +
coord_flip() +
xlab("") +
ylab("Interaction strength")
}