xgboost (version 0.6-0)

xgb.plot.multi.trees: Project all trees on one tree and plot it

Description

Visualization of the ensemble of trees as a single collective unit.

Usage

xgb.plot.multi.trees(model, feature_names = NULL, features_keep = 5, plot_width = NULL, plot_height = NULL, ...)

Arguments

model
dump generated by the xgb.train function.
feature_names
names of each feature as a character vector. Can be extracted from a sparse matrix (see example). If model dump already contains feature names, this argument should be NULL.
features_keep
number of features to keep in each position of the multi trees.
plot_width
width in pixels of the graph to produce
plot_height
height in pixels of the graph to produce
...
currently not used

Value

A single graph showing the whole ensemble projected onto one tree.

Details

This function tries to capture the complexity of a gradient boosted tree ensemble in a cohesive way.

The goal is to improve the interpretability of a model that is generally seen as a black box. The function applies only to boosting on decision trees.

The purpose is to move from an ensemble of trees to a single tree only.

It takes advantage of the fact that the shape of a binary tree is defined only by its depth (therefore, in a boosting model, all trees have the same shape).

Moreover, the trees tend to reuse the same features.

The function projects every tree onto a single tree and, for each node position, keeps the top features_keep features (ranked by the Gain per feature measure).
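The projection step can be illustrated with a small base R sketch. This is not the package's implementation: the nodes table, its column names, the feature names and the gain values are all made up for illustration. It assumes that gain is summed per feature at each position before ranking.

```r
# Hypothetical node table: one row per split node across three toy trees.
# "position" encodes where the node sits in the tree ("0" is the root).
nodes <- data.frame(
  tree     = c(0, 0, 0, 1, 1, 1, 2, 2, 2),
  position = c("0", "0-0", "0-1", "0", "0-0", "0-1", "0", "0-0", "0-1"),
  feature  = c("odor", "cap", "spore", "odor", "gill", "spore", "cap", "ring", "odor"),
  gain     = c(40, 10, 8, 30, 6, 5, 12, 9, 4),
  stringsAsFactors = FALSE
)

features_keep <- 2

# Sum the gain of each feature at each position over all trees
# (assumption: gain is aggregated by summation).
agg <- aggregate(gain ~ position + feature, data = nodes, FUN = sum)

# Within each position, keep only the features_keep features with the highest gain.
projected <- do.call(rbind, lapply(split(agg, agg$position), function(d) {
  d <- d[order(-d$gain), ]
  head(d, features_keep)
}))
rownames(projected) <- NULL

print(projected)
```

With this toy data, position 0-0 is split on three different features across the trees, so the one with the lowest summed gain (gill) is dropped, while the root keeps odor and cap.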

This function is inspired by this blog post: https://wellecks.wordpress.com/2015/02/21/peering-into-the-black-box-visualizing-lambdamart/

Examples

library(xgboost)
data(agaricus.train, package='xgboost')

bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max_depth = 15,
               eta = 1, nthread = 2, nrounds = 30, objective = "binary:logistic",
               min_child_weight = 50)

p <- xgb.plot.multi.trees(model = bst, feature_names = colnames(agaricus.train$data),
                          features_keep = 3)
print(p)
