Parse a boosted tree model text dump into a data.table structure.
xgb.model.dt.tree(feature_names = NULL, model = NULL, text = NULL,
n_first_tree = NULL)
feature_names: character vector of feature names. If the model already contains feature names, this argument should be NULL (the default value).
model: object of class xgb.Booster.
text: character vector previously generated by the xgb.dump function (with the parameter with_stats = TRUE set).
n_first_tree: limit the parsing to the first n trees. If set to NULL, all trees of the model are parsed.
A data.table with detailed information about the nodes of the model's trees.
The columns of the data.table are:
Tree: ID of a tree in a model
Node: ID of a node in a tree
ID: unique identifier of a node in a model
Feature: for a branch node, a feature ID or name (when available); for a leaf node, simply the label 'Leaf'
Split: location of the split for a branch node (the split condition is always "less than")
Yes: ID of the next node when the split condition is met
No: ID of the next node when the split condition is not met
Missing: ID of the next node when the branch value is missing
Quality: either the split gain (change in loss) or the leaf value
Cover: metric related to the number of observations either seen by a split or collected by a leaf during training
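The way the Split, Yes, No and Missing columns link nodes together can be illustrated with a small sketch. The table below is hand-built and its values are invented; it only mimics the documented columns. The "tree-node" format of the ID values and the helper name route_to_leaf are assumptions made for this illustration, not part of the package. A plain data.frame is used so that the sketch has no dependencies.

```r
# Toy table with the documented columns. All values are invented for
# illustration, and the "tree-node" ID format is an assumption.
nodes <- data.frame(
  Tree    = c(0, 0, 0),
  Node    = c(0, 1, 2),
  ID      = c("0-0", "0-1", "0-2"),
  Feature = c("odor=none", "Leaf", "Leaf"),
  Split   = c(0.5, NA, NA),
  Yes     = c("0-1", NA, NA),
  No      = c("0-2", NA, NA),
  Missing = c("0-1", NA, NA),
  Quality = c(4000.5, 1.7, -1.9),   # gain for the split, leaf value for leaves
  Cover   = c(1628, 812, 816),
  stringsAsFactors = FALSE
)

# Hypothetical helper: follow Yes / No / Missing from the root of one tree
# until a leaf is reached. `x` is a named numeric vector (one observation).
route_to_leaf <- function(nodes, x, tree = 0) {
  id <- nodes$ID[nodes$Tree == tree & nodes$Node == 0]
  repeat {
    node <- nodes[nodes$ID == id, ]
    if (node$Feature == "Leaf") {
      return(node)
    }
    # A feature absent from `x` yields NA and is treated as missing.
    value <- unname(x[node$Feature])
    if (is.na(value)) {
      id <- node$Missing
    } else if (value < node$Split) {
      id <- node$Yes      # split condition "less than" is met
    } else {
      id <- node$No
    }
  }
}

leaf <- route_to_leaf(nodes, c("odor=none" = 1))
leaf$ID       # "0-2"
leaf$Quality  # leaf value of the node that was reached
```

For a leaf row, Quality holds the leaf value, so the returned row gives this tree's contribution for the observation under the toy table's assumptions.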
# NOT RUN {
# Basic use:
data(agaricus.train, package='xgboost')
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max_depth = 2,
               eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
(dt <- xgb.model.dt.tree(colnames(agaricus.train$data), bst))
# How to match feature names of splits that are following a current 'Yes' branch:
merge(dt, dt[, .(ID, Y.Feature=Feature)], by.x='Yes', by.y='ID', all.x=TRUE)[order(Tree,Node)]
# }