xgboost (version 0.6-4)

xgb.model.dt.tree: Parse a boosted tree model text dump

Description

Parse a boosted tree model text dump into a data.table structure.

Usage

xgb.model.dt.tree(feature_names = NULL, model = NULL, text = NULL,
  n_first_tree = NULL)

Arguments

feature_names

character vector of feature names. If the model already contains feature names, leave this argument as NULL (the default).
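When a model carries no feature names, its text dump refers to features by index. The base R sketch below shows one way to map such ids to the names you supply. It assumes ids of the form "f<index>" with 0-based indices, which is an assumption about the dump format and not something this page states. The helper map_feature_ids and the example names are hypothetical.

```r
# Hypothetical helper: map assumed 0-based "f<index>" feature ids to names.
map_feature_ids <- function(ids, feature_names) {
  idx <- as.integer(sub("^f", "", ids)) + 1L  # "f0" -> 1, since R indexing is 1-based
  feature_names[idx]
}

map_feature_ids(c("f0", "f2"), c("age", "height", "weight"))
```

With these inputs the call returns c("age", "weight").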

model

object of class xgb.Booster

text

character vector previously generated by the xgb.dump function with the parameter with_stats = TRUE.
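To illustrate what parsing a text dump involves, here is a minimal base R sketch that turns one dump line into the fields listed under Value. The line layout it matches (split lines such as `0:[f29<-9.53674e-07] yes=1,no=2,missing=1,gain=4000.53,cover=1628.25` and leaf lines such as `1:leaf=1.71218,cover=788.852`) is an assumption based on typical xgboost dumps produced with with_stats = TRUE. The helper parse_dump_line is hypothetical and is not the package's internal implementation.

```r
# Hypothetical helper: parse one line of an assumed xgb.dump(with_stats = TRUE)
# text dump into a list of node fields.
parse_dump_line <- function(line) {
  line <- trimws(line)  # drop the tab indentation that encodes depth
  # value following "key=" up to the next comma (or end of line)
  field <- function(key) sub(paste0(".*", key, "=([^,]+).*$"), "\\1", line)
  node <- as.integer(sub(":.*$", "", line))  # text before the first ":"
  if (grepl(":leaf=", line, fixed = TRUE)) {
    return(list(Node = node, Feature = "Leaf",
                Quality = as.numeric(field("leaf")),
                Cover = as.numeric(field("cover"))))
  }
  list(Node    = node,
       Feature = sub("^[0-9]+:\\[([^<]+)<.*$", "\\1", line),
       Split   = as.numeric(sub("^[0-9]+:\\[[^<]+<([-+0-9.eE]+)\\].*$", "\\1", line)),
       Yes     = as.integer(field("yes")),
       No      = as.integer(field("no")),
       Missing = as.integer(field("missing")),
       Quality = as.numeric(field("gain")),  # split gain for branch nodes
       Cover   = as.numeric(field("cover")))
}

str(parse_dump_line("0:[f29<-9.53674e-07] yes=1,no=2,missing=1,gain=4000.53,cover=1628.25"))
```

For a branch line the gain goes into Quality, and for a leaf line the leaf value does, matching the description of the Quality column.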

n_first_tree

limits parsing to the first n trees. If set to NULL, all trees of the model are parsed.
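One way to picture the effect of n_first_tree is to group dump lines by tree and keep only the first n groups. The base R sketch below assumes that each tree in the dump starts with a header line of the form "booster[k]"; that format and the helper first_n_trees are assumptions for illustration, not the package's actual code.

```r
# Hypothetical helper: keep only the node lines of the first n trees of a dump.
# Assumes each tree begins with a "booster[k]" header line.
first_n_trees <- function(text, n = NULL) {
  if (is.null(n)) n <- Inf                      # NULL means keep every tree
  is_header <- grepl("^booster\\[", text)
  tree_id <- cumsum(is_header) - 1L             # 0-based index of the enclosing tree
  text[tree_id >= 0L & tree_id < n & !is_header]
}

dump <- c("booster[0]", "0:leaf=0.5,cover=10",
          "booster[1]", "0:leaf=-0.2,cover=10")
first_n_trees(dump, 1)
```

Here the call keeps only the node line of the first tree, "0:leaf=0.5,cover=10".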

Value

A data.table with detailed information about the nodes of the model's trees.

The columns of the data.table are:

  • Tree: ID of a tree in a model

  • Node: ID of a node in a tree

  • ID: unique identifier of a node in a model

  • Feature: for a branch node, the feature id or name (when available); for a leaf node, simply the label 'Leaf'

  • Split: location of the split for a branch node (split condition is always "less than")

  • Yes: ID of the next node when the split condition is met

  • No: ID of the next node when the split condition is not met

  • Missing: ID of the next node when the feature value tested at the branch is missing

  • Quality: either the split gain (change in loss) or the leaf value

  • Cover: metric related to the number of observations either seen by a split or collected by a leaf during training.
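The Yes, No and Missing columns are enough to route an observation through a tree. The base R sketch below builds a small hand-made node table in this column layout (all values are hypothetical, and IDs are assumed to follow a "Tree-Node" pattern such as "0-1") and follows the links from the root until it reaches a leaf, using the "less than" split condition described above. The helper leaf_value is hypothetical and only illustrates how the columns relate.

```r
# A hand-built, hypothetical single-tree node table using the columns above.
nodes <- data.frame(
  Tree    = c(0, 0, 0),
  Node    = c(0, 1, 2),
  ID      = c("0-0", "0-1", "0-2"),
  Feature = c("feat_a", "Leaf", "Leaf"),
  Split   = c(0.5, NA, NA),
  Yes     = c("0-1", NA, NA),
  No      = c("0-2", NA, NA),
  Missing = c("0-1", NA, NA),
  Quality = c(4000.53, 1.71, -1.94),  # gain for the branch, values for the leaves
  stringsAsFactors = FALSE
)

# Walk from the root, taking Yes when value < Split, No otherwise, and
# Missing when the value is NA; return the Quality (value) of the leaf reached.
leaf_value <- function(nodes, tree, x) {
  id <- paste(tree, 0, sep = "-")  # assumed ID of the root node
  repeat {
    row <- nodes[nodes$ID == id, ]
    if (row$Feature == "Leaf") return(row$Quality)
    v <- x[[row$Feature]]
    id <- if (is.na(v)) row$Missing else if (v < row$Split) row$Yes else row$No
  }
}

leaf_value(nodes, 0, c(feat_a = 0))   # follows Yes
leaf_value(nodes, 0, c(feat_a = NA))  # follows Missing
```

Both calls end at node "0-1" and return its leaf value, 1.71.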

Examples

# NOT RUN {
# Basic use:

data(agaricus.train, package='xgboost')

bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max_depth = 2,
               eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")

(dt <- xgb.model.dt.tree(colnames(agaricus.train$data), bst))


# Match each node with the feature name of the split that follows its 'Yes' branch:

merge(dt, dt[, .(ID, Y.Feature=Feature)], by.x='Yes', by.y='ID', all.x=TRUE)[order(Tree,Node)]
 
# }
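The data.table merge in the last example is a left join of the node table on itself: each node's Yes ID is looked up among all node IDs to fetch that child's Feature. The sketch below repeats the same join with base R merge on a small hypothetical table, so the logic can be checked without training a model; the table contents are invented for illustration.

```r
# Hypothetical node table: root "0-0" says Yes -> "0-1", node "0-1" says Yes -> "0-3".
dt <- data.frame(
  Tree    = c(0, 0, 0, 0, 0),
  Node    = c(0, 1, 2, 3, 4),
  ID      = c("0-0", "0-1", "0-2", "0-3", "0-4"),
  Feature = c("f_a", "f_b", "Leaf", "Leaf", "Leaf"),
  Yes     = c("0-1", "0-3", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Left join: attach the Feature of the node each Yes link points to.
lookup <- data.frame(ID = dt$ID, Y.Feature = dt$Feature, stringsAsFactors = FALSE)
res <- merge(dt, lookup, by.x = "Yes", by.y = "ID", all.x = TRUE)
res <- res[order(res$Tree, res$Node), ]
res[, c("ID", "Feature", "Yes", "Y.Feature")]
```

The root gets Y.Feature "f_b", node "0-1" gets "Leaf", and the leaves, which have no Yes link, get NA.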
