xgboost (version 0.4-3)

xgb.model.dt.tree: Convert tree model dump to data.table


Read a tree model text dump and return a data.table.


xgb.model.dt.tree(feature_names = NULL, filename_dump = NULL,
  model = NULL, text = NULL, n_first_tree = NULL)


  • feature_names: names of each feature as a character vector. Can be extracted from a sparse matrix (see example). If the model dump already contains feature names, this argument should be NULL.
  • filename_dump: the path to the text file storing the model. The model dump must include the gain per feature and per tree (parameter with.stats = T in function xgb.dump).
  • model: model generated by the xgb.train function. Avoids the creation of a dump file.
  • text: dump generated by the xgb.dump function. Avoids the creation of a dump file. The model dump must include the gain per feature and per tree (parameter with.stats = T in function xgb.dump).
  • n_first_tree: limit the conversion to the first n trees. If NULL, all trees of the model are converted. Performance can be low for huge models.
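
The text argument can be illustrated with a minimal sketch, assuming the 0.4-3 API documented on this page (xgb.dump with with.stats, and the xgb.model.dt.tree signature shown above). The variable names feature_names, dump_text and tree_dt are chosen here for illustration only.

```r
library(xgboost)

data(agaricus.train, package = "xgboost")

# Train a small model, with the same settings as the example on this page.
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max.depth = 2, eta = 1, nthread = 2, nround = 2,
               objective = "binary:logistic")

# Column names of the sparse matrix serve as feature names.
feature_names <- agaricus.train$data@Dimnames[[2]]

# Dump the model as a character vector instead of writing a file.
# with.stats = TRUE is needed so that gain and cover are present in the dump.
dump_text <- xgb.dump(model = bst, with.stats = TRUE)

# Parse the in-memory dump; no dump file is created.
tree_dt <- xgb.model.dt.tree(feature_names, text = dump_text)
print(tree_dt)
```

Passing model = bst instead of text = dump_text should give the same table, since both routes avoid the dump file.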


  • A data.table of the features used in the model with their gain, cover and a few other things.


General function to convert a text dump of a tree model to a data.table. The purpose is to help the user explore the model and get a better understanding of it.

The content of the data.table is organised as follows:

  • ID: unique identifier of a node;
  • Feature: feature used in the tree to perform a split. When Leaf is indicated, it is the end of a branch;
  • Split: value of the chosen feature at which the split is performed;
  • Yes: ID of the next node in the branch when the split condition is met;
  • No: ID of the next node in the branch when the split condition is not met;
  • Missing: ID of the next node in the branch for observations where the feature used for the split is not provided;
  • Quality: the gain related to the split in this specific node;
  • Cover: metric measuring the number of observations affected by the split;
  • Tree: ID of the tree. It is included in the main ID;
  • Yes.X or No.X: data related to the node pointed to in the Yes or No column.
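
As a rough sketch of how these columns can be used to explore a model, the code below sums the Quality (gain) of all split nodes per feature. It assumes the 0.4-3 API documented on this page and that the data.table package is installed; the names tree_dt, splits and gain_by_feature are illustrative, and as.numeric is applied defensively in case Quality is not already numeric.

```r
library(xgboost)
library(data.table)

data(agaricus.train, package = "xgboost")

bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max.depth = 2, eta = 1, nthread = 2, nround = 2,
               objective = "binary:logistic")

tree_dt <- xgb.model.dt.tree(agaricus.train$data@Dimnames[[2]], model = bst)

# Keep only the split nodes; rows marked "Leaf" are the ends of branches.
splits <- tree_dt[Feature != "Leaf"]

# Total gain and number of splits per feature, largest gain first.
gain_by_feature <- splits[, .(Gain = sum(as.numeric(Quality)), Splits = .N),
                          by = Feature][order(-Gain)]
print(gain_by_feature)
```

Leaf rows are excluded because they do not correspond to a split on a feature.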


data(agaricus.train, package='xgboost')

# The dataset is a list with two items, a sparse matrix and labels
# (labels = outcome column which will be learned).
# Each column of the sparse matrix is a feature in one-hot encoding format.
train <- agaricus.train

bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
               eta = 1, nthread = 2, nround = 2, objective = "binary:logistic")

# agaricus.train$data@Dimnames[[2]] represents the column names of the sparse matrix.
xgb.model.dt.tree(agaricus.train$data@Dimnames[[2]], model = bst)
