xgb.importance
Show importance of features in a model
Create a data.table of the most important features of a model.
Usage
xgb.importance(feature_names = NULL, model = NULL, data = NULL,
label = NULL, target = function(x) ((x + label) == 2))
Arguments
- feature_names
names of each feature as a character vector. Can be extracted from a sparse matrix (see example). If the model dump already contains feature names, this argument should be NULL.
- model
generated by the xgb.train function.
- data
the dataset used for the training step. Will be used with the label parameter for co-occurrence computation. More information in the Details part. This parameter is optional.
- label
the label vector used for the training step. Will be used with the data parameter for co-occurrence computation. More information in the Details part. This parameter is optional.
- target
a function which returns TRUE or 1 when an observation should be counted as a co-occurrence, and FALSE or 0 otherwise. The default function is provided for computing co-occurrences in a binary classification. The target function should have only one parameter. This parameter will be used to provide each important feature vector after the split condition has been applied, so these vectors only contain 0s and 1s, whatever the information was before. More information in the Details part. This parameter is optional (see the sketch after this list).
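As an illustration of how a custom target could look, the sketch below mirrors the default rule but counts the opposite situation. It is not part of the package; the function name and the toy label vector are purely illustrative, and, like the default, the function relies on a label vector being visible where it is defined.

# Toy label vector, purely illustrative; in practice this is the label
# vector used for training, visible where the target function is defined.
label <- c(0, 1, 1, 0)
# Default rule: an observation co-occurs when the feature vector x
# (0/1 after the split condition) and the label are both 1: x + label == 2.
# Alternative sketch: count observations where the feature is absent
# and the label is 0 instead.
absent_and_negative <- function(x) ((x + label) == 0)
absent_and_negative(c(0, 0, 1, 1))   # TRUE FALSE FALSE FALSE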
Details
This function works for both linear and tree models.
A data.table is returned by the function.
The columns are:
- Features: name of the features as provided in feature_names or already present in the model dump;
- Gain: contribution of each feature to the model. For boosted tree models, the gain of each feature in each tree is taken into account, then averaged per feature to give a view of the entire model. The highest percentage means the most important feature for predicting the label used for the training (only available for tree models);
- Cover: metric of the number of observations related to this feature (only available for tree models);
- Weight: percentage representing the relative number of times a feature has been used in trees.
If you don't provide feature_names, the index of the features will be used instead. Because the index is extracted from the model dump (made on the C++ side), it starts at 0 (as usual in C++) instead of 1 (as usual in R).
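As a rough illustration of how the returned columns can be inspected, here is a sketch that retrains the small model from the Examples section; importance_matrix is just a placeholder name.

# Sketch: inspecting the columns of the returned table.
library(xgboost)
library(data.table)
data(agaricus.train, package = 'xgboost')
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max_depth = 2, eta = 1, nthread = 2, nrounds = 2,
               objective = "binary:logistic")
importance_matrix <- xgb.importance(colnames(agaricus.train$data), model = bst)
# Show the five features with the largest average Gain.
head(importance_matrix[order(-Gain)], 5)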
Co-occurrence count
The gain gives an indication of how important a feature is in making a branch of a decision tree purer. However, with this information only, you can't know whether this feature has to be present or not to get a specific classification. In the example code, you may wonder whether odor=none should be TRUE to not eat a mushroom.
Co-occurrence computation is here to help in understanding the relation between a predictor and a specific class. It will count how many observations are returned as TRUE by the target function (see parameters). When you execute the example below, there are only 92 cases out of the 3140 observations of the train dataset where a mushroom has no odor and can be eaten safely.
If you need to remember only one thing: unless you want to leave us early, don't eat a mushroom which has no odor :-)
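As a rough sketch of what this count corresponds to under the default target, the line below counts observations where the "odor=none" column and the label are both 1. This is a simplification: the real vector handed to the target function is built after applying the split condition, whereas the raw 0/1 column is used here.

# Sketch of the default co-occurrence rule, function(x) ((x + label) == 2):
# count observations where the "odor=none" feature and the label are both 1.
library(xgboost)
library(Matrix)   # for subsetting the sparse matrix
data(agaricus.train, package = 'xgboost')
sum(agaricus.train$data[, "odor=none"] == 1 & agaricus.train$label == 1)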
Value
A data.table of the features used in the model, with their average gain (and their weight for boosted tree models).
Examples
# NOT RUN {
data(agaricus.train, package='xgboost')
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max_depth = 2,
               eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
xgb.importance(colnames(agaricus.train$data), model = bst)
# Same thing with co-occurrence computation this time
xgb.importance(colnames(agaricus.train$data), model = bst,
               data = agaricus.train$data, label = agaricus.train$label)
# }