Plot feature importance as a bar graph
Represents previously calculated feature importance as a bar graph.
xgb.plot.importance uses base R graphics, while
xgb.ggplot.importance uses the ggplot backend.
xgb.ggplot.importance(importance_matrix = NULL, top_n = NULL, measure = NULL, rel_to_first = FALSE, n_clusters = c(1:10), ...)

xgb.plot.importance(importance_matrix = NULL, top_n = NULL, measure = NULL, rel_to_first = FALSE, left_margin = 10, cex = NULL, plot = TRUE, ...)
- top_n: maximal number of top features to include in the plot.
- measure: the name of the importance measure to plot. If NULL, 'Gain' is used for tree models and 'Weight' for gblinear.
- rel_to_first: whether importance values should be represented as relative to the highest-ranked feature. See Details.
- n_clusters: (ggplot only) a numeric vector giving the min and the max of the range of possible numbers of bar clusters.
- ...: other parameters passed to barplot (except horiz, border, cex.names, names.arg, and las).
- left_margin: (base R barplot) allows adjusting the left margin size to fit feature names.
When it is NULL, the existing
- cex: (base R barplot) passed as
- plot: (base R barplot) whether a barplot should be produced. If FALSE, only a data.table is returned.
The graph represents each feature as a horizontal bar whose length is proportional to the feature's importance.
Features are ranked in decreasing order of importance.
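A rough base R sketch of the ranking and plotting just described. It assumes the importances are held in a plain named numeric vector rather than the data.table that xgb.importance produces; rank_top_features is a hypothetical helper, not part of xgboost, and the values are toy numbers.

```r
# Hypothetical helper (not part of xgboost): sort a named importance
# vector in decreasing order and keep at most top_n entries.
rank_top_features <- function(importance, top_n = NULL) {
  ranked <- sort(importance, decreasing = TRUE)
  if (!is.null(top_n)) ranked <- head(ranked, top_n)
  ranked
}

# Toy importance values, for illustration only.
imp <- c(f3 = 0.08, f1 = 0.62, f4 = 0.12, f2 = 0.18)
top <- rank_top_features(imp, top_n = 3)
print(top)

# barplot(horiz = TRUE) draws the first element at the bottom, so reverse
# the vector to put the most important feature at the top.
op <- par(mar = c(5, 6, 2, 2))  # extra left margin for the feature names
barplot(rev(top), horiz = TRUE, las = 1, border = NA)
par(op)
```

The reversal before plotting is the main design point: sorting puts the largest value first, but a horizontal barplot stacks bars from the bottom up.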
It works for importances from both tree and linear models.
When rel_to_first = FALSE, the values are plotted as they appear in the importance matrix.
For a gbtree model, this means they are normalized to sum to 1
("what is the feature's importance contribution relative to the whole model?").
For linear models,
rel_to_first = FALSE shows the actual values of the coefficients.
Setting rel_to_first = TRUE shows the picture from the perspective of
"what is the feature's importance contribution relative to the most important feature?"
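The two scalings can be illustrated with toy numbers. This is a minimal sketch, assuming that "relative to the highest-ranked feature" means dividing every value by the largest absolute value. The absolute value is an assumption made here so that negative gblinear coefficients keep their sign; to_first is a hypothetical helper, not an xgboost function.

```r
# Toy gbtree-style importances, already normalized to sum to 1 (illustrative values).
gain <- c(f1 = 0.5, f2 = 0.3, f3 = 0.2)

# rel_to_first = FALSE: values are plotted unchanged.
as_is <- gain

# rel_to_first = TRUE (assumed scaling): divide by the largest absolute value,
# so the top-ranked feature gets 1 and the others are fractions of it.
to_first <- function(x) x / max(abs(x))
relative <- to_first(gain)
print(relative)

# Toy gblinear-style coefficients: signs are preserved, magnitudes rescaled.
coefs <- c(x1 = -2, x2 = 1, x3 = 0.5)
print(to_first(coefs))
```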
The ggplot-backend method also performs 1-D clustering of the importance values, with bar colors corresponding to clusters of features that have similar importance values.
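The clustering idea can be sketched with base R's kmeans applied to the one-dimensional importance values. This is a stand-in chosen for illustration: the routine the ggplot method actually uses, and how it picks a cluster count from the n_clusters range, may differ. Here the number of clusters is simply fixed at 3, and the values are toy numbers.

```r
# Toy importance values with three visibly separated groups (illustrative only).
imp <- c(f1 = 0.60, f2 = 0.55, f3 = 0.20, f4 = 0.18, f5 = 0.02, f6 = 0.01)

# 1-D k-means: kmeans() accepts a plain numeric vector.
# set.seed and several random starts (nstart) make the result reproducible
# and less sensitive to a poor initial choice of centers.
set.seed(42)
fit <- kmeans(imp, centers = 3, nstart = 10)
clusters <- fit$cluster
print(clusters)

# Colour each bar by its cluster; rev() puts the top feature at the top.
barplot(rev(imp), horiz = TRUE, las = 1, col = rev(clusters) + 1)
```

Fixing the cluster count keeps the sketch short. Searching over a range of counts, as n_clusters suggests, would need a model-selection criterion that this example does not attempt.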
The xgb.plot.importance function creates a barplot (when plot=TRUE) and silently returns a processed data.table with the top_n features sorted by importance.
The xgb.ggplot.importance function returns a ggplot graph that can be customized afterwards. For example, to change the title of the graph, add
+ ggtitle("A GRAPH NAME") to the result.
library(xgboost)

data(agaricus.train, package = "xgboost")
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max_depth = 3,
               eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")

importance_matrix <- xgb.importance(colnames(agaricus.train$data), model = bst)

# Base R barplot, with values relative to the top-ranked feature
xgb.plot.importance(importance_matrix, rel_to_first = TRUE, xlab = "Relative importance")

# ggplot version; the returned object can be customized further
(gg <- xgb.ggplot.importance(importance_matrix, measure = "Frequency", rel_to_first = TRUE))
gg + ggplot2::ylab("Frequency")