tree.interpreter (version 0.1.1)

featureContribTree: Feature Contribution

Description

Contribution of each feature to the prediction.

Usage

featureContribTree(tidy.RF, tree, X)

featureContrib(tidy.RF, X)

Arguments

tidy.RF

A tidy random forest. The random forest to make predictions with.

tree

An integer. The index of the tree to look at.

X

A data frame. Features of samples to be predicted.

Value

A cube (3D array). The content depends on the type of the response.

  • Regression: A P-by-1-by-N array, where P is the number of features in X and N is the number of samples in X. The pth row of the nth slice gives the contribution of feature p to the prediction for sample n.

  • Classification: A P-by-D-by-N array, where P is the number of features in X, D is the number of response classes, and N is the number of samples in X. The pth row of the nth slice gives the contribution of feature p to the predicted frequency of each response class for sample n. The sketch below shows how to index the returned cube.
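
As a minimal indexing illustration (reusing the tidy.RF and test.id objects built in the Examples section below), the cube behaves like any 3D array in R:

fc <- featureContrib(tidy.RF, iris[test.id, -5])
dim(fc)    # P x D x N: 4 features, 3 classes, 3 test samples
fc[, , 1]  # P-by-D matrix: all contributions for the first test sample
fc["Petal.Length", "virginica", 1]  # assumes feature/class dimnames are set; otherwise index by position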

Functions

  • featureContribTree: Feature contribution to prediction within a single tree

  • featureContrib: Feature contribution to prediction within the whole forest

Details

Recall that each node in a decision tree has a prediction associated with it. For regression trees, this is the average response in that node; for classification trees, it is the frequency of each response class in that node (or, for hard predictions, the most frequent class).

For a tree in the forest, the contribution of each feature to the prediction of a sample is the sum of differences between the predictions of the nodes along the sample's path that split on that feature and the predictions of their children, i.e. the sum of changes in node prediction caused by splitting on the feature. This is calculated by featureContribTree.
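
As a worked toy example (the numbers and the data.frame below are purely illustrative, not the package's internal representation): suppose a sample's path through a regression tree starts at a root predicting 10, moves to a child predicting 14 after a split on Petal.Length, and ends at a leaf predicting 13 after a split on Petal.Width.

# Hypothetical root-to-leaf path: one row per split the sample passes through.
path <- data.frame(
  feature     = c("Petal.Length", "Petal.Width"),
  parent.pred = c(10, 14),  # prediction of the node that splits
  child.pred  = c(14, 13)   # prediction of the child the sample falls into
)
# Per-feature contribution: sum of prediction changes at its splits.
tapply(path$child.pred - path$parent.pred, path$feature, sum)
# Petal.Length contributes +4 and Petal.Width contributes -1,
# so 10 (root) + 4 - 1 = 13, the leaf prediction.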

For a forest, the contribution of each feature to the prediction of a sample is the average of its contributions across all trees in the forest, because the prediction of a forest is the average of the predictions of its trees. This is calculated by featureContrib.
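
Equivalently, featureContrib can be reproduced by averaging the per-tree results by hand. A sketch, assuming the tidy forest stores its tree count in tidy.RF$num.trees (an assumption about tidyRF's internal representation):

# Average the per-tree cubes; the result should match featureContrib().
per.tree <- lapply(seq_len(tidy.RF$num.trees), function(t)
  featureContribTree(tidy.RF, t, iris[test.id, -5]))
Reduce(`+`, per.tree) / length(per.tree)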

Together with trainsetBias(Tree), these functions additively decompose a model's prediction:

$$prediction(MODEL, X) = trainsetBias(MODEL) + featureContrib_1(MODEL, X) + ... + featureContrib_P(MODEL, X),$$

where MODEL can be either a tree or a forest.
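
A sketch of checking this identity on the classification forest from the Examples below; it assumes trainsetBias returns one bias value per response class:

contrib <- featureContrib(tidy.RF, iris[test.id, -5])  # P-by-D-by-N cube
bias <- trainsetBias(tidy.RF)                          # assumed shape: one value per class
# Sum the contributions over features, then add the per-class bias; each
# column should equal the forest's averaged class-frequency prediction.
apply(contrib, c(2, 3), sum) + as.vector(bias)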

References

Interpreting random forests http://blog.datadive.net/interpreting-random-forests/

Random forest interpretation with scikit-learn http://blog.datadive.net/random-forest-interpretation-with-scikit-learn/

See Also

trainsetBias

MDI

Examples

library(ranger)
library(tree.interpreter)

set.seed(42)                # make the forest reproducible
test.id <- 50 * seq(3)      # hold out one sample from each species
rfobj <- ranger(Species ~ ., iris[-test.id, ], keep.inbag = TRUE)
tidy.RF <- tidyRF(rfobj, iris[-test.id, -5], iris[-test.id, 5])
featureContribTree(tidy.RF, 1, iris[test.id, -5])  # contributions within tree 1
featureContrib(tidy.RF, iris[test.id, -5])         # averaged over the whole forest

