A matrix. The content depends on the type of the response.
Regression: A P-by-1 matrix, where P is the number of features in
X. The pth row contains the MDI of feature p.
Classification: A P-by-D matrix, where P is the number of features
in X and D is the number of response classes. The entry in row p,
column d is the MDI of feature p for class d. To get the overall
MDI of each feature, call rowSums on the result.
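As a small illustration of the rowSums step, the sketch below uses a made-up 3-by-2 matrix in place of a real classification result. The feature names, class names, and values are hypothetical and only show the shape of the data.

```r
# Hypothetical stand-in for a classification MDI result:
# P = 3 features (rows), D = 2 response classes (columns).
mdi <- matrix(
  c(0.10, 0.05,
    0.20, 0.15,
    0.00, 0.30),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("x1", "x2", "x3"), c("classA", "classB"))
)

# Collapse the class dimension to get one MDI value per feature.
feature.mdi <- rowSums(mdi)
print(feature.mdi)
```

The result is a named numeric vector with one element per feature, in the same row order as the matrix.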
Arguments
tidy.RF
A tidy random forest. The random forest to calculate MDI
from.
tree
An integer. The index of the tree to look at.
trainX
A data frame. Training set features, such that the Tth
tree is trained with trainX[tidy.RF$inbag.counts[[T]], ].
trainY
A data frame. Training set responses, such that the Tth
tree is trained with trainY[tidy.RF$inbag.counts[[T]], ].
Functions
MDITree: Mean decrease in impurity within a single tree
MDI: Mean decrease in impurity within the whole forest
Details
MDI stands for Mean Decrease in Impurity. It is a widely adopted measure of
feature importance in random forests. This package calculates MDI with
a new analytical expression derived by Li et al. (see References).
See vignette('MDI', package='tree.interpreter') for more context.
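A minimal usage sketch follows. It assumes the forest is fit with ranger using keep.inbag = TRUE and converted with the package's tidyRF function. The iris data and the variable names trainX, trainY, tree.mdi, and forest.mdi are illustrative choices, not part of this page. Passing the response as a factor column, rather than a data frame, is also an assumption here. The calls are guarded so that the block does nothing when either package is not installed.

```r
# Run only if both packages are available.
have.pkgs <- requireNamespace("ranger", quietly = TRUE) &&
  requireNamespace("tree.interpreter", quietly = TRUE)

if (have.pkgs) {
  trainX <- iris[, -5]  # four numeric features
  trainY <- iris[, 5]   # Species, three classes

  # keep.inbag = TRUE retains the per-tree in-bag information.
  rfobj <- ranger::ranger(Species ~ ., iris, keep.inbag = TRUE)
  tidy.RF <- tree.interpreter::tidyRF(rfobj, trainX, trainY)

  # MDI within the first tree only.
  tree.mdi <- tree.interpreter::MDITree(tidy.RF, 1, trainX, trainY)

  # MDI within the whole forest: one row per feature, one column per class.
  forest.mdi <- tree.interpreter::MDI(tidy.RF, trainX, trainY)
  print(dim(forest.mdi))

  # Per-feature MDI, summed over classes.
  print(rowSums(forest.mdi))
}
```

Because the response is a three-class factor, this is the classification case described above, so the forest-level result should have one row per feature and one column per class.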