
tree.interpreter (version 0.1.1)

MDIoobTree: Debiased Mean Decrease in Impurity

Description

Calculate the MDI-oob feature importance measure.

Usage

MDIoobTree(tidy.RF, tree, trainX, trainY)

MDIoob(tidy.RF, trainX, trainY)

Arguments

tidy.RF

A tidy random forest. The random forest to calculate MDI-oob from.

tree

An integer. The index of the tree to look at.

trainX

A data frame. Training-set features (denoted X below), such that the Tth tree is trained with X[tidy.RF$inbag.counts[[T]], ].

trainY

A data frame. Training-set responses (denoted Y), such that the Tth tree is trained with Y[tidy.RF$inbag.counts[[T]], ].

Value

A matrix. The content depends on the type of the response.

  • Regression: A P-by-1 matrix, where P is the number of features in X. The pth row contains the MDI-oob of feature p.

  • Classification: A P-by-D matrix, where P is the number of features in X and D is the number of response classes. The dth column of the pth row contains the MDI-oob of feature p to class d. You can get the MDI-oob of each feature by calling rowSums on the result.
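As a minimal sketch of the classification case, the following reuses the iris setup from the Examples section and collapses the P-by-D matrix into one score per feature with rowSums. The seed and the variable names mdi.oob and overall are illustrative choices, not part of the package.

```r
library(ranger)
library(tree.interpreter)

set.seed(42)  # illustrative seed, only for reproducibility

# keep.inbag = TRUE is required so the out-of-bag samples can be recovered
rfobj <- ranger(Species ~ ., iris, keep.inbag = TRUE)
tidy.RF <- tidyRF(rfobj, iris[, -5], iris[, 5])

# Classification: one row per feature, one column per response class
mdi.oob <- MDIoob(tidy.RF, iris[, -5], iris[, 5])
print(dim(mdi.oob))

# Collapse the per-class contributions into one score per feature
overall <- rowSums(mdi.oob)
print(sort(overall, decreasing = TRUE))
```

For a regression forest the result is described as P-by-1, so the single column can be read directly and no rowSums call is needed.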

Functions

  • MDIoobTree: Debiased mean decrease in impurity within a single tree

  • MDIoob: Debiased mean decrease in impurity within the whole forest
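The relationship between the two functions can be explored by calling MDIoobTree once per tree. The sketch below assumes that tidy.RF$inbag.counts holds one entry per tree, as the argument descriptions suggest. Whether MDIoob equals the plain average of the per-tree matrices is an assumption here, so the code prints the comparison instead of asserting it. The names per.tree, tree.average, and forest are illustrative.

```r
library(ranger)
library(tree.interpreter)

set.seed(42)  # illustrative seed
rfobj <- ranger(Species ~ ., iris, num.trees = 50, keep.inbag = TRUE)
tidy.RF <- tidyRF(rfobj, iris[, -5], iris[, 5])

# Assumption: inbag.counts is a list with one element per tree
n.trees <- length(tidy.RF$inbag.counts)

# MDI-oob of every feature within each individual tree
per.tree <- lapply(seq_len(n.trees), function(t) {
  MDIoobTree(tidy.RF, t, iris[, -5], iris[, 5])
})

# Element-wise average of the per-tree matrices
tree.average <- Reduce(`+`, per.tree) / n.trees

# Forest-level measure, for comparison with the average above
forest <- MDIoob(tidy.RF, iris[, -5], iris[, 5])
print(all.equal(tree.average, forest))
```

Looking at single trees this way can also show how much the importance of a feature varies across the forest.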

Details

It has long been known that MDI incorrectly assigns high importance to noisy features, leading to systematic bias in feature selection. To address this issue, Li et al. proposed a debiased MDI feature importance measure using out-of-bag samples, called MDI-oob, which has achieved state-of-the-art performance in feature selection for both simulated and real data.

See vignette('MDI', package='tree.interpreter') for more context.
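To see the bias described above on a small scale, one can append a pure-noise column to the features and compare the two measures side by side. This is only a sketch: the noise column is a hypothetical addition, and the code assumes that MDI (listed under See Also) accepts the same arguments as MDIoob, which this page does not confirm. A single run on a small data set is not guaranteed to show a clear difference.

```r
library(ranger)
library(tree.interpreter)

set.seed(1)  # illustrative seed

X <- iris[, -5]
X$noise <- rnorm(nrow(X))  # hypothetical feature unrelated to Species
y <- iris[, 5]

rfobj <- ranger(Species ~ ., cbind(X, Species = y), keep.inbag = TRUE)
tidy.RF <- tidyRF(rfobj, X, y)

# Assumption: MDI(tidy.RF, trainX, trainY) mirrors the MDIoob signature
comparison <- cbind(
  MDI = rowSums(MDI(tidy.RF, X, y)),
  MDI.oob = rowSums(MDIoob(tidy.RF, X, y))
)
print(comparison)
```

The row for the noise feature is the one to inspect: a debiased measure would be expected to give it a value closer to zero than plain MDI does.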

References

A Debiased MDI Feature Importance Measure for Random Forests https://arxiv.org/abs/1906.10845

See Also

MDI

vignette('MDI', package='tree.interpreter')

Examples

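# NOT RUN {
library(ranger)
library(tree.interpreter)

# keep.inbag = TRUE stores the in-bag counts that MDI-oob relies on
rfobj <- ranger(Species ~ ., iris, keep.inbag = TRUE)
tidy.RF <- tidyRF(rfobj, iris[, -5], iris[, 5])

# MDI-oob within the first tree
MDIoobTree(tidy.RF, 1, iris[, -5], iris[, 5])

# MDI-oob within the whole forest
MDIoob(tidy.RF, iris[, -5], iris[, 5])
# }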
