bigrf (version 0.1-12)

varimp-methods: Compute Variable Importance

Description

Compute variable importance based on out-of-bag estimates. For each tree in the forest, the predictions for the out-of-bag examples are recorded. Then, a variable v is randomly permuted in the out-of-bag examples, and the tree is used to classify the out-of-bag examples again. The difference between the number of votes for the correct class in the original data and in the permuted data is used to calculate the variable importance for variable v. This process is then repeated for all variables.
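The permutation step described above can be sketched in base R for a single tree. This is only an illustration under stated assumptions, not the bigrf implementation: toy_tree is a hypothetical stand-in for a fitted tree, the data are simulated, and a real forest would average the result over all trees.

```r
# Minimal sketch of out-of-bag permutation importance for ONE tree.
# `toy_tree` is a hypothetical stand-in for a fitted tree, not bigrf code.
set.seed(42)

n <- 200
x <- data.frame(v1 = rnorm(n), v2 = rnorm(n))
y <- factor(ifelse(x$v1 > 0, "a", "b"))   # class depends on v1 only

# Stand-in "tree": a single split on v1; v2 is ignored.
toy_tree <- function(newdata) {
  factor(ifelse(newdata$v1 > 0, "a", "b"), levels = levels(y))
}

# Bootstrap sample; rows never drawn are the out-of-bag (OOB) examples.
inbag <- sample(n, n, replace = TRUE)
oob <- setdiff(seq_len(n), inbag)

# Votes for the correct class on the untouched OOB examples.
correct_orig <- sum(toy_tree(x[oob, ]) == y[oob])

# Permute one variable at a time among the OOB rows and count again.
raw_importance <- sapply(names(x), function(v) {
  x_perm <- x[oob, ]
  x_perm[[v]] <- sample(x_perm[[v]])
  correct_perm <- sum(toy_tree(x_perm) == y[oob])
  correct_orig - correct_perm
})

print(raw_importance)
```

Because the stand-in tree never looks at v2, permuting v2 cannot change any vote, so its raw importance is zero, while permuting v1 typically costs many correct votes.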

Usage

"varimp"(forest, x=NULL, impbyexample=FALSE, reuse.cache=FALSE, trace=0L)

Arguments

forest
A random forest of class "bigcforest".
x
A big.matrix, matrix or data.frame of predictor variables. This must be the same data that was used to build the forest; if the data has changed, unexpected modelling results may occur. If a matrix or data.frame is specified, it will be converted into a big.matrix for computation. Optional if reuse.cache is TRUE.
impbyexample
A logical indicating whether to compute the variable importance for each out-of-bag example.
reuse.cache
TRUE to reuse disk caches of the big.matrix x from the initial building of the random forest, which may significantly reduce initialization time for large data sets. If TRUE, the user must ensure that the files ‘x’ and ‘x.desc’ in forest@cachepath have not been modified or deleted.
trace
0 for no verbose output. 1 to print verbose output. Default: 0.

Value

A list with the following components:
importance:
Importance of each variable: the number of votes for the correct class in the original out-of-bag examples minus the number of votes for the correct class in the out-of-bag examples with variable v permuted, averaged over all trees in the forest.
importance.ex:
Importance of each variable for each out-of-bag example.
zscore:
Z-score of each variable, computed by dividing the raw variable importance score by the standard error.
significance:
Significance level of each variable importance, computed by applying the complementary error function to the z-score.
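As a rough illustration of how zscore and significance relate to the raw importance, the sketch below uses base R only. The per-tree values are made-up numbers, and both the standard error formula (standard deviation across trees divided by the square root of the number of trees) and the unscaled use of the z-score inside erfc are assumptions; the package's exact computation may differ.

```r
# Complementary error function in base R:
# erfc(x) = 2 * P(Z > x * sqrt(2)) for a standard normal Z.
erfc <- function(x) 2 * pnorm(x * sqrt(2), lower.tail = FALSE)

# Hypothetical per-tree raw importances for one variable (illustrative only).
per_tree <- c(3, 5, 4, 6, 2, 5, 4, 3)

# Raw importance: average over trees.
raw <- mean(per_tree)

# Assumed standard error: across-tree standard deviation over sqrt(ntree).
se <- sd(per_tree) / sqrt(length(per_tree))

# Z-score and significance as described in the Value section.
zscore <- raw / se
significance <- erfc(zscore)

print(c(raw = raw, zscore = zscore, significance = significance))
```

A larger z-score gives a significance value closer to zero, so small values indicate variables whose importance is unlikely to be noise.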

Methods

signature(forest = "bigcforest")
Compute variable importance for a classification random forest.

References

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.

Breiman, L. & Cutler, A. (n.d.). Random Forests. Retrieved from http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm.

See Also

fastimp

Examples

# Classify cars in the Cars93 data set by type (Compact, Large,
# Midsize, Small, Sporty, or Van).

# Load the package, then the data.
library(bigrf)
data(Cars93, package="MASS")
x <- Cars93
y <- Cars93$Type

# Select variables with which to train model.
vars <- 4:22

# Run model, grow 30 trees.
forest <- bigrfc(x, y, ntree=30L, varselect=vars, cachepath=NULL)

# Calculate variable importance, including those for each out-of-bag example.
importance <- varimp(forest, x, impbyexample=TRUE)
