compVarImp: Compute permutation variable importance measure

Description

Compute permutation variable importance measure from a random forest for classification and regression.

Usage

compVarImp(X, y,rForest,nPerm=1)

Arguments

a data frame or a matrix of predictors.

a response vector. If a factor, classification is assumed, otherwise regression is assumed.

rForest

an object of class randomForest, keep.forest,keep.inbag must be set to True.

nPerm

Number of times the OOB data are permuted per tree for assessing variable importance. Number larger than 1 gives slightly more stable estimate, but not very effective. Currently only implemented for regression.

Value

importanceThe permutation variable importance measure. A matrix with nclass + 1 (for classification) or one (for regression) columns. For classification, the first nclass columns are the class-specific measures computed as mean decrease in accuracy. The nclass + 1st column is the mean decrease in accuracy over all classes. For regression the mean decrease in MSE is given.
importanceSDThe "standard errors" of the permutation-based importance measure. For classification, a p by nclass + 1 matrix corresponding to the first nclass + 1 columns of the importance matrix. For regression a vector of length p.
typeone of regression, classification

Details

The permutation variable importance measure is computed from permuting OOB data: For each tree, the prediction error on the out-of-bag observations is recorded. Then the same is done after permuting a predictor variable. The differences between the two error rates are then averaged over all trees.

References

Breiman L. (2001), Random Forests, Machine Learning 45(1),5-32,

Examples

Run this code

##############################
#      Classification        #
##############################
## Simulating data
X = replicate(8,rnorm(100))
X= data.frame( X) #"X" can also be a matrix
z  = with(X,5*X1 + 3*X2 + 2*X3 + 1*X4 -
            5*X5 - 9*X6 - 2*X7 + 1*X8 )
pr = 1/(1+exp(-z))         # pass through an inv-logit function
y = as.factor(rbinom(100,1,pr))
##############################
## Classification with Random Forest:
library("randomForest")
cl.rf= randomForest(X,y,mtry = 3,ntree=100,
                    importance=TRUE,keep.inbag = TRUE)

##############################
## Permutation variable importance measure
vari= compVarImp(X,y,cl.rf)

##############################
#compare them with the original results
cbind(cl.rf$importance[,1:3],vari$importance)
cbind(cl.rf$importance[,3],vari$importance[,3])
cbind(cl.rf$importanceSD,vari$importanceSD)
cbind(cl.rf$importanceSD[,3],vari$importanceSD[,3])
cbind(cl.rf$type,vari$type)


###############################
#      Regression             #
###############################
## Simulating data
X = replicate(8,rnorm(100))
X= data.frame( X) #"X" can also be a matrix
y= with(X,5*X1 + 3*X2 + 2*X3 + 1*X4 -
          5*X5 - 9*X6 - 2*X7 + 1*X8 )
##############################
## Regression with Random Forest:
library("randomForest")
reg.rf= randomForest(X,y,mtry = 3,ntree=100,
                     importance=TRUE,keep.inbag = TRUE)

##############################
## Permutation variable importance measure
vari= compVarImp(X,y,reg.rf)

##############################
#compare them with the original results
cbind(importance(reg.rf, type=1, scale=FALSE),vari$importance)
cbind(reg.rf$importanceSD,vari$importanceSD)
cbind(reg.rf$type,vari$type)

Run the code above in your browser using DataLab