vimp: VIMP for Single or Grouped Variables

Description

Calculate variable importance (VIMP) for a single variable or group of variables for training or test data.

Usage

## S3 method for class 'rfsrc':
vimp(object, xvar.names,
  importance = c("permute", "random", "permute.ensemble", "random.ensemble", "none"),
  joint = FALSE, newdata, subset, na.action = c("na.omit", "na.impute"),
  seed = NULL, do.trace = FALSE, ...)

Arguments

object

An object of class (rfsrc, grow) or (rfsrc, forest). Requires forest=TRUE in the original rfsrc call.

xvar.names

Names of the x-variables to be used. If not specified all variables are used.

importance

Type of VIMP.

joint

Individual or joint VIMP?

newdata

Test data. Default is to use the original grow (training) data.

subset

Vector indicating which rows of the data to use. If no test data is supplied, applies to the rows of object$xvar. Otherwise, it applies to the rows of newdata. All rows are used if not specified.

na.action

Action to be taken if the data contains NA values.

seed

Negative integer specifying seed for the random number generator.

do.trace

Logical. Should trace output be enabled? Default is FALSE. Integer values can also be passed. A positive value causes output to be printed each do.trace iteration.

...

Further arguments passed to or from other methods.

Value

An object of class (rfsrc, predict), which is a list with the following key components:

err.rate

OOB error rate for the ensemble restricted to the subsetted data.

importance

Variable importance (VIMP).

Details

Using a previously grown forest, calculate the VIMP for the variables in xvar.names. By default, VIMP is calculated for the original data, but the user can supply new test data for the VIMP calculation using newdata. Depending on the option importance, VIMP is calculated either by random daughter assignment or by random permutation of the variable(s). The default is Breiman-Cutler permutation VIMP. See rfsrc for more details.

Joint VIMP is requested using joint=TRUE. The joint VIMP is the importance for a group of variables when the group is perturbed simultaneously.
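
The difference between individual and joint permutation importance can be illustrated with a small base R sketch. This is a conceptual illustration only and not the computation randomForestSRC performs: lm() stands in for the forest, a held-out set stands in for out-of-bag data, and perm.vimp is a hypothetical helper introduced here. Permuting a group with a single shared row shuffle is likewise an assumption of the sketch.

```r
# Conceptual sketch: permutation importance for one variable or a group.
# Assumptions: lm() replaces the forest; a held-out set replaces OOB data.
set.seed(1)
dat <- na.omit(airquality)
idx <- sample(nrow(dat), size = 80)
train <- dat[idx, ]
test <- dat[-idx, ]
fit <- lm(Ozone ~ ., data = train)

# Mean squared prediction error of a fitted model on a data frame
mse <- function(model, data) {
  mean((data$Ozone - predict(model, newdata = data))^2)
}

# Hypothetical helper: permute the named columns together (one shared
# row shuffle) and return the resulting increase in prediction error.
perm.vimp <- function(model, data, vars) {
  perturbed <- data
  shuffled <- sample(nrow(data))
  perturbed[vars] <- data[shuffled, vars, drop = FALSE]
  mse(model, perturbed) - mse(model, data)
}

xvars <- setdiff(names(dat), "Ozone")

# Individual importance: perturb one variable at a time
individual <- sapply(xvars, function(v) perm.vimp(fit, test, v))

# Joint importance: perturb a group of variables simultaneously
joint.pair <- perm.vimp(fit, test, c("Temp", "Wind"))

print(individual)
print(joint.pair)
```

A joint value need not equal the sum of the individual values, because perturbing the variables together also breaks their combined contribution to the prediction.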

References

Ishwaran H. (2007). Variable importance in binary regression trees and forests, Electronic J. Statist., 1:519-537.

Examples


## ------------------------------------------------------------
## classification example
## showcase different vimp
## ------------------------------------------------------------

iris.obj <- rfsrc(Species ~ ., data = iris)

# Breiman-Cutler permutation vimp
vimp(iris.obj)$importance

# Breiman-Cutler random daughter vimp
vimp(iris.obj, importance = "random")$importance

# Breiman-Cutler joint permutation vimp 
vimp(iris.obj, joint = TRUE)$importance

# Breiman-Cutler paired vimp
vimp(iris.obj, c("Petal.Length", "Petal.Width"), joint = TRUE)$importance
vimp(iris.obj, c("Sepal.Length", "Petal.Width"), joint = TRUE)$importance


## ------------------------------------------------------------
## regression example
## compare Breiman-Cutler vimp to ensemble based vimp
## ------------------------------------------------------------

airq.obj <- rfsrc(Ozone ~ ., airquality)
vimp.all <- cbind(
     ensemble = vimp(airq.obj, importance = "permute.ensemble")$importance,
     breimanCutler = vimp(airq.obj, importance = "permute")$importance)
print(vimp.all)


## ------------------------------------------------------------
## regression example
## calculate VIMP on test data
## ------------------------------------------------------------

set.seed(100080)
train <- sample(1:nrow(airquality), size = 80)
airq.obj <- rfsrc(Ozone~., airquality[train, ])

#training data vimp
airq.obj$importance
vimp(airq.obj)$importance

#test data vimp
vimp(airq.obj, newdata = airquality[-train, ])$importance

## ------------------------------------------------------------
## survival example
## study how vimp depends on tree imputation
## makes use of the subset option
## ------------------------------------------------------------

data(pbc, package = "randomForestSRC")

# determine which records have missing values
which.na <- apply(pbc, 1, function(x){any(is.na(x))})

# impute the data using na.action = "na.impute"
pbc.obj <- rfsrc(Surv(days,status) ~ ., pbc, nsplit = 3,
        na.action = "na.impute", nimpute = 1)

# compare vimp based on records with no missing values
# to those that have missing values
# note the option na.action="na.impute" in the vimp() call
vimp.not.na <- vimp(pbc.obj, subset = !which.na, na.action = "na.impute")$importance
vimp.na <- vimp(pbc.obj, subset = which.na, na.action = "na.impute")$importance
data.frame(vimp.not.na, vimp.na)
