Calculate variable importance (VIMP) for a single variable or group of variables for training or test data.
# S3 method for rfsrc
vimp(object, xvar.names,
importance = c("anti", "permute", "random"), block.size = 10,
joint = FALSE, seed = NULL, do.trace = FALSE, ...)
An object of class (rfsrc, predict)
containing importance
values.
An object of class (rfsrc, grow)
or (rfsrc, forest)
. The original rfsrc
call must have been made with forest = TRUE
.
Character vector of x-variable names to be evaluated. If not specified, all variables are used.
Type of variable importance (VIMP) to compute.
Integer specifying the number of trees per block used for VIMP calculation. Balances between ensemble-level and tree-level estimates.
Logical indicating whether to compute joint VIMP for the specified variables.
Negative integer used to set the random number generator seed.
Number of seconds between printed progress updates.
Additional arguments passed to or from other methods.
Hemant Ishwaran and Udaya B. Kogalur
Using a previously trained forest, this function calculates variable importance (VIMP) for the specified variables in xvar.names
. By default, VIMP is computed using the original training data, but the user may supply a new test set via the newdata
argument. See rfsrc
for further details on how VIMP is computed.
If joint = TRUE
, joint VIMP is returned. This is defined as the importance of a group of variables when the entire group is perturbed simultaneously.
Setting csv = TRUE
returns case-specific VIMP, which provides VIMP estimates at the individual observation level. This applies to all families except survival. See examples below.
Ishwaran H. (2007). Variable importance in binary regression trees and forests, Electronic J. Statist., 1:519-537.
holdout.vimp.rfsrc
,
rfsrc
# \donttest{
## ------------------------------------------------------------
## classification example
## showcase different vimp
## ------------------------------------------------------------
iris.obj <- rfsrc(Species ~ ., data = iris)
## anti vimp (default)
print(vimp(iris.obj)$importance)
## anti vimp using brier prediction error
print(vimp(iris.obj, perf.type = "brier")$importance)
## permutation vimp
print(vimp(iris.obj, importance = "permute")$importance)
## random daughter vimp
print(vimp(iris.obj, importance = "random")$importance)
## joint anti vimp
print(vimp(iris.obj, joint = TRUE)$importance)
## paired anti vimp
print(vimp(iris.obj, c("Petal.Length", "Petal.Width"), joint = TRUE)$importance)
print(vimp(iris.obj, c("Sepal.Length", "Petal.Width"), joint = TRUE)$importance)
## ------------------------------------------------------------
## survival example
## anti versus permute VIMP with different block sizes
## ------------------------------------------------------------
data(pbc, package = "randomForestSRC")
pbc.obj <- rfsrc(Surv(days, status) ~ ., pbc)
print(vimp(pbc.obj)$importance)
print(vimp(pbc.obj, block.size=1)$importance)
print(vimp(pbc.obj, importance="permute")$importance)
print(vimp(pbc.obj, importance="permute", block.size=1)$importance)
## ------------------------------------------------------------
## imbalanced classification example
## see the imbalanced function for more details
## ------------------------------------------------------------
data(breast, package = "randomForestSRC")
breast <- na.omit(breast)
f <- as.formula(status ~ .)
o <- rfsrc(f, breast, ntree = 2000)
## permutation vimp
print(100 * vimp(o, importance = "permute")$importance)
## anti vimp using gmean performance
print(100 * vimp(o, perf.type = "gmean")$importance[, 1])
## ------------------------------------------------------------
## regression example
## ------------------------------------------------------------
airq.obj <- rfsrc(Ozone ~ ., airquality)
print(vimp(airq.obj))
## ------------------------------------------------------------
## regression example where vimp is calculated on test data
## ------------------------------------------------------------
set.seed(100080)
train <- sample(1:nrow(airquality), size = 80)
airq.obj <- rfsrc(Ozone~., airquality[train, ])
## training data vimp
print(airq.obj$importance)
print(vimp(airq.obj)$importance)
## test data vimp
print(vimp(airq.obj, newdata = airquality[-train, ])$importance)
## ------------------------------------------------------------
## case-specific vimp
## returns VIMP for each case
## ------------------------------------------------------------
o <- rfsrc(mpg~., mtcars)
v <- vimp(o, csv = TRUE)
csvimp <- get.mv.csvimp(v, standardize=TRUE)
print(csvimp)
## ------------------------------------------------------------
## case-specific joint vimp
## returns joint VIMP for each case
## ------------------------------------------------------------
o <- rfsrc(mpg~., mtcars)
v <- vimp(o, joint = TRUE, csv = TRUE)
csvimp <- get.mv.csvimp(v, standardize=TRUE)
print(csvimp)
## ------------------------------------------------------------
## case-specific joint vimp for multivariate regression
## returns joint VIMP for each case, for each outcome
## ------------------------------------------------------------
o <- rfsrc(Multivar(mpg, cyl) ~., data = mtcars)
v <- vimp(o, joint = TRUE, csv = TRUE)
csvimp <- get.mv.csvimp(v, standardize=TRUE)
print(csvimp)
# }
Run the code above in your browser using DataLab