varimp(object, mincriterion = 0, conditional = FALSE,
       threshold = 0.2, nperm = 1, OOB = TRUE, pre1.0_0 = conditional)
varimpAUC(object, mincriterion = 0, conditional = FALSE,
          threshold = 0.2, nperm = 1, OOB = TRUE, pre1.0_0 = conditional)
Function varimp can be used to compute variable importance measures for a
forest fitted by cforest, similar to those computed by importance. The default
mincriterion = 0 guarantees that all splits are included in the computation.
Besides the standard version, a conditional version is available that adjusts
for correlations between predictor variables.
If conditional = TRUE, the importance of each variable is computed by permuting
within a grid defined by the covariates that are associated (with 1 - p-value
greater than threshold) to the variable of interest.
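The idea of permuting within a grid can be sketched in base R. This is only an illustration under simplifying assumptions, not the code party runs: the helper permute_within is hypothetical, and the grid is built from the quartiles of a single simulated covariate z rather than from covariates selected via threshold.

```r
# Hypothetical helper (not part of party): permute x separately inside
# each cell of a grid given by the factor `grid`.
permute_within <- function(x, grid) {
  idx <- seq_along(x)
  for (cell in split(idx, grid, drop = TRUE)) {
    # sample() on a single number n samples from 1:n, so skip such cells
    if (length(cell) > 1) x[cell] <- x[sample(cell)]
  }
  x
}

set.seed(1)
z <- rnorm(200)                       # covariate associated with x
x <- z + rnorm(200, sd = 0.3)         # variable of interest
grid <- cut(z, breaks = quantile(z, 0:4 / 4), include.lowest = TRUE)

x_marg <- sample(x)                   # unconditional permutation
x_cond <- permute_within(x, grid)     # permutation within the grid

# The conditional permutation keeps much of the association with z,
# whereas the unconditional permutation destroys it.
c(marginal = cor(x_marg, z), conditional = cor(x_cond, z))
```

Because x is only shuffled among observations with similar z, a drop in accuracy after the conditional permutation can be attributed to x itself rather than to its correlation with z.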
The resulting variable importance score is conditional in the sense of beta coefficients in
regression models, but represents the effect of a variable in both main effects and interactions.
See Strobl et al. (2008) for details.
Note, however, that all random forest results are subject to random variation. Thus, before
interpreting the importance ranking, check whether the same ranking is achieved with a
different random seed, or otherwise increase the number of trees ntree in cforest_control.
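One way to make the seed check concrete is to compare the orderings of two importance vectors. The sketch below assumes the named numeric vectors come from two varimp calls on forests fitted under different seeds; the helper same_ranking and the numbers are made up for illustration and are not output from any real fit.

```r
# Hypothetical helper: TRUE if two named importance vectors order the
# variables identically (largest importance first).
same_ranking <- function(imp_a, imp_b) {
  rank_a <- names(sort(imp_a, decreasing = TRUE))
  rank_b <- names(sort(imp_b, decreasing = TRUE))
  identical(rank_a, rank_b)
}

# Made-up values standing in for varimp() results from two seeds.
imp_seed1 <- c(x1 = 12.1, x2 = 80.3, x3 = 18.4)
imp_seed2 <- c(x1 = 13.0, x2 = 78.9, x3 = 17.2)
same_ranking(imp_seed1, imp_seed2)
```

If the rankings disagree, refitting with a larger ntree and repeating the comparison is the remedy the note above suggests.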
Note that if the predictor variables contain missing values, the procedure
described in Hapfelmeier et al. (2012) is performed.
Function varimpAUC implements AUC-based variable importance as
described by Janitza et al. (2012). Here, the area under the curve (AUC)
instead of the accuracy is used to calculate the importance of each variable.
This AUC-based variable importance measure is more robust towards class imbalance.
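A rough base R sketch of the principle, swapping accuracy for AUC when measuring the effect of a permutation, might look as follows. It is not the varimpAUC implementation: the auc helper, the simulated data, and the use of plogis(x) as a stand-in for a fitted model's predicted probability are all assumptions made for the example.

```r
# AUC via the Mann-Whitney rank-sum identity; `score` is any ranking
# score for the positive class, `y` is coded 0/1.
auc <- function(score, y) {
  pos <- y == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(score)                    # midranks handle ties
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

set.seed(2)
y <- rbinom(300, 1, 0.1)              # imbalanced classes, roughly 10% positives
x <- rnorm(300, mean = 2 * y)         # informative predictor
score <- plogis(x)                    # stand-in for a model's predicted probability

auc_orig <- auc(score, y)
auc_perm <- auc(plogis(sample(x)), y) # same scoring rule after permuting x
auc_orig - auc_perm                   # AUC-based importance of x
```

With few positives, predicting the majority class everywhere already yields high accuracy, so an accuracy difference can understate the value of a predictor; the AUC difference depends only on how well positives are ranked above negatives.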
Torsten Hothorn, Kurt Hornik, and Achim Zeileis (2006b). Unbiased
Recursive Partitioning: A Conditional Inference Framework.
Journal of Computational and Graphical Statistics, 15 (3),
651-674. Preprint available from
Carolin Strobl, Anne-Laure Boulesteix, Thomas Kneib, Thomas Augustin, and Achim Zeileis (2008).
Conditional Variable Importance for Random Forests. BMC Bioinformatics, 9, 307.
library(party)  # provides cforest, varimp and the readingSkills data
set.seed(290875)
readingSkills.cf <- cforest(score ~ ., data = readingSkills,
    control = cforest_unbiased(mtry = 2, ntree = 50))
# standard importance
varimp(readingSkills.cf)
# the same modulo random variation
varimp(readingSkills.cf, pre1.0_0 = TRUE)
# conditional importance, may take a while...
varimp(readingSkills.cf, conditional = TRUE)