
rfUtilities (version 2.0-0)

rf.modelSel: Random Forest Model Selection

Description

Implements the Random Forests model selection approach of Murphy et al. (2010).
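The final step of model selection can be shown with a small base-R sketch. It assumes each tested variable subset has been summarized by its number of variables and an out-of-bag error. It also assumes that a `parsimony` threshold accepts any model whose error is within that proportion of the lowest error, and that the model with the fewest variables is then kept. The function `pick_model()` and the columns `nvars` and `oob_error` are hypothetical. This is an illustration of the idea, not the rfUtilities implementation.

```r
# Hypothetical sketch: choose one row from a table of candidate models.
# 'test' has one row per tested variable subset, with the number of
# variables (nvars) and out-of-bag error (oob_error). Column names are
# assumptions for illustration only.
pick_model <- function(test, parsimony = NULL) {
  best <- which.min(test$oob_error)        # lowest-error model
  if (is.null(parsimony)) return(best)
  # Assumed rule: accept models whose error is within 'parsimony'
  # (a proportion) of the best error...
  ok <- which(test$oob_error <= test$oob_error[best] * (1 + parsimony))
  # ...then keep the accepted model with the fewest variables.
  ok[which.min(test$nvars[ok])]
}

test <- data.frame(nvars = c(4, 3, 2), oob_error = c(0.040, 0.047, 0.041))
pick_model(test)                    # row 1: lowest error
pick_model(test, parsimony = 0.03)  # row 3: 2 variables, error within 3%
```

Without a threshold the lowest-error model wins. With one, a slightly worse but smaller model can be preferred.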

Usage

rf.modelSel(xdata, ydata, imp.scale = "mir", r = c(0.25, 0.5, 0.75), final.model = FALSE, seed = NULL, parsimony = NULL, ...)

Arguments

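xdata
X data (predictor variables) for the model
ydata
Y data (response variable) for the model; a factor produces a classification model, a numeric vector a regression model
imp.scale
Type of scaling for importance values: "mir" (model improvement ratio) or "se" (standard error). Default is "mir"
r
Vector of importance percentiles to test, e.g., c(0.1, 0.2, 0.5, 0.7, 0.9). Default is c(0.25, 0.5, 0.75)
final.model
Fit a final model using the selected variables (TRUE/FALSE)
seed
Sets the random seed in the R global environment. Setting a seed is strongly recommended for reproducible results
parsimony
Threshold (0-1) for selecting a competing, more parsimonious model
...
Additional arguments passed to randomForest (e.g., ntree=1000, replace=TRUE, proximity=TRUE)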
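To show how `imp.scale = "mir"` and `r` work together, here is a hypothetical helper, `mir_subsets()`. It makes two assumptions. First, the model improvement ratio is each variable's importance divided by the largest importance. Second, each value in `r` becomes a threshold through `quantile()`, and the variables at or above that threshold form one candidate subset. The helper works on a plain named vector of importance values, so no model has to be fit.

```r
# Hypothetical helper: MIR-scale a named importance vector and list the
# variables retained at each percentile in r. This is an assumed
# procedure for illustration, not the rfUtilities source.
mir_subsets <- function(imp, r = c(0.25, 0.5, 0.75)) {
  mir <- imp / max(imp)                      # model improvement ratio, 0-1
  subsets <- lapply(r, function(p) {
    thr <- quantile(mir, probs = p, names = FALSE)
    names(mir)[mir >= thr]                   # variables at/above threshold
  })
  names(subsets) <- paste0("r=", r)
  subsets
}

# Made-up importance values for four variables
imp <- c(a = 10, b = 4, c = 8, d = 1)
mir_subsets(imp)  # higher percentiles keep fewer variables
```

Each subset would then be passed to a model fit, and the errors of the fitted models compared.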

Value

A list class object with the following components:

rf.final
Final selected model (randomForest model object); returned only if final.model = TRUE
sel.vars
Final selected variables (vector)
test
Validation parameters used in model selection (data.frame)
sel.importance
Importance values for the selected model (data.frame)
importance
Importance values for all models (data.frame)
parameters
Variables used in each tested model (list)
s
Type of scaling used for importance

References

Evans, J.S. and S.A. Cushman (2009) Gradient modeling of conifer species using Random Forests. Landscape Ecology 24(5):673-683.

Murphy, M.A., J.S. Evans, and A.S. Storfer (2010) Quantifying Bufo boreas connectivity in Yellowstone National Park with landscape genetics. Ecology 91:252-261.

Evans, J.S., M.A. Murphy, Z.A. Holden, and S.A. Cushman (2011) Modeling species distribution and change using Random Forests. Chapter 8 in Predictive Species and Habitat Modeling in Landscape Ecology: Concepts and Applications, eds. C.A. Drew, F. Huettmann, and Y. Wiersma. Springer.

See Also

randomForest, for model options

Examples

# Classification on iris data
library(rfUtilities)
library(randomForest)
data(iris)
iris$Species <- as.factor(iris$Species)
( rf.class <- rf.modelSel(iris[,1:4], iris[,"Species"], seed=1234, imp.scale="mir") )
( rf.class <- rf.modelSel(iris[,1:4], iris[,"Species"], seed=1234, imp.scale="mir",
                          parsimony=0.03) )

plot(rf.class)              # plot importance for selected variables
plot(rf.class, imp = "all") # plot importance for all variables

# Refit a randomForest model using only the selected variables
vars <- rf.class$selvars
( rf.fit <- randomForest(x=iris[,vars], y=iris[,"Species"]) )

# Regression on airquality data (rows with missing values removed)
# Response: Ozone (column 1); predictors: columns 2-6
data(airquality)
airquality <- na.omit(airquality)
( rf.regress <- rf.modelSel(airquality[,2:6], airquality[,1], imp.scale="se") )
( rf.regress <- rf.modelSel(airquality[,2:6], airquality[,1], imp.scale="se",
                            parsimony=0.03) )

plot(rf.regress)              # plot importance for selected variables
plot(rf.regress, imp = "all") # plot importance for all variables

# To use variables from a competing model
vars <- rf.regress$parameters[[3]]

# To use variables from the selected model
vars <- rf.regress$selvars

( rf.fit <- randomForest(x=airquality[,vars], y=airquality[,1]) )
