rfUtilities (version 1.0-2)

rf.modelSel: Random Forest Model Selection

Description

Implements the Random Forests model selection approach of Murphy et al. (2010).

Usage

rf.modelSel(xdata, ydata, imp.scale = "mir", r = c(0.25, 0.5, 0.75),
  final.model = FALSE, plot.imp = TRUE, seed = NULL, parsimony = NULL,
  ...)

Arguments

xdata
Predictor variables (X data) for the model
ydata
Response variable (Y data) for the model
imp.scale
Type of scaling for importance values, "mir" or "se" (default is "mir")
r
Vector of importance percentiles to test, e.g., c(0.1, 0.2, 0.5, 0.7, 0.9)
final.model
Run final model with selected variables (TRUE/FALSE)
plot.imp
Plot variable importance (TRUE/FALSE)
seed
Sets the random seed in the R global environment. Setting a seed is strongly recommended so that results are reproducible.
parsimony
Threshold for competing model (0-1)
...
Arguments to pass to randomForest (e.g., ntree=1000, replace=TRUE, proximity=TRUE)
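
The interaction between imp.scale = "mir" and r can be sketched in base R. This is an illustration only, not the package's internal code: it assumes that "mir" (model improvement ratio) scaling divides each importance value by the largest one, and that each percentile in r defines a cutoff at or above which variables are retained as a candidate model. The importance values below are hypothetical.

```r
# Hypothetical raw importance values (not output from a fitted model)
imp <- c(Petal.Width = 40, Petal.Length = 30, Sepal.Length = 10, Sepal.Width = 2)

# Assumed "mir" scaling: each importance relative to the maximum
mir <- imp / max(imp)

# Percentiles to test, matching the default r in the usage above
r <- c(0.25, 0.5, 0.75)

# For each percentile, keep the variables whose scaled importance
# is at or above that quantile of the scaled importance values
candidates <- lapply(r, function(p) {
  cutoff <- unname(quantile(mir, probs = p))
  names(mir)[mir >= cutoff]
})

# Number of variables in each candidate model
lengths(candidates)
```

Higher percentiles give smaller candidate models, which is why r controls how many nested variable subsets are compared.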

Value

A list object with the following components:

rf.final
Final selected model, a randomForest model object (returned if final.model = TRUE)
selvars
Final selected variables (vector)
test
Validation parameters used in model selection (data.frame)
importance
Importance values for the selected model (data.frame)
parameters
Variables used in each tested model (list)
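
A short sketch of how the returned components might be inspected, using only the arguments and component names documented on this page. It assumes the randomForest and rfUtilities packages are installed; the object name sel is arbitrary, and ntree = 500 is just an example of an argument passed through ... to randomForest.

```r
library(randomForest)
library(rfUtilities)

data(iris)

# Request the final model and suppress the importance plot
sel <- rf.modelSel(iris[, 1:4], iris[, "Species"],
                   imp.scale = "mir", seed = 1234,
                   final.model = TRUE, plot.imp = FALSE,
                   ntree = 500)

sel$selvars     # names of the selected variables
sel$test        # validation parameters for each tested model
sel$importance  # importance values for the selected model
sel$parameters  # variables used in each tested model
sel$rf.final    # fitted randomForest object (final.model = TRUE)
```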

References

Evans, J.S. and S.A. Cushman (2009) Gradient Modeling of Conifer Species Using Random Forest. Landscape Ecology 5:673-683.

Murphy, M.A., J.S. Evans, and A.S. Storfer (2010) Quantifying Bufo boreas connectivity in Yellowstone National Park with landscape genetics. Ecology 91:252-261.

Evans, J.S., M.A. Murphy, Z.A. Holden, and S.A. Cushman (2011) Modeling species distribution and change using Random Forests. Ch. 8 in Predictive Modeling in Landscape Ecology, eds. C.A. Drew, F. Huettmann, and Y. Wiersma. Springer.

Examples

# Classification on iris data
library(randomForest)
library(rfUtilities)   # provides rf.modelSel
data(iris)
iris$Species <- as.factor(iris$Species)

# Default selection, then selection with a parsimony threshold of 0.03
( rf.class <- rf.modelSel(iris[,1:4], iris[,"Species"], seed=1234, imp.scale="mir") )
( rf.class <- rf.modelSel(iris[,1:4], iris[,"Species"], seed=1234, imp.scale="mir",
                          parsimony=0.03) )

# Refit a random forest using only the selected variables
vars <- rf.class$selvars
( rf.fit <- randomForest(x=iris[,vars], y=iris[,"Species"]) )

# Regression on airquality data
data(airquality)
airquality <- na.omit(airquality)
( rf.regress <- rf.modelSel(airquality[,2:6], airquality[,1], imp.scale="se") )
( rf.regress <- rf.modelSel(airquality[,2:6], airquality[,1], imp.scale="se", parsimony=0.03) )

# To use the variables from a competing model (here, the third tested model)
vars <- rf.regress$parameters[[3]]

# Or, to use the variables from the selected model
vars <- rf.regress$selvars

( rf.fit <- randomForest(x=airquality[,vars], y=airquality[,1]) )