rfUtilities (version 2.1-4)

rf.classBalance: Random Forest Class Balance (Zero Inflation Correction) Model

Description

Implements the Evans & Cushman (2009) Random Forests class-balance (zero inflation) modeling approach.

Usage

rf.classBalance(ydata, xdata, p = 0.005, cbf = 3, sf = 2,
  seed = NULL, ...)

Arguments

ydata

Response variable, selected by index (e.g., [,2] or [,"SPP"])

xdata

Independent variables, selected by index (e.g., [,3:14] or [,3:ncol(data)])

p

p-value for the covariance convergence test (changing it is not recommended)

cbf

Scaling factor used to test whether the problem is imbalanced; the default, 3, corresponds to size of majority class * 3

sf

Majority subsampling factor. If sf=1, the random sample is perfectly balanced with the smallest class [s|0=n|1], whereas sf=2 gives [s|0=(n|1*2)]

seed

Sets random seed in R global environment

...

Additional arguments passed to randomForest
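
The effect of sf can be sketched in base R. This is not the package's internal code: the helper name balanced_sample_index is hypothetical, and the sketch assumes a two-class response with classes of different sizes, where every minority-class row is kept and the majority class is subsampled to sf times the minority class size.

```r
# Hypothetical helper (not part of rfUtilities): draw one class-balanced
# sample of row indices, keeping every minority-class row and sf times
# as many majority-class rows.
balanced_sample_index <- function(y, sf = 2, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  y <- as.factor(y)
  counts <- table(y)
  # Assumes the two classes differ in size
  minority <- names(counts)[which.min(counts)]
  majority <- names(counts)[which.max(counts)]
  min_idx <- which(y == minority)
  maj_idx <- which(y == majority)
  # Majority sample size: sf * minority size, capped at the majority size
  n_maj <- min(length(maj_idx), length(min_idx) * sf)
  c(min_idx, maj_idx[sample.int(length(maj_idx), n_maj)])
}

# Toy response: 90 absences ("0") and 10 presences ("1")
y <- factor(c(rep("0", 90), rep("1", 10)))
idx <- balanced_sample_index(y, sf = 2, seed = 42)

# Class counts in the balanced sample: 20 rows of "0" and 10 rows of "1"
table(y[idx])
```

With sf = 1 the same call would return 10 rows of each class, matching the [s|0=n|1] case described for sf.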

Value

A rf.balanced object with the following components:

model

Final combined Random Forests ensemble (randomForest object)

OOB.error

Out-of-bag error for each model (vector)

confusion

Confusion matrix for each model (list)

References

Evans, J.S. and S.A. Cushman (2009) Gradient Modeling of Conifer Species Using Random Forest. Landscape Ecology 24(5):673-683.

Evans, J.S., M.A. Murphy, Z.A. Holden, and S.A. Cushman (2011) Modeling Species Distribution and Change Using Random Forests. Chapter 8 in Predictive Modeling in Landscape Ecology, eds. C.A. Drew, F. Huettmann, and Y. Wiersma. Springer.

See Also

randomForest for the model options that can be passed through ...

Examples

# NOT RUN {
library(randomForest)
library(rfUtilities)

# Merge "setosa" into "virginica" to create a two-class problem
data(iris)
iris$Species <- as.character(iris$Species)
iris$Species <- ifelse(iris$Species == "setosa", "virginica", iris$Species)
iris$Species <- as.factor(iris$Species)

# Percent of "virginica" observations
sum(iris$Species == "virginica") / nrow(iris) * 100

# Balanced model
( cb <- rf.classBalance( ydata=iris[,"Species"], xdata=iris[,1:4], cbf=1 ) )

# Calculate Kappa for each balanced model in ensemble
for(i in seq_along(cb$confusion)) {
  print( accuracy(cb$confusion[[i]][,1:2])[5] )
}

# Evaluate cumulative and mean confusion matrix
# (these two lines index exactly three models in cb$confusion)
accuracy( round((cb$confusion[[1]] + cb$confusion[[2]] + cb$confusion[[3]]))[,1:2] )
accuracy( round((cb$confusion[[1]] + cb$confusion[[2]] + cb$confusion[[3]])/3)[,1:2] )

# }
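
The cumulative and mean confusion matrices in the example are written out for three models. A minimal base R sketch of the same arithmetic for any number of models is given below. The matrices are made-up stand-ins for cb$confusion, and make_confusion is a hypothetical helper used only to build them; nothing here calls rfUtilities.

```r
# Hypothetical stand-ins for cb$confusion: 2 x 3 matrices shaped like
# randomForest confusion matrices (two count columns plus class.error).
# Rows are observed classes, columns are predicted classes.
make_confusion <- function(aa, ab, ba, bb) {
  m <- matrix(c(aa, ba, ab, bb), nrow = 2,
              dimnames = list(c("a", "b"), c("a", "b")))
  cbind(m, class.error = c(ab / (aa + ab), ba / (ba + bb)))
}

confusion <- list(
  make_confusion(48, 2, 3, 47),
  make_confusion(47, 3, 1, 49),
  make_confusion(49, 1, 2, 48)
)

# Drop the class.error column, then sum element-wise across all models
counts <- lapply(confusion, function(m) m[, 1:2])
cumulative <- Reduce("+", counts)

# Mean confusion matrix, rounded to whole counts
mean_cm <- round(cumulative / length(counts))

cumulative
mean_cm
```

With a fitted model, cb$confusion would take the place of the made-up list, and cumulative or mean_cm could then be passed to accuracy() as in the example above.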