
obliqueRF (version 0.2)

obliqueRF: Classification with Oblique Random Forest

Description

obliqueRF implements a random forest with oblique decision trees for binary classification tasks. The discriminative node models in the trees are based on ridge regression, partial least squares regression, logistic regression, linear support vector machines, or random splits.

Usage

## S3 method for class 'default':
obliqueRF(
                    x,
                    y,
                    x.test=NULL,
                    y.test=NULL,
                    mtry=NULL,
                    ntree=100,
                    training_method="ridge",
                    bImportance = F,
                    bProximity = F,
                    verbose = F,
                    ...
                    )

Arguments

x: matrix or data frame of predictor variables (the training data)
y: binary class labels, coded as 0 and 1
x.test: optional matrix of test set predictors
y.test: optional test set class labels
mtry: number of variables sampled as split candidates at each node; if NULL, chosen internally by the function
ntree: number of trees to grow; default 100
training_method: node model used to find the split directions; "ridge" is the default, "log" selects the logistic model (see Details)
bImportance: logical; if TRUE, the obliqueRF variable importance is computed (see Details)
bProximity: logical; if TRUE, the proximity is computed
verbose: logical; if TRUE, progress is printed during training
...: further arguments

Value

An object of class obliqueRF, which is a list with the following components:

call: the original call to obliqueRF
type: for now only "classification"
errs: list with errors
class_names: class names referring to classes "0" and "1" in errs
pred: list containing the prediction result
lab: description of the node training method
ntree: number of trees used
mtry: number of split variables
importance: a vector with the variable importances, or NULL if the importance was not calculated
proximity: the variable proximity, or NULL if the proximity was not calculated
num_classes: the number of classes
trees: the tree structure that was learned
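
For illustration, the components of a fitted object can be inspected directly; a minimal sketch using the iris setup from the Examples below:

require(obliqueRF)
data(iris)
x <- as.matrix(iris[,1:4])
y <- as.numeric(iris[,5]=="setosa")
fit <- obliqueRF(x, y, ntree=50)
fit$ntree               # number of trees used, here 50
fit$num_classes         # 2, binary classification
is.null(fit$importance) # TRUE, since bImportance was not set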

Details

The subspace dimensionality mtry should be tuned on a test set for optimal performance; ntree should be chosen sufficiently large.
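
For example, a minimal sketch of tuning mtry on a held-out validation split; the candidate values, split size, and 0.5 threshold are illustrative choices, not package defaults (x and y as constructed above):

vld <- sample(1:nrow(x), nrow(x)/5)  # hold out one fifth for validation
err_by_mtry <- sapply(1:4, function(m){
  fit <- obliqueRF(x[-vld,], y[-vld], mtry=m, ntree=100)
  p <- predict(fit, x[vld,], type="prob")
  mean((p[,2] > .5) != y[vld])       # misclassification rate
})
which.min(err_by_mtry)               # candidate with the lowest validation error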

Constrained node models, i.e., ridge regression, partial least squares regression, and the linear support vector machine, are optimized at each split in a test on the out-of-bag samples available at that node. (Ridge and partial least squares regression are used without feature scaling; the support vector machine scales features.) Choose the logistic node model if a constrained fit is not desired or required.
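
For instance, the node model is chosen via the training_method argument; "ridge" and "log" are the codes that appear in the Usage and Examples (x and y as above):

fit_ridge <- obliqueRF(x, y, training_method="ridge") # constrained node model (default)
fit_log   <- obliqueRF(x, y, training_method="log")   # unconstrained logistic node model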

The obliqueRF importance counts how often a variable was deemed relevant (at the .05 level) when chosen for a split at a node (increasing its importance value by 1) and how often it was irrelevant for the split (decreasing it by 1). Significance is determined from the ANOVA table of the fitted logistic node model.
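
As an illustration of this bookkeeping, here is a sketch of the idea, not the package internals; a logistic fit and its ANOVA table stand in for one node, and the chosen columns for the mtry sampled variables (x and y as above):

node_data <- data.frame(x[, c(1, 3)], y = y)  # pretend columns 1 and 3 were sampled
node_fit <- glm(y ~ ., data = node_data, family = binomial)  # may warn: setosa is perfectly separable
pvals <- anova(node_fit, test = "Chisq")[["Pr(>Chi)"]][-1]   # drop the NULL-model row
ifelse(pvals < .05, 1, -1)  # +1 for each relevant variable, -1 otherwise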

This is an R implementation; C code is available from the authors upon request.

References

Menze BH, Kelm BM, Splitthoff DN, Koethe U, Hamprecht FA. On oblique random forests. In: Proc. ECML/PKDD 2011, LNCS 6911, pp. 453-469. http://people.csail.mit.edu/menze/papers/menze_11_oblique.pdf

See Also

predict.obliqueRF, importance.obliqueRF

Examples

require(obliqueRF)
data(iris)

## data
# extract feature matrix
x <- as.matrix(iris[,1:4])
# convert to 0/1 class labels
y <- as.numeric(iris[,5]=="setosa")

## train
# hold out one fifth of the samples for testing
smp <- sample(1:nrow(iris), nrow(iris)/5)
obj <- obliqueRF(x[-smp,], y[-smp])

## test
pred <- predict(obj, x[smp,], type="prob")
plot(pred[,2], col=y[smp]+1, ylab="setosa probability")
# confusion table at the 0.5 threshold
table(pred[,2]>.5, y[smp])

## example: importance
imp <- rep(0, ncol(x))
names(imp) <- colnames(x)
numIterations <- 2 # increase the number of iterations for better results, e.g., numIterations=100
for(i in 1:numIterations){
  obj <- obliqueRF(x, y,
                   training_method="log", bImportance=TRUE,
                   mtry=2, ntree=20)
  imp <- imp + obj$importance  # accumulate the importance over runs
  plot(imp, type='l', main=paste("steps:", i*20), ylab="obliqueRF importance")
}
