Learn R Programming

forestFloor (version 1.4)

recTree: recursiveTree: cross-validated feature contributions

Description

internal C++ function to compute feature contributions for a random Forest

Usage

recTree(vars, obs, ntree, calculate_node_pred, X, Y, leftDaughter, 
    rightDaughter, nodestatus, xbestsplit, nodepred, bestvar, 
    inbag, varLevels, OOBtimes, localIncrements)

Arguments

vars
number of variables in X
obs
number of observations in X
ntree
number of trees starting from 1 function should iterate, cannot be higher than columns of inbag
calculate_node_pred
should the node predictions be recalculated(true) or reused from nodepred-matrix(false & regression)
X
X training matrix
Y
target vector, factor or regression
leftDaughter
a matrix from a the output of randomForest rfo$forest$leftDaughter the node.number/row.number of the leftDaughter in a given tree by column
rightDaughter
a matrix from a the output of randomForest rfo$forest$rightDaughter the node.number/row.number of the rightDaughter in a given tree by column
nodestatus
a matrix from a the output of randomForest rfo$forest$nodestatus the nodestatus of a given node in a given tree
xbestsplit
a matrix from a the output of randomForest rfo$forest$xbestsplit the split point of numeric variables or the binary split of categorical variables see details help(randomForest::getTree) for details of binary expansion of categorical splits
nodepred
a matrix from a the output of randomForest rfo$forest$xbestsplit the inbag target average for regression mode and the majority target class for classification
bestvar
a matrix from a the output of randomForest rfo$forest$xbestsplit the inbag target average for regression mode and the majority target class for classification
inbag
a matrix from the output of randomForest rfo$inbag for regression a matrix from the output of cinbag::trimTrees cinbag.out$inbagCounts contains... numbers either 0, out of bag, 1 once or multiple times in bag for randomForest function positive integer
varLevels
the number of levels of all varibles, 1 for continous and multinomal, >1 forcategorical variables. This is needed for categorical variables to interpretate binary split from xbestsplit.
OOBtimes
number of times a certain observation was out of bag in the forest. Needed to compute feature contributions as they are the sum local increments over out-of-bag obseravations over features divided by the OOBtimes. In previous implementation featurecont
localIncrements
an empty matrix to store localIncrements during computation. In the end the localIncrement matrix will become the feature contributions.

Value

  • no output, the feature contributions are writtten directly to localIncrements input

Details

This is function is excuted by the function forestFloor. This is a c++/Rcpp implementation computing feature contributions. The main differences from this implementation and the rfFC-package, is that these feature contributions is only summed over out-of-bag samples which give some kind of cross-validation. This implementation allows sample replacement but do not support more than binaray classification as rfFC do.

References

Interpretation of QSAR Models Based on Random Forest Methods, http://dx.doi.org/10.1002/minf.201000173 Interpreting random forest classification models using a feature contribution method, http://arxiv.org/abs/1312.1121

Examples

Run this code
rm(list=ls())
library(forestFloor)
#simulate data
obs=2500
vars = 6 

X = data.frame(replicate(vars,rnorm(obs)))
Y = with(X, X1^2 + sin(X2*pi) + 2 * X3 * X4 + 1 * rnorm(obs))


#grow a forest, remeber to include inbag
rfo=randomForest(X,Y,keep.inbag = TRUE,sampsize=1500,ntree=500)

#compute topology, Rectree is excuted within forestFloor.
#See source-code of forestFloor function to for more details.
ff = forestFloor(rfo,X)

#print forestFloor
print(ff) 

#plot partial functions of most important variables first
plot(ff)

Run the code above in your browser using DataLab