Internal C++ functions to compute feature contributions for a random forest
recTree(vars, obs, ntree, calculate_node_pred, X, Y, majorityTerminal,
        leftDaughter, rightDaughter, nodestatus, xbestsplit, nodepred,
        bestvar, inbag, varLevels, OOBtimes, localIncrements)
multiTree(vars, obs, ntree, nClasses, X, Y, majorityTerminal,
          leftDaughter, rightDaughter, nodestatus, xbestsplit, nodepred,
          bestvar, inbag, varLevels, OOBtimes, localIncrements)
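For orientation, here is a minimal sketch of how these arguments could be assembled from a randomForest fit in regression mode. recTree is internal and normally called by forestFloor; the direct call via forestFloor:::recTree and the named-argument form are assumptions for illustration.

library(randomForest)
set.seed(1)
X = data.frame(matrix(rnorm(1000), ncol = 4))
Y = X$X1 + X$X2^2 + rnorm(nrow(X), sd = 0.1)
rf = randomForest(X, Y, ntree = 50, keep.inbag = TRUE, keep.forest = TRUE)
localIncrements = matrix(0, nrow(X), ncol(X))  # filled in place by recTree
forestFloor:::recTree(
  vars = ncol(X), obs = nrow(X), ntree = rf$ntree,
  calculate_node_pred = FALSE,            # regression: reuse nodepred
  X = as.matrix(X), Y = Y,
  majorityTerminal = FALSE,               # FALSE for regression
  leftDaughter  = rf$forest$leftDaughter,
  rightDaughter = rf$forest$rightDaughter,
  nodestatus    = rf$forest$nodestatus,
  xbestsplit    = rf$forest$xbestsplit,
  nodepred      = rf$forest$nodepred,
  bestvar       = rf$forest$bestvar,
  inbag         = rf$inbag,
  varLevels     = rep(1L, ncol(X)),       # all variables continuous here
  OOBtimes      = rowSums(rf$inbag == 0), # OOB count per observation
  localIncrements = localIncrements)
# localIncrements now holds the cross-validated feature contributions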
vars: number of variables in X
obs: number of observations in X
ntree: number of trees (starting from 1) the function should iterate over; cannot be higher than the number of columns of inbag
nClasses: number of classes in the classification forest
calculate_node_pred: should the node predictions be recalculated (TRUE) or reused from the nodepred matrix (FALSE; always reused for regression)
X: the training matrix
Y: the target vector; a factor for classification or numeric for regression
majorityTerminal: bool; use majority vote in terminal nodes? Default is FALSE for regression. Set to TRUE only when binary_reg=TRUE.
leftDaughter: a matrix from the randomForest output rf$forest$leftDaughter; the node number/row number of the left daughter of a given node, one tree per column
rightDaughter: a matrix from the randomForest output rf$forest$rightDaughter; the node number/row number of the right daughter of a given node, one tree per column
nodestatus: a matrix from the randomForest output rf$forest$nodestatus; the status of a given node in a given tree
xbestsplit: a matrix from the randomForest output rf$forest$xbestsplit; the split point of numeric variables or the binary-encoded split of categorical variables. See the help file of randomForest::getTree for details of the binary expansion for categorical splits, and the decoding sketch after this argument list.
nodepred: a matrix from the randomForest output rf$forest$nodepred; the in-bag target average for regression mode and the majority target class for classification
bestvar: a matrix from the randomForest output rf$forest$bestvar; the index of the variable a given node is split by in a given tree
inbag: a matrix as the randomForest output rf$inbag; contains counts of how many times each sample was selected for a given tree
varLevels: the number of levels of all variables; 1 for continuous or discrete variables, >1 for categorical variables. Needed for categorical variables to interpret the binary split encoding in xbestsplit.
OOBtimes: number of times a given observation was out-of-bag in the forest. Needed to compute cross-validated feature contributions: the local increments of each feature are summed over out-of-bag observations and divided by this number. In the previous implementation (rfFC) and the referenced articles, feature contributions are instead summed over all observations and divided by ntree.
localIncrements: an empty matrix in which local increments are stored during computation. When the C++ function returns, this input matrix contains the feature contributions.
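As a hedged illustration of the binary expansion mentioned under xbestsplit and varLevels, the helper below (not part of the package) decodes a categorical split value following the convention described in the randomForest::getTree help file: level k goes left when bit k of the split value is set.

decode_catsplit = function(split, nLevels) {
  # bit k of the integer split value selects whether level k goes left
  bits = as.integer(intToBits(as.integer(split)))[1:nLevels]
  list(left = which(bits == 1L), right = which(bits == 0L))
}
decode_catsplit(5, 3)  # 5 = binary 101: levels 1 and 3 go left, level 2 right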
No output; the feature contributions are written directly to the localIncrements input.
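In symbols, the normalization described under OOBtimes can be restated as follows, where L_{ijt} denotes the local increment of feature j for observation i in tree t (notation introduced here, not from the package):

\[
FC^{CV}_{ij} = \frac{1}{\mathrm{OOBtimes}_i}\sum_{t\,:\,i \in \mathrm{OOB}(t)} L_{ijt}
\qquad\text{versus rfFC's}\qquad
FC^{rfFC}_{ij} = \frac{1}{ntree}\sum_{t=1}^{ntree} L_{ijt}
\]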
This function is executed by the function forestFloor. It is a C++/Rcpp implementation computing the feature contributions. The main difference between this implementation and the rfFC package (R-Forge) is that feature contributions are summed only over out-of-bag samples, which yields a cross-validation. This implementation also supports sampling with replacement and both binary and multi-class classification.
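The per-tree recursion itself follows the feature-contribution method of the referenced articles. Below is a hedged R pseudocode sketch for one observation in a regression tree with numeric splits only; the tree$... fields are hypothetical stand-ins for single-tree columns of the matrices above, and the terminal-node code of -1 in nodestatus is an assumption.

follow_path = function(i, x, tree, LI) {
  node = 1
  while (tree$status[node] != -1) {  # assumed terminal-node code
    v = tree$bestvar[node]
    child = if (x[i, v] <= tree$xbestsplit[node])
      tree$leftDaughter[node] else tree$rightDaughter[node]
    # local increment: change in node prediction, credited to the split variable
    LI[i, v] = LI[i, v] + tree$nodepred[child] - tree$nodepred[node]
    node = child
  }
  LI  # rows are later divided by OOBtimes, summing only over OOB trees
}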
Interpretation of QSAR Models Based on Random Forest Methods, http://dx.doi.org/10.1002/minf.201000173
Interpreting random forest classification models using a feature contribution method, http://arxiv.org/abs/1312.1121