stat.split: Acquire Split Statistic Information

Description

Extract split statistic information from the forest. The function returns a list of length ntree, in which each element corresponds to a tree. The element [[b]] is itself a vector of length xvar.names identified by its x-variable name. Each element [[b]]$xvar contains the complete list of splits on xvar with associated identifying information. The information is as follows:

treeIDTree identifier.
nodeIDNode identifier.
parmIDVariable indentifier.
contPTValue node was split in the case of a continuous variable.
mwcpSZSize of the multi-word complementary pair in the case of a factor split.
dpthIDZero (0) based depth of split.
spltTYSplit type for parent node:lll{ bit 1 bit 0 meaning ----- ----- ------- 0 0 0 = both daughters have valid splits 0 1 1 = only the right daughter is terminal 1 0 2 = only the left daughter is terminal 1 1 3 = both daughters are terminal

spltEC End cut statistic for real valued variables between [0,0.5] that is small when the split is towards the edge and large when the split is towards the middle. Subtracting this value from 0.5 yields the end cut statistic studied in Ishwaran (2014) and is a way to identify ECP behavior (end cut preference behavior).

spltST Split statistic:

For objects of class (rfsrc, grow), this is the split statistic that resulted in the variable being choosen for the split.
For an object of class (rfsrc, pred) this is the variance of the response within the node for the test data. This value is relevant only for real valued responses. In classification and survival, it is not relevant.

## S3 method for class 'rfsrc': stat.split(object, ...) object{An object of class (rfsrc, grow), (rfsrc, synthetic) or

(rfsrc,
	predict)

} ...{Further arguments passed to or from other methods.} Invisibly, a list with the following components: ...{...} [object Object],[object Object] Ishwaran H. (2014). The effect of splitting on random forests. Machine Learning (in press). ## run a forest, then make a call to stat.split grow.obj <- rfsrc(mpg ~., data = mtcars, statistics=TRUE) stat.obj <- stat.split(grow.obj)

## nice wrapper to extract split-statistic for desired variable ## for continuous variables plots ECP data get.split <- function(splitObj, xvar, inches = 0.1, ...) { which.var <- which(names(splitObj[[1]]) == xvar) ntree <- length(splitObj) stat <- data.frame(do.call(rbind, sapply(1:ntree, function(b) { splitObj[[b]][which.var] }))) dpth <- stat$dpthID ecp <- 1/2 - stat$spltEC sp <- stat$contPT if (!all(is.na(sp))) { fgC <- function(x) { as.numeric(as.character(cut(x, breaks = c(-1, 0.2, 0.35, 0.5), labels = c(1, 4, 2)))) } symbols(jitter(sp), jitter(dpth), ecp, inches = inches, bg = fgC(ecp), xlab = xvar, ylab = "node depth", ...) legend("topleft", legend = c("low ecp", "med ecp", "high ecp"), fill = c(1, 4, 2)) } invisible(stat) }

## use get.split to investigate ECP behavior of variables get.split(stat.obj, "disp") splitting behavior

Description

Arguments