Rborist (version 0.1-0)

Rborist: Rapid Decision Tree Construction and Evaluation

Description

Accelerated implementation of the Random Forest (trademarked name) algorithm. Tuned for multicore and GPU hardware. Works with most numerical front-end languages in addition to R. Invocation is similar to that of the "randomForest" package.

Usage

## S3 method for class 'default':
Rborist(x, y, nTree=500, withRepl = TRUE,
                predProb = ifelse(!is.factor(y), 0.4, sqrt(ncol(x))/ncol(x)),
                predWeight = rep(1.0, ncol(x)),
                nSamp = ifelse(withRepl, nrow(x), round((1-exp(-1))*nrow(x))),
                minNode = ifelse(!is.factor(y), 6, 2),
                nLevel = 0,
                minInfo = 0.01,
                sampleWeight = NULL,
                quantVec = NULL,
                quantiles = !is.null(quantVec),
                pvtBlock = 8,
                pvtNoPredict = FALSE, ...)
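The defaults for predProb and nSamp in the usage above can be checked with a few lines of base R. The sketch below makes no Rborist calls, and its helper names (default_pred_prob, default_n_samp) are hypothetical. It also assumes each predictor is chosen as a candidate independently with probability predProb. Under that assumption, a classification forest with p predictors tries about sqrt(p) predictors per node on average, comparable to randomForest's default mtry of floor(sqrt(p)) for classification. The sketch also shows that the without-replacement default, round((1 - exp(-1)) * nrow(x)), is close to the expected number of distinct rows in a bootstrap sample of size nrow(x).

```r
# Base-R sketch of the documented defaults; no Rborist functions are used.
# default_pred_prob() and default_n_samp() are hypothetical helpers that
# mirror the default expressions in the usage block.
default_pred_prob <- function(p, categorical) {
  if (categorical) sqrt(p) / p else 0.4
}

default_n_samp <- function(n, with_repl) {
  if (with_repl) n else round((1 - exp(-1)) * n)
}

set.seed(1)

# Candidate predictors per node, ASSUMING independent selection of each predictor.
p <- 16
prob_cls <- default_pred_prob(p, categorical = TRUE)   # sqrt(16) / 16 = 0.25
candidates <- replicate(10000, sum(runif(p) < prob_cls))
mean(candidates)   # close to p * prob_cls = 4 = sqrt(p)

# Distinct rows in a bootstrap sample versus the without-replacement default.
n <- 1000
distinct <- replicate(2000, length(unique(sample.int(n, n, replace = TRUE))))
mean(distinct)                        # close to 632
default_n_samp(n, with_repl = FALSE)  # round(632.12...) = 632
```

Under the same independence assumption, the number of candidates varies from node to node. A node can occasionally receive none at all: here the probability is 0.75^16, about 1%. This page does not describe how Rborist handles that case.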

Arguments

x
the design matrix. Can be either a numerical matrix or a data frame of numerical or categorical columns.
y
the response vector. Can be either numerical or categorical. Its length should equal the number of rows of x.
nTree
the number of trees to train.
withRepl
whether rows are sampled with replacement.
predProb
the probability that an arbitrary predictor is chosen as a splitting candidate. Similar in spirit to the mtry option of randomForest, except that mtry fixes the number of predictors tried at each node, whereas predProb sets a per-predictor selection probability.
predWeight
a numerical vector of weights for selecting individual predictors as splitting candidates.
nSamp
the number of rows to sample when training each tree.
minNode
the minimal index-set width (the number of samples in a node) at which splitting is attempted. Similar to the nodesize parameter of randomForest.
nLevel
the maximum number of tree levels to build. At least one level of the decision tree is always built. The value 0 means no preset limit.
minInfo
a threshold ratio of information values between a subnode and its parent, below which splitting will not be attempted.
sampleWeight
a numerical vector of weights for drawing individual rows as sample candidates.
quantVec
a vector of quantile levels at which to report quantile-regression predictions.
quantiles
whether to train with quantile information. If quantiles is TRUE but quantVec is not specified, quartiles are used by default.
pvtBlock
blocking parameter for performance tuning. Currently unsupported.
pvtNoPredict
whether to skip prediction. Intended for testing only.
...
not currently used.
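As a rough illustration of sampleWeight, base R's sample.int() draws rows with probability proportional to a weight vector. This is only an analogy. The page does not say how Rborist normalizes the weights or how they interact with withRepl = FALSE, and the weights below are made up.

```r
# Analogy for sampleWeight using base R only (not Rborist's own sampler).
n <- 10
weights <- c(rep(1, 9), 10)  # hypothetical: row 10 is ten times as likely as any other row

set.seed(7)
draws <- sample.int(n, size = 100000, replace = TRUE, prob = weights)

# Row 10 should account for about 10 / (9 + 10) of the draws.
share_row10 <- mean(draws == 10)
share_row10   # close to 0.526
```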

Value

  • An object of class Rborist. Contains a forest object for subsequent prediction, as well as measures of prediction quality on the out-of-bag samples. For regression, the mean-squared error and R-squared values are reported in mse and rsq, respectively. For a categorical response, the misprediction rate and confusion matrices are reported in misprediction and confusion, respectively.

    If quantile training has been requested, the object also contains a quantile object for subsequent quantile regression.
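The out-of-bag summaries listed above can be illustrated with hypothetical predictions in base R. Several details are assumptions not confirmed by this page. R-squared is taken as one minus the MSE divided by the population variance of the response. The quartile default is taken to mean probabilities 0.25, 0.5 and 0.75. These are applied to a hypothetical pooled set of training responses, leaf_y. All data are invented.

```r
# Regression: hypothetical out-of-bag responses and predictions.
y    <- c(3.0, 1.5, 4.2, 2.8, 5.1)
yhat <- c(2.7, 1.9, 4.0, 3.1, 4.6)
mse <- mean((y - yhat)^2)               # 0.126
rsq <- 1 - mse / mean((y - mean(y))^2)  # assumed definition of R-squared

# Classification: hypothetical out-of-bag labels and predictions.
cls     <- factor(c("a", "b", "a", "c", "b", "a"))
cls_hat <- factor(c("a", "b", "b", "c", "b", "a"), levels = levels(cls))
confusion <- table(actual = cls, predicted = cls_hat)
misprediction <- mean(cls != cls_hat)   # 1 of 6 rows mispredicted
confusion

# Quantiles: assumed quartile default over hypothetical pooled leaf responses.
leaf_y <- c(2.1, 2.4, 2.9, 3.0, 3.3, 3.8, 4.4, 5.0)
quant_vec <- c(0.25, 0.5, 0.75)
quantile(leaf_y, probs = quant_vec)     # 2.775 3.150 3.950
```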

Concept

decision trees