h2o.SpeeDRF: H2O: Single-Node Random Forest

Description

Performs single-node random forest classification on a data set.

Usage

h2o.SpeeDRF(x, y, data, key = "", classification = TRUE, nfolds = 0, validation,
  holdout.fraction = 0, mtries = -1, ntree = 50, depth = 20, sample.rate = 2/3,
  oobee = TRUE, importance = FALSE, nbins = 1024, seed = -1,
  stat.type = "ENTROPY", balance.classes = FALSE, verbose = FALSE)

Arguments

x

A vector containing the names or indices of the predictor variables to use in building the random forest model.

y

The name or index of the response variable. If the data does not contain a header, this is the column index, designated by increasing numbers from left to right. (The response must be either an integer or a categorical variable.)

data

An H2OParsedData object containing the variables in the model.

key

(Optional) The unique hex key assigned to the resulting model. If none is given, a key will automatically be generated.

classification

(Optional) A logical value indicating whether a classification model should be built (as opposed to regression).

nfolds

(Optional) Number of folds for cross-validation. If nfolds >= 2, then validation must remain empty.

validation

(Optional) An H2OParsedData object indicating the validation dataset used to construct the confusion matrix. If left blank, this defaults to the training data when nfolds = 0.

holdout.fraction

(Optional) Fraction of the training data to hold out for validation.

mtries

(Optional) Number of features to randomly select at each split in the tree. If set to the default of -1, this will be set to sqrt(ncol(data)), rounded down to the nearest integer.
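
The default can be checked with a line of base R arithmetic. The sketch below is only an illustration: default_mtries is a hypothetical helper, not part of the h2o package, and it assumes the count is taken over the columns exactly as described above.

```r
# Hypothetical helper (not part of h2o): reproduces the documented default
# for mtries, i.e. the square root of the column count, rounded down.
default_mtries <- function(n_cols) {
  as.integer(floor(sqrt(n_cols)))
}

default_mtries(5)    # 2: a data set with 5 columns
default_mtries(100)  # 10
```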

ntree

(Optional) Number of trees to grow. (Must be a nonnegative integer.)

depth

(Optional) Maximum depth to which each tree is grown.

sample.rate

(Optional) Fraction of the training data sampled to grow each individual tree.

oobee

(Optional) A logical value indicating whether to calculate the out-of-bag error estimate.

importance

(Optional) A logical value indicating whether to compute variable importance measures. (If set to TRUE, the algorithm will take longer to finish.)

nbins

(Optional) Number of bins in the histogram that is built to find candidate split points; the split is made at the best point.

seed

(Optional) Seed for building the random forest. If seed = -1, one will automatically be generated by H2O.

stat.type

(Optional) Type of split statistic to use: one of "ENTROPY", "GINI", or "TWOING".
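
To give a feel for what the first two statistics measure, the sketch below computes the standard textbook entropy and Gini impurity of a vector of class labels in base R. It is not taken from the H2O source: the helper names are hypothetical, the log base and any normalization H2O uses internally may differ, and the twoing criterion is left out.

```r
# Illustrative impurity measures computed from a vector of class labels.
# These are the standard textbook definitions, not code taken from H2O.
class_proportions <- function(labels) {
  counts <- table(labels)
  as.numeric(counts) / length(labels)
}

entropy_impurity <- function(labels) {
  p <- class_proportions(labels)
  p <- p[p > 0]              # treat 0 * log(0) as 0
  -sum(p * log2(p))
}

gini_impurity <- function(labels) {
  p <- class_proportions(labels)
  1 - sum(p^2)
}

labels <- c("a", "a", "b", "b")
entropy_impurity(labels)  # 1 (bit) for an even two-class split
gini_impurity(labels)     # 0.5 for an even two-class split
```

Both measures are 0 for a node that contains a single class and grow as the classes become more evenly mixed.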

balance.classes

(Optional) A logical value indicating whether classes should be rebalanced. Use for datasets where the levels of the response class are very unbalanced.

verbose

(Optional) A logical value indicating whether verbose results should be returned.

Value

An object of class H2OSpeeDRFModel with slots key, data, valid (the validation dataset), and model, where the last is a list of the following components:

params

Input parameters for building the model.

ntree

Number of trees grown.

depth

Depth of the trees grown.

nbins

Number of bins used in building the histogram.

classification

Logical value indicating if the model is classification.

mse

Mean-squared error for each tree.

confusion

Confusion matrix of the prediction.
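
Because the result is an S4 object whose model slot is a list, its components are reached with @ followed by $. The sketch below shows that access pattern without a running H2O instance. MockSpeeDRFModel and summarize_forest are hypothetical names, only two of the slots are mocked, and every value in the mock is a made-up placeholder rather than real model output.

```r
library(methods)

# Hypothetical stand-in for H2OSpeeDRFModel so the accessors below can be
# tried without a running H2O instance; only two slots are mocked.
setClass("MockSpeeDRFModel",
         representation(key = "character", model = "list"))

# Hypothetical helper: pull a few of the documented components out of the
# model slot and average the per-tree mean-squared errors.
summarize_forest <- function(fit) {
  m <- fit@model
  list(key = fit@key,
       ntree = m$ntree,
       depth = m$depth,
       mean_mse = mean(m$mse),
       confusion = m$confusion)
}

# Placeholder values only; these are not results from a real forest.
mock <- new("MockSpeeDRFModel",
            key = "mock_key",
            model = list(ntree = 3, depth = 20,
                         mse = c(0.10, 0.20, 0.30),
                         confusion = matrix(c(5, 1, 0, 4), nrow = 2)))

summary_list <- summarize_forest(mock)
summary_list$ntree
summary_list$confusion
```

With a real fit, the same @model$... expressions would apply to the object returned by h2o.SpeeDRF.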

Details

IMPORTANT: Currently, you must initialize H2O with the flag beta = TRUE in h2o.init in order to use this method!

This method runs random forest model building on a single node, as opposed to the multi-node implementation in h2o.randomForest.

Examples

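library(h2o)
# SpeeDRF requires the beta flag (see Details)
localH2O = h2o.init(beta = TRUE)
# Load the iris data set bundled with the h2o package
irisPath = system.file("extdata", "iris.csv", package = "h2o")
iris.hex = h2o.importFile(localH2O, path = irisPath, key = "iris.hex")
# Predict column 5 from columns 2 through 4
h2o.SpeeDRF(x = c(2,3,4), y = 5, data = iris.hex, ntree = 50, depth = 100)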