h2o (version 2.8.1.1)

h2o.SpeeDRF: H2O: Single-Node Random Forest

Description

Performs single-node random forest classification on a data set.

Usage

h2o.SpeeDRF(x, y, data, key = "", classification = TRUE, nfolds = 0, validation, 
  mtries = -1, ntree = 50, depth = 20, sample.rate = 2/3, oobee = TRUE, 
  importance = FALSE, nbins = 1024, seed = -1, stat.type = "ENTROPY", 
  balance.classes = FALSE, verbose = FALSE)

Arguments

x
A vector containing the names or indices of the predictor variables to use in building the random forest model.
y
The name or index of the response variable. If the data does not contain a header, this is the column index, counted from left to right. (The response must be either an integer or a categorical variable.)
data
An H2OParsedData object containing the variables in the model.
key
(Optional) The unique hex key assigned to the resulting model. If none is given, a key will automatically be generated.
classification
(Optional) A logical value indicating whether a classification model should be built (as opposed to regression).
nfolds
(Optional) Number of folds for cross-validation. If nfolds >= 2, then validation must remain empty.
validation
(Optional) An H2OParsedData object containing the validation dataset used to construct the confusion matrix. If left blank, this defaults to the training data when nfolds = 0.
mtries
(Optional) Number of features to randomly select at each split in the tree. If set to the default of -1, this will be set to sqrt(ncol(data)), rounded down to the nearest integer.
ntree
(Optional) Number of trees to grow. (Must be a nonnegative integer.)
depth
(Optional) Maximum depth to grow the tree.
sample.rate
(Optional) Sampling rate for constructing data from which individual trees are grown.
oobee
(Optional) A logical value indicating whether to calculate the out-of-bag error estimate.
importance
(Optional) A logical value indicating whether to compute variable importance measures. (If set to TRUE, the algorithm will take longer to finish.)
nbins
(Optional) Number of histogram bins to build; the tree then splits at the best point.
seed
(Optional) Seed for building the random forest. If seed = -1, one will automatically be generated by H2O.
stat.type
(Optional) Type of statistic to use, equal to either "ENTROPY" or "GINI".
balance.classes
(Optional) A logical value indicating whether classes should be rebalanced. Use for datasets where the levels of the response class are very unbalanced.
verbose
(Optional) A logical value indicating whether verbose results should be returned.
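
As a small illustration of the mtries default described above, the following base R sketch applies the documented formula, sqrt(ncol(data)) rounded down, to R's built-in iris data frame. It assumes the formula is applied literally to the column count; whether H2O counts the response column is not stated here, so treat the result as indicative only. The variable names are made up for this example.

```r
# Base R only: no H2O cluster is needed for this arithmetic.
# Assumption: the default follows the documented formula literally,
# floor(sqrt(ncol(data))), using the total number of columns.
n_columns <- ncol(iris)                   # built-in iris data frame has 5 columns
default_mtries <- floor(sqrt(n_columns))  # sqrt(5) is about 2.24, rounded down
print(default_mtries)
```

With five columns this gives 2, so each split would consider two randomly chosen features unless mtries is set explicitly.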

Value

An object of class H2OSpeeDRFModel with slots key, data, valid (the validation dataset), and model, where the last is a list of the following components:

  • params: Input parameters for building the model.
  • ntree: Number of trees grown.
  • depth: Depth of the trees grown.
  • nbins: Number of bins used in building the histogram.
  • classification: Logical value indicating if the model is classification.
  • mse: Mean-squared error for each tree.
  • confusion: Confusion matrix of the prediction.
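
The slots and components listed above could be inspected along the following lines. This is a sketch only: it assumes h2o version 2.8.1.1 is installed, that a local H2O instance can be started with beta = TRUE as required under Details, and that the slots behave as ordinary S4 slots reachable with @. The name fit is arbitrary.

```r
library(h2o)

# SpeeDRF is a beta method, so H2O is started with beta = TRUE (see Details).
localH2O <- h2o.init(beta = TRUE)

# Load the iris data shipped with the h2o package.
irisPath <- system.file("extdata", "iris.csv", package = "h2o")
iris.hex <- h2o.importFile(localH2O, path = irisPath, key = "iris.hex")

# Fit a forest on columns 2-4 with column 5 as the response.
fit <- h2o.SpeeDRF(x = c(2, 3, 4), y = 5, data = iris.hex, ntree = 50, depth = 20)

# Top-level slots of the returned H2OSpeeDRFModel object.
fit@key               # hex key of the model

# Components of the model list.
fit@model$ntree       # number of trees grown
fit@model$mse         # mean-squared error for each tree
fit@model$confusion   # confusion matrix of the prediction
```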

Details

IMPORTANT: Currently, you must initialize H2O with the flag beta = TRUE in h2o.init in order to use this method!

This method runs random forest model building on a single node, as opposed to the multi-node implementation in h2o.randomForest.

See Also

H2OSpeeDRFModel, h2o.randomForest

Examples

library(h2o)
# SpeeDRF requires H2O to be initialized with beta = TRUE (see Details)
localH2O = h2o.init(beta = TRUE)
irisPath = system.file("extdata", "iris.csv", package = "h2o")
iris.hex = h2o.importFile(localH2O, path = irisPath, key = "iris.hex")
h2o.SpeeDRF(x = c(2,3,4), y = 5, data = iris.hex, ntree = 50, depth = 100)
