h2o (version 3.0.0.22)

h2o.randomForest: Build a Big Data Random Forest Model

Description

Builds a Random Forest Model on an H2OFrame

Usage

h2o.randomForest(x, y, training_frame, model_id, validation_frame,
  mtries = -1, sample_rate = 0.632, build_tree_one_node = FALSE,
  ntrees = 50, max_depth = 20, min_rows = 1, nbins = 20,
  nbins_cats = 1024, binomial_double_trees = TRUE,
  balance_classes = FALSE, max_after_balance_size = 5, seed, ...)

Arguments

x
A vector containing the names or indices of the predictor variables to use in building the Random Forest model.
y
The name or index of the response variable. If the data does not contain a header, this is the column index number, starting at 1 and increasing from left to right. The response must be either an integer or a categorical variable.
training_frame
An H2OFrame object containing the variables in the model.
model_id
(Optional) The unique id assigned to the resulting model. If none is given, an id will automatically be generated.
validation_frame
(Optional) An H2OFrame object used to validate the model. It should contain the same variables as training_frame.
mtries
Number of variables randomly sampled as candidates at each split. If set to -1, defaults to sqrt(p) for classification and p/3 for regression, where p is the number of predictors.
sample_rate
Sample rate, from 0 to 1.0.
build_tree_one_node
Run on one node only. This avoids network overhead but uses fewer CPUs. Suitable for small datasets.
ntrees
A nonnegative integer that determines the number of trees to grow.
max_depth
Maximum depth to grow the tree.
min_rows
Minimum number of rows to assign to terminal nodes.
nbins
For numerical columns (real/int), build a histogram of this many bins, then split at the best point.
nbins_cats
For categorical columns (enum), build a histogram of this many bins, then split at the best point. Higher values can lead to more overfitting.
binomial_double_trees
For binary classification: Build 2x as many trees (one per class) - can lead to higher accuracy.
balance_classes
Logical. Indicates whether to balance training data class counts via over/under-sampling (for imbalanced data).
max_after_balance_size
Maximum relative size of the training data after balancing class counts (can be less than 1.0)
seed
Seed for random numbers (affects sampling). Note: results are only reproducible when running single-threaded.
...
(Currently unimplemented)
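
The default for mtries described above can be illustrated with a little arithmetic. The helper below is a hypothetical sketch, not part of the h2o package: the name default_mtries, the rounding down to a whole number, and the lower bound of 1 are all assumptions made for illustration, since the text only states sqrt(p) for classification and p/3 for regression.

```r
# Hypothetical helper (not an h2o function): approximates the mtries
# value implied by mtries = -1 for p predictors.
default_mtries <- function(p, classification = TRUE) {
  stopifnot(is.numeric(p), length(p) == 1, p >= 1)
  # sqrt(p) for classification, p/3 for regression, as documented above
  raw <- if (classification) sqrt(p) else p / 3
  # Assumption: round down to an integer and never go below 1
  max(1L, as.integer(floor(raw)))
}

default_mtries(16)                         # 4
default_mtries(9, classification = FALSE)  # 3
```

With 16 predictors, a classification forest would therefore consider about 4 candidate variables at each split under these assumptions.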

Value

  • Creates an H2OModel object of the right type.

See Also

predict.H2OModel for prediction.
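
Examples

The following is a minimal sketch of how the arguments above fit together, not an official example from the package. It assumes the h2o package and a Java runtime are installed, and it uses R's built-in iris data. The wrapper name train_iris_forest is hypothetical. Everything is wrapped in a function so that nothing contacts an H2O cluster until it is called.

```r
# Hypothetical wrapper around h2o.randomForest for illustration.
# Assumes the h2o package and a Java runtime are available.
train_iris_forest <- function(ntrees = 50, max_depth = 20, seed = 1234) {
  h2o::h2o.init()                 # start or connect to a local H2O cluster
  iris_hex <- h2o::as.h2o(iris)   # upload iris as an H2OFrame
  h2o::h2o.randomForest(
    x = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
    y = "Species",                # categorical response, so classification
    training_frame = iris_hex,
    ntrees = ntrees,
    max_depth = max_depth,
    seed = seed
  )
}

# Usage (not run):
# library(h2o)
# model <- train_iris_forest(ntrees = 20)
# preds <- predict(model, as.h2o(iris))   # dispatches to predict.H2OModel
```

Predictors are given by name here. Column indices would also be accepted, per the description of x.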