Usage
h2o.randomForest(x, y, training_frame, model_id, validation_frame,
mtries = -1, sample_rate = 0.632, build_tree_one_node = FALSE,
ntrees = 50, max_depth = 20, min_rows = 1, nbins = 20,
nbins_cats = 1024, binomial_double_trees = TRUE,
balance_classes = FALSE, max_after_balance_size = 5, seed, ...)
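As an illustration of the signature above, the following is a minimal sketch of a call, assuming the h2o package is installed and a local H2O cluster can be started. The built-in iris data, the object names (iris_hex, predictors, response, model) and the seed value are choices made for this example only, not recommendations.

```r
library(h2o)

# Start (or connect to) a local H2O cluster; this requires a Java runtime.
h2o.init()

# Convert the built-in iris data frame to an H2OFrame.
iris_hex <- as.h2o(iris)

# Predictor names (x) and the categorical response (y).
predictors <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
response <- "Species"

# Illustrative settings only; ntrees and max_depth repeat the defaults.
model <- h2o.randomForest(x = predictors, y = response,
                          training_frame = iris_hex,
                          ntrees = 50, max_depth = 20, seed = 1234)

# Inspect the fitted model and score the training frame.
print(model)
predictions <- h2o.predict(model, iris_hex)
head(predictions)
```

Passing x and y by name rather than by column index keeps the call readable if the column order of the frame changes.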
Arguments
x
A vector containing the names or indices of the predictor variables
to use in building the random forest model.
y
The name or index of the response variable. If the data does not
contain a header, this is the column index number, starting at 1 and
increasing from left to right. The response must be either an integer
or a categorical variable.
training_frame
An H2OFrame object containing the variables in the model.
model_id
(Optional) The unique id assigned to the resulting model. If
none is given, an id will automatically be generated.
validation_frame
An H2OFrame object containing the variables in the model, used for
validation.
mtries
Number of variables randomly sampled as candidates at each split.
If set to -1, defaults to sqrt(p) for classification and p/3 for regression,
where p is the number of predictors.
sample_rate
Fraction of the training rows sampled for each tree, from 0 to 1.0.
build_tree_one_node
Run on one node only; there is no network overhead, but
fewer CPUs are used. Suitable for small datasets.
ntrees
A nonnegative integer that determines the number of trees to
grow.
max_depth
Maximum depth to which each tree is grown.
min_rows
Minimum number of rows to assign to terminal nodes.
nbins
For numerical columns (real/int), build a histogram of this many bins, then split at the best point.
nbins_cats
For categorical columns (enum), build a histogram of this many bins, then split at the best point.
Higher values can lead to more overfitting.
binomial_double_trees
For binary classification: build twice as many trees (one per class), which can lead to higher accuracy.
balance_classes
Logical; indicates whether to balance training data class counts
via over- or under-sampling (for imbalanced data).
max_after_balance_size
Maximum relative size of the training data after balancing class counts
(can be less than 1.0).
seed
Seed for random numbers (affects sampling). Note: results are only
reproducible when running single-threaded.
...
(Currently Unimplemented)
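The default described for mtries = -1 can be sketched in base R as follows. The helper default_mtries is hypothetical and not part of the h2o package; rounding down and enforcing a minimum of one candidate variable are assumptions of this sketch, since the text above states only sqrt(p) and p/3.

```r
# Hypothetical helper (not part of the h2o package): number of candidate
# predictors per split when mtries = -1, for p predictors.
default_mtries <- function(p, classification = TRUE) {
  stopifnot(is.numeric(p), length(p) == 1, p >= 1)
  # sqrt(p) for classification, p/3 for regression, as documented above.
  value <- if (classification) sqrt(p) else p / 3
  # Assumption: round down and never use fewer than one variable.
  max(1L, as.integer(floor(value)))
}

default_mtries(16, classification = TRUE)
default_mtries(10, classification = FALSE)
```

Under these assumptions, 16 predictors give 4 candidates per split for classification, and 10 predictors give 3 for regression.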