Usage
h2o.gbm(x, y, training_frame, model_id, distribution = c("AUTO", "gaussian",
"bernoulli", "multinomial"), ntrees = 50, max_depth = 5, min_rows = 10,
learn_rate = 0.1, nbins = 20, nbins_cats = 1024,
validation_frame = NULL, balance_classes = FALSE,
max_after_balance_size = 1, seed, nfolds, score_each_iteration, ...)
Arguments
x
A vector containing the names or indices of the predictor variables to use in building the GBM model.
y
The name or index of the response variable. If the data does not contain a header, this is the column index
number starting at 0, and increasing from left to right. (The response must be either an integer or a
categorical variable.)
training_frame
An H2OFrame object containing the variables in the model.
model_id
(Optional) The unique ID assigned to the resulting model. If none is given, an ID is generated automatically.
distribution
A character string specifying the loss function to use. Must be "AUTO", "bernoulli", "multinomial", or "gaussian".
ntrees
A nonnegative integer that determines the number of trees to grow.
max_depth
Maximum depth to grow the tree.
min_rows
Minimum number of rows to assign to terminal nodes.
learn_rate
The learning rate: a numeric value from 0.0 to 1.0.
nbins
For numerical columns (real/int), build a histogram of this many bins, then split at the best point.
nbins_cats
For categorical columns (enum), build a histogram of this many bins, then split at the best point. Higher values can lead to more overfitting.
validation_frame
An H2OFrame object indicating the validation dataset used to construct the confusion matrix. If left blank, this defaults to the training data when nfolds = 0.
balance_classes
Logical. Indicates whether to balance training data class counts via over/under-sampling (for imbalanced data).
max_after_balance_size
Maximum relative size of the training data after balancing class counts (can be less than 1.0).
seed
Seed for random numbers (affects sampling). Note: results are only reproducible when running single-threaded.
nfolds
(Optional) Number of folds for cross-validation. If nfolds >= 2, then validation_frame must remain empty. **Currently not supported**
score_each_iteration
Attempts to score each tree.
...
Extra arguments to pass on (currently not implemented).
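Examples
The following is a minimal sketch of a typical call, not an official example. It assumes the h2o package is installed, that a local H2O instance can be started with h2o.init(), and that the built-in iris data is an acceptable stand-in for real training data. The variable names (iris_hex, model, pred) and the parameter values are illustrative choices only.

```r
library(h2o)

# Start (or connect to) a local H2O instance; assumes Java is available.
h2o.init()

# Upload the built-in iris data frame to H2O as an H2OFrame.
iris_hex <- as.h2o(iris)

# Fit a multiclass GBM. Predictors and response are given by column name.
# The values below are illustrative, not tuned recommendations.
model <- h2o.gbm(
  x = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
  y = "Species",
  training_frame = iris_hex,
  distribution = "multinomial",
  ntrees = 20,
  max_depth = 3,
  min_rows = 5,
  learn_rate = 0.1,
  seed = 42  # reproducible only when running single-threaded
)

# Print a summary of the fitted model.
print(model)

# Score the training frame; the result is another H2OFrame.
pred <- h2o.predict(model, iris_hex)
head(pred)
```

Because no validation_frame is supplied here, the reported metrics are computed on the training data.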