Usage
h2o.SpeeDRF(x, y, data, classification = TRUE, validation, mtry = -1, ntree = 50,
depth = 50, sample.rate = 2/3, oobee = TRUE, importance = FALSE, nbins = 1024,
seed = -1, stat.type = "ENTROPY", classwt = NULL, sampling_strategy = "RANDOM",
strata_samples = NULL)
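As a minimal sketch of a call with this signature: the example below assumes a local H2O instance can be started with h2o.init(), that the h2o package in use ships an iris.csv file in its extdata directory, and that the installed version takes the client object as the first argument of h2o.importFile. None of these details are stated on this page, so adjust them to your installation.

```r
library(h2o)

# Assumption: start (or connect to) a local H2O instance.
localH2O <- h2o.init()

# Assumption: iris.csv is bundled with the package under extdata.
irisPath <- system.file("extdata", "iris.csv", package = "h2o")
iris.hex <- h2o.importFile(localH2O, path = irisPath, key = "iris.hex")

# Columns are referenced by index: column 5 is the categorical response,
# columns 1 to 4 are the predictors.
iris.rf <- h2o.SpeeDRF(x = c(1, 2, 3, 4), y = 5, data = iris.hex,
                       ntree = 50, depth = 50, stat.type = "GINI",
                       importance = TRUE, seed = 1234)
print(iris.rf)
```

Because validation is omitted here, the confusion matrix is computed on the training data, as described under the validation argument.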
Arguments
x
A vector containing the names or indices of the predictor variables to use in building the random forest model.
y
The name or index of the response variable. If the data does not contain a header, this is the column index, with columns numbered in increasing order from left to right. (The response must be either an integer or a categorical variable.)
data
An H2OParsedData object containing the variables in the model.
classification
(Optional) A logical value indicating whether a classification model should be built (as opposed to regression).
validation
(Optional) An H2OParsedData object indicating the validation dataset used to construct the confusion matrix. If left blank, this defaults to the training data.
mtry
(Optional) Number of features to randomly select at each split in the tree. If set to the default of -1, this will be set to sqrt(ncol(data)), rounded down to the nearest integer.
ntree
(Optional) Number of trees to grow. (Must be a nonnegative integer.)
depth
(Optional) Maximum depth to which each tree is grown.
sample.rate
(Optional) Sampling rate for constructing data from which individual trees are grown.
oobee
(Optional) A logical value indicating whether to calculate the out-of-bag error estimate.
importance
(Optional) A logical value indicating whether to compute variable importance measures. (If set to TRUE, the algorithm will take longer to finish.)
nbins
(Optional) Build a histogram of this many bins, then split at the best point.
seed
(Optional) Seed for building the random forest. If seed = -1, one will automatically be generated by H2O.
stat.type
(Optional) Type of statistic to use, equal to either "ENTROPY" or "GINI".
classwt
(Optional) Numeric vector of class weights for a categorical response.
sampling_strategy
(Optional) Sampling strategy to use, equal to either "RANDOM" or "STRATIFIED_LOCAL".
strata_samples
(Optional) A numeric sequence indicating the strata for sampling. Only used if sampling_strategy = "STRATIFIED_LOCAL".
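The default for mtry described above can be reproduced in base R. The sketch below only illustrates the arithmetic stated on this page; default_mtry is a hypothetical helper and not part of the h2o package, and it is applied to an ordinary data frame rather than an H2OParsedData object.

```r
# Hypothetical helper: the value mtry takes when left at -1,
# i.e. sqrt(ncol(data)) rounded down to the nearest integer.
default_mtry <- function(data) {
  floor(sqrt(ncol(data)))
}

# The built-in iris data frame has 5 columns, so sqrt(5) is rounded down to 2.
default_mtry(iris)
```

Passing an explicit value such as mtry = 3 overrides this default.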