Random Forest Classifier
h2o4gpu.random_forest_classifier(n_estimators = 10L, criterion = "gini",
max_depth = 3L, min_samples_split = 2L, min_samples_leaf = 1L,
min_weight_fraction_leaf = 0, max_features = "auto",
max_leaf_nodes = NULL, min_impurity_decrease = 0,
min_impurity_split = NULL, bootstrap = TRUE, oob_score = FALSE,
n_jobs = 1L, random_state = NULL, verbose = 0L, warm_start = FALSE,
class_weight = NULL, subsample = 1, colsample_bytree = 1,
num_parallel_tree = 1L, tree_method = "gpu_hist", n_gpus = -1L,
predictor = "gpu_predictor", backend = "h2o4gpu")
The number of trees in the forest.
The function to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "entropy" for the information gain. Note: this parameter is tree-specific.
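As a rough illustration of the two criteria, the base R sketch below computes the Gini impurity and the Shannon entropy of a label vector. The helper names (class_props, gini_impurity, entropy) are hypothetical and are not part of h2o4gpu; they only show what each measure quantifies.

```r
# Class proportions of a label vector (hypothetical helper, not an h2o4gpu function).
class_props <- function(labels) {
  counts <- table(labels)
  as.numeric(counts) / length(labels)
}

# Gini impurity: 1 - sum(p_k^2).
gini_impurity <- function(labels) {
  p <- class_props(labels)
  1 - sum(p^2)
}

# Shannon entropy in bits: -sum(p_k * log2(p_k)), skipping empty classes.
entropy <- function(labels) {
  p <- class_props(labels)
  p <- p[p > 0]
  -sum(p * log2(p))
}

pure  <- c("a", "a", "a", "a")
mixed <- c("a", "a", "b", "b")

gini_impurity(pure)   # 0: a pure node has no impurity
gini_impurity(mixed)  # 0.5: two equally frequent classes
entropy(mixed)        # 1: one bit for two equally frequent classes
```

Information gain is the reduction in entropy from a parent node to its children, so both criteria favor splits that produce purer child nodes.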
The maximum depth of the tree. If NULL, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
The minimum number of samples required to split an internal node.
The minimum number of samples required to be at a leaf node.
The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
The number of features to consider when looking for the best split.
Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If NULL, then the number of leaf nodes is unlimited.
A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
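The sketch below shows one way such a decrease can be computed. It assumes the weighted impurity decrease used by scikit-learn, N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity), with Gini impurity as the measure; whether every backend applies exactly this formula is an assumption. The functions gini and impurity_decrease are hypothetical helpers written in base R.

```r
# Gini impurity of a label vector (hypothetical helper).
gini <- function(labels) {
  p <- as.numeric(table(labels)) / length(labels)
  1 - sum(p^2)
}

# Weighted impurity decrease of splitting `parent` into `left` and `right`.
# n_total is the number of samples in the whole training set.
impurity_decrease <- function(parent, left, right, n_total) {
  n_t <- length(parent)
  (n_t / n_total) * (gini(parent) -
    (length(left) / n_t) * gini(left) -
    (length(right) / n_t) * gini(right))
}

parent <- c("a", "a", "b", "b")
decrease <- impurity_decrease(parent,
                              left = c("a", "a"),
                              right = c("b", "b"),
                              n_total = 8)
decrease  # 0.25: the node holds half the data and the split removes all impurity

# The split is allowed when the decrease reaches the threshold.
min_impurity_decrease <- 0
decrease >= min_impurity_decrease  # TRUE
```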
Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.
Whether bootstrap samples are used when building trees.
Whether to use out-of-bag samples to estimate the generalization accuracy on unseen data.
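To make the terms concrete, the base R sketch below draws one bootstrap sample and identifies the out-of-bag rows, that is, the rows the sample never picked. It illustrates the general idea only and is not h2o4gpu's internal sampling code; the seed and sample size are arbitrary.

```r
set.seed(42)  # arbitrary seed, only for reproducibility of this sketch
n <- 1000     # number of training rows (illustrative)

# Bootstrap sample: n row indices drawn with replacement.
in_bag <- sample.int(n, size = n, replace = TRUE)

# Out-of-bag rows: indices that the bootstrap sample never drew.
oob <- setdiff(seq_len(n), in_bag)

# For large n, the out-of-bag fraction is close to exp(-1), roughly 0.37.
length(oob) / n
```

Each tree can be scored on its own out-of-bag rows, which is why an out-of-bag estimate needs no separate validation set.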
The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores.
If an integer, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if NULL, the random number generator is the RandomState instance used by np.random.
Controls the verbosity of the tree building process.
When set to TRUE, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, just fit a whole new forest.
Weights associated with classes in the form {class_label: weight}, or "balanced_subsample", or NULL (the default). If not given, all classes are assumed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.
Subsample ratio of the training instance.
Subsample ratio of columns when constructing each tree.
Number of trees to grow per round.
The tree construction algorithm used in XGBoost. Distributed and external memory versions only support the approximate algorithm. Choices: 'auto', 'exact', 'approx', 'hist', 'gpu_exact', 'gpu_hist'.
- 'auto': Use a heuristic to choose the faster one. For small to medium datasets, exact greedy will be used. For very large datasets, the approximate algorithm will be chosen. Because the old behavior was to always use exact greedy on a single machine, the user gets a message when the approximate algorithm is chosen.
- 'exact': Exact greedy algorithm.
- 'approx': Approximate greedy algorithm using sketching and histogram.
- 'hist': Fast histogram-optimized approximate greedy algorithm. It uses some performance improvements such as bin caching.
- 'gpu_exact': GPU implementation of the exact algorithm.
- 'gpu_hist': GPU implementation of the hist algorithm.
Number of GPUs to use in the RandomForestClassifier solver. Default is -1.
The type of predictor algorithm to use. Provides the same results but allows the use of GPU or CPU.
- 'cpu_predictor': Multicore CPU prediction algorithm.
- 'gpu_predictor': Prediction using GPU. Default for the 'gpu_exact' and 'gpu_hist' tree methods.
Which backend to use. Options are 'auto', 'sklearn', 'h2o4gpu'. The backend actually used is saved as an attribute.
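A minimal usage sketch follows, assuming the h2o4gpu R package and its Python dependency are installed and that the model is trained with the package's fit generic and queried with predict. The iris data, the seed value, and the argument values are illustrative choices, not recommendations; the response is converted to integer codes on the assumption that all columns must be numeric.

```r
# Prepare data: numeric feature columns and an integer-coded response.
x <- iris[1:4]
y <- as.integer(iris$Species)

# Guarded so the script still runs where the h2o4gpu package is not installed.
if (requireNamespace("h2o4gpu", quietly = TRUE)) {
  library(h2o4gpu)

  # Initialize the classifier with a few of the arguments documented above.
  clf <- h2o4gpu.random_forest_classifier(
    n_estimators = 10L,
    max_depth = 3L,
    random_state = 1234L,  # arbitrary seed chosen for this sketch
    backend = "h2o4gpu"
  )

  # Train on the data, then predict on the same rows.
  model <- fit(clf, x, y)
  pred <- predict(model, x)

  # Training accuracy (optimistic; not an estimate on unseen data).
  print(mean(pred == y))
}
```

Setting backend = "sklearn" instead would request the scikit-learn implementation, per the backend options listed above.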