h2o4gpu.random_forest_classifier: Random Forest Classifier

Description

Random Forest Classifier

Usage

h2o4gpu.random_forest_classifier(n_estimators = 10L, criterion = "gini",
  max_depth = 3L, min_samples_split = 2L, min_samples_leaf = 1L,
  min_weight_fraction_leaf = 0, max_features = "auto",
  max_leaf_nodes = NULL, min_impurity_decrease = 0,
  min_impurity_split = NULL, bootstrap = TRUE, oob_score = FALSE,
  n_jobs = 1L, random_state = NULL, verbose = 0L, warm_start = FALSE,
  class_weight = NULL, subsample = 1, colsample_bytree = 1,
  num_parallel_tree = 1L, tree_method = "gpu_hist", n_gpus = -1L,
  predictor = "gpu_predictor", backend = "h2o4gpu")
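
A minimal sketch of how a call might look in practice, following the fit/predict pattern shown in the h2o4gpu R package README. The choice of the iris data and the parameter values are illustrative assumptions, and the guard makes the script a no-op on machines where the package is not installed.

```r
# Prepare data. Per the package README, all columns (including the
# response) must be numeric, so the factor labels are converted to integers.
x <- iris[1:4]
y <- as.integer(iris$Species)

# Only run the model code when h2o4gpu is available (assumption: the
# Python backend it wraps is also installed and configured).
if (requireNamespace("h2o4gpu", quietly = TRUE)) {
  library(h2o4gpu)

  # Construct the estimator with illustrative settings, then train it.
  model <- h2o4gpu.random_forest_classifier(n_estimators = 50L, max_depth = 4L)
  model <- fit(model, x, y)

  # Predict on the training data and report the training error rate.
  pred <- predict(model, x)
  print(mean(pred != y))
}
```

The error printed here is a training error, so it is likely to be optimistic; a held-out split would give a fairer estimate.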

Arguments

n_estimators

The number of trees in the forest.

criterion

The function to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "entropy" for the information gain. Note: this parameter is tree-specific.
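
To make the two criteria concrete, the sketch below computes Gini impurity and entropy for a vector of class labels in base R. The helper names (class_props, gini, entropy) are hypothetical, and the base-2 logarithm is an assumption; this illustrates the definitions, not the package's internal code.

```r
# Class proportions of a label vector.
class_props <- function(y) as.numeric(table(y)) / length(y)

# Gini impurity: 1 - sum(p_k^2).
gini <- function(y) {
  p <- class_props(y)
  1 - sum(p^2)
}

# Shannon entropy in bits: -sum(p_k * log2(p_k)).
# Zero proportions (unused factor levels) are dropped to avoid log2(0).
entropy <- function(y) {
  p <- class_props(y)
  p <- p[p > 0]
  -sum(p * log2(p))
}

labels <- c("a", "a", "b", "b")
gini(labels)     # 0.5
entropy(labels)  # 1
```

Both measures are 0 for a pure node and largest when the classes are evenly mixed, which is why either can rank candidate splits.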

max_depth

The maximum depth of the tree. If NULL, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

min_samples_split

The minimum number of samples required to split an internal node.

min_samples_leaf

The minimum number of samples required to be at a leaf node.

min_weight_fraction_leaf

The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

max_features

The number of features to consider when looking for the best split.

max_leaf_nodes

Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined by their relative reduction in impurity. If NULL, the number of leaf nodes is unlimited.

min_impurity_decrease

A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
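
As an illustration of this threshold, the sketch below uses the weighted impurity decrease that scikit-learn documents for its tree estimators: N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity). Whether every backend of this function applies exactly this formula is an assumption; the helper names and the threshold value are hypothetical.

```r
# Gini impurity of a label vector.
gini <- function(y) {
  p <- as.numeric(table(y)) / length(y)
  1 - sum(p^2)
}

# Weighted impurity decrease of splitting `parent` into `left` and `right`,
# where n_total is the number of samples in the whole training set.
impurity_decrease <- function(parent, left, right, n_total) {
  n_t <- length(parent)
  (n_t / n_total) * (gini(parent) -
    (length(left) / n_t) * gini(left) -
    (length(right) / n_t) * gini(right))
}

parent <- c(0, 0, 0, 1, 1, 1)  # node holding 6 of 12 training samples
left   <- c(0, 0, 0)           # perfectly separated children
right  <- c(1, 1, 1)

dec <- impurity_decrease(parent, left, right, n_total = 12)
dec         # 0.25
dec >= 0.1  # TRUE: the split is allowed under a threshold of 0.1
```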

min_impurity_split

Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold; otherwise it is a leaf.

bootstrap

Whether bootstrap samples are used when building trees.

oob_score

Whether to use out-of-bag samples to estimate the generalization accuracy on unseen data.
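
The relationship between bootstrap samples and out-of-bag samples can be sketched in base R. This is only an illustration of the idea for a single tree, with a hypothetical data set of 10 rows; it is not the package's sampling code.

```r
set.seed(42)
n <- 10  # number of training rows (illustrative)

# Bootstrap sample: draw n row indices with replacement.
in_bag <- sample(n, size = n, replace = TRUE)

# Out-of-bag rows: those never drawn for this tree. They act as
# unseen data when scoring the tree.
out_of_bag <- setdiff(seq_len(n), in_bag)

sort(unique(in_bag))
out_of_bag
```

Because each tree sees a different bootstrap sample, every row is typically out-of-bag for some trees, so the whole training set can contribute to the estimate without a separate validation split.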

n_jobs

The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores.

random_state

If an integer, random_state is the seed used by the random number generator. If a RandomState instance, random_state is the random number generator itself. If NULL, the random number generator is the RandomState instance used by np.random.

verbose

Controls the verbosity of the tree building process.

warm_start

When set to TRUE, reuse the solution of the previous call to fit and add more estimators to the ensemble. Otherwise, fit a whole new forest.

class_weight

"balanced_subsample" or NULL, optional (default = NULL). Weights associated with classes in the form {class_label: weight}. If not given, all classes are assumed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.

subsample

Subsample ratio of the training instances.

colsample_bytree

Subsample ratio of columns when constructing each tree.
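
The two ratios, subsample and colsample_bytree, can be pictured as choosing a fraction of rows and a fraction of columns for each tree. The base R sketch below shows that idea on the iris data; sampling without replacement and the rounding rule are assumptions made for illustration, not a description of the backend's exact procedure.

```r
set.seed(1)
x <- iris[1:4]

subsample        <- 0.5  # fraction of rows used for one tree
colsample_bytree <- 0.5  # fraction of columns used for one tree

# Pick the rows and columns this tree would be built from.
rows <- sample(nrow(x), size = floor(subsample * nrow(x)))
cols <- sample(ncol(x), size = max(1, floor(colsample_bytree * ncol(x))))

tree_data <- x[rows, cols, drop = FALSE]
dim(tree_data)  # 75 2
```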

num_parallel_tree

Number of trees to grow per round.

tree_method

The tree construction algorithm used in XGBoost. The distributed and external-memory versions support only the approximate algorithm. Choices: 'auto', 'exact', 'approx', 'hist', 'gpu_exact', 'gpu_hist'.

- 'auto': Use a heuristic to choose the faster algorithm. For small to medium datasets, exact greedy is used. For very large datasets, the approximate algorithm is chosen. Because the old behavior was to always use exact greedy on a single machine, the user gets a message when the approximate algorithm is chosen.
- 'exact': Exact greedy algorithm.
- 'approx': Approximate greedy algorithm using sketching and histograms.
- 'hist': Fast histogram-optimized approximate greedy algorithm. It uses performance improvements such as bin caching.
- 'gpu_exact': GPU implementation of the exact algorithm.
- 'gpu_hist': GPU implementation of the hist algorithm.

n_gpus

Number of GPUs to use in the RandomForestClassifier solver. Default is -1.

predictor

The type of predictor algorithm to use. Both options produce the same results; the choice determines whether prediction runs on the CPU or the GPU.

- 'cpu_predictor': Multicore CPU prediction algorithm.
- 'gpu_predictor': Prediction using the GPU. Default for the 'gpu_exact' and 'gpu_hist' tree methods.

backend

Which backend to use. Options are 'auto', 'sklearn', and 'h2o4gpu'. The backend actually used is saved as an attribute.