h2o.isolationForest: Trains an Isolation Forest model

Description

Trains an Isolation Forest model for unsupervised anomaly detection.

Usage

h2o.isolationForest(training_frame, x, model_id = NULL,
  score_each_iteration = FALSE, score_tree_interval = 0,
  ignore_const_cols = TRUE, ntrees = 50, max_depth = 8,
  min_rows = 1, max_runtime_secs = 0, seed = -1,
  build_tree_one_node = FALSE, mtries = -1, sample_size = 256,
  sample_rate = -1, col_sample_rate_change_per_level = 1,
  col_sample_rate_per_tree = 1, categorical_encoding = c("AUTO",
  "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen",
  "LabelEncoder", "SortByResponse", "EnumLimited"),
  export_checkpoints_dir = NULL)
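
A minimal sketch of a call using the signature above. It assumes the h2o package is installed and that a local cluster can be started with h2o.init(). The toy data frame, its column names (a, b), and the seed values are hypothetical and only serve as illustration.

```r
# Sketch only: assumes the h2o package is installed and a local cluster can start.
library(h2o)
h2o.init()

# Hypothetical toy data: 500 ordinary points plus 5 obvious outliers.
set.seed(42)
df <- data.frame(
  a = c(rnorm(500), rnorm(5, mean = 8)),
  b = c(rnorm(500), rnorm(5, mean = 8))
)
train <- as.h2o(df)

# Train with the documented defaults spelled out, plus a fixed seed.
model <- h2o.isolationForest(
  training_frame = train,
  x = c("a", "b"),
  ntrees = 50,
  max_depth = 8,
  sample_size = 256,
  seed = 1234
)

# Score the training frame and inspect the first few rows of the result.
scores <- h2o.predict(model, train)
head(scores)
```

In H2O releases matching this signature, the scored frame is expected to contain a predict column (a normalized anomaly score) and a mean_length column (mean path length across trees); treat the exact column names as an assumption and check them against the installed version.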

Arguments

training_frame

Id of the training data frame.

x

A vector containing the character names of the predictors in the model.

model_id

Destination id for this model; auto-generated if not specified.

score_each_iteration

Logical. Whether to score during each iteration of model training. Defaults to FALSE.

score_tree_interval

Score the model after every this many trees (for example, 5 scores after every fifth tree). Disabled if set to 0. Defaults to 0.

ignore_const_cols

Logical. Ignore constant columns. Defaults to TRUE.

ntrees

Number of trees. Defaults to 50.

max_depth

Maximum tree depth. Defaults to 8.

min_rows

Fewest allowed (weighted) observations in a leaf. Defaults to 1.

max_runtime_secs

Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.

seed

Seed for random numbers. It affects the stochastic parts of the algorithm, which may or may not be enabled by default. Defaults to -1 (time-based random number).

build_tree_one_node

Logical. Run on one node only; this avoids network overhead but uses fewer CPUs. Suitable for small datasets. Defaults to FALSE.

mtries

Number of variables randomly sampled as candidates at each split. If set to -1, defaults to (number of predictors)/3. Defaults to -1.

sample_size

Number of randomly sampled observations used to train each Isolation Forest tree. Only one of the parameters sample_size and sample_rate should be defined. If sample_rate is defined, sample_size will be ignored. Defaults to 256.

sample_rate

Rate of randomly sampled observations used to train each Isolation Forest tree. Must be in the range 0.0 to 1.0. If set to -1, sample_rate is disabled and sample_size will be used instead. Defaults to -1.

col_sample_rate_change_per_level

Relative change of the column sampling rate for every level (must be > 0.0 and <= 2.0). Defaults to 1.

col_sample_rate_per_tree

Column sample rate per tree (from 0.0 to 1.0). Defaults to 1.

categorical_encoding

Encoding scheme for categorical features. Must be one of: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Defaults to AUTO.

export_checkpoints_dir

Automatically export generated models to this directory.
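
The interaction between sample_size and sample_rate described in the arguments can be sketched in plain R. The helper below, resolve_rows_per_tree, is hypothetical and not part of h2o; it only mirrors the documented rule that sample_rate takes precedence unless it is -1. The rounding and the cap at the frame size are assumptions, not confirmed library behaviour.

```r
# Illustrative helper (not part of h2o): how many rows each tree would sample.
resolve_rows_per_tree <- function(n_rows, sample_size = 256, sample_rate = -1) {
  if (sample_rate == -1) {
    # sample_rate disabled: fall back to the absolute count.
    # Capping at the number of rows is an assumption for this sketch.
    return(as.integer(min(sample_size, n_rows)))
  }
  # sample_rate must lie in the documented range 0.0 to 1.0.
  stopifnot(sample_rate >= 0, sample_rate <= 1)
  # sample_rate set: sample_size is ignored. Rounding is an assumption.
  as.integer(round(sample_rate * n_rows))
}

resolve_rows_per_tree(10000)                     # defaults: sample_size is used
resolve_rows_per_tree(10000, sample_rate = 0.1)  # sample_rate overrides sample_size
```

With the defaults, a 10000-row frame yields 256 rows per tree; setting sample_rate = 0.1 yields 1000 rows per tree regardless of sample_size.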