- df
A data.frame containing a column of ids, a column of classes, and an arbitrary number of predictors.
- ids_col
A character value for the name of the ids column.
- class_col
A character value for the name of the class column.
- method
A character value for the anomaly detection method: "ae" (default) or "iso" (short for extended isolation forest).
- scale
A logical value for whether to scale the data: FALSE (default). Scaling is recommended if method = "ae" and if the user intends to train other models.
- center
Either a logical value or a numeric-alike vector of length equal to the number of columns of data to scale in df: NULL (default). Here ‘numeric-alike’ means that as.numeric(.) will be applied successfully if is.numeric(.) is not true. If scale = TRUE, this is set to TRUE and is used to subtract the mean.
- sd
Either a logical value or a numeric-alike vector of length equal to the number of columns of data to scale in df: NULL (default). If scale = TRUE, this is set to TRUE and is used to divide by the standard deviation.
- max_mem_size
A character value for the maximum memory of the H2O session: "15g" (default).
- port
A numeric value for the port number of the H2O server.
- train_seed
A numeric value to control the randomness of creating resamples: 123 (default).
- hyper_params
A list of hyperparameters to perform a grid search.
If method = "ae", the "default" list is: list(missing_values_handling = "Skip", activation = c("Rectifier", "Tanh"), hidden = list(5, 25, 50, 100, 250, 500, nrow(df_h2o)), input_dropout_ratio = c(0, 0.1, 0.2, 0.3), rate = c(0, 0.01, 0.005, 0.001))
If method = "iso", the "default" list is: list(ntrees = c(50, 100, 150, 200), sample_size = c(64, 128, 256, 512))
- search
A character value for the hyperparameter search strategy: "random" (default), "grid".
- tune_length
A numeric value (integer) for either the maximum number of hyperparameter combinations ("random") or individual hyperparameter depth ("grid"): 100 (default).
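
To make the interaction between hyper_params, search, and tune_length concrete, here is a minimal base-R sketch. It is not the package's implementation: it only illustrates, under the assumption that "random" means sampling at most tune_length combinations from the full Cartesian grid, how many candidate models a hyperparameter list implies. The value tune_length = 10 is chosen for illustration and is not the default.

```r
# The default list for method = "iso", copied from the hyper_params description.
hyper_params <- list(
  ntrees = c(50, 100, 150, 200),
  sample_size = c(64, 128, 256, 512)
)

# Full Cartesian grid: one row per combination of the listed values.
full_grid <- expand.grid(hyper_params)
n_full <- nrow(full_grid)

# Assumed reading of search = "random": draw at most tune_length
# combinations, without replacement, from the full grid.
tune_length <- 10   # illustrative value, not the default of 100
set.seed(123)       # mirrors the role of train_seed
n_pick <- min(tune_length, n_full)
random_grid <- full_grid[sample(n_full, n_pick), , drop = FALSE]

print(n_full)
print(random_grid)
```

With the default tune_length of 100, the cap would exceed the number of combinations in this list, so under this reading every combination would be a candidate.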