- df
A data.frame containing a column of unique ids, a column of classes, and an arbitrary number of numeric columns.
- ids_col
A character value for the name of the ids column.
- class_col
A character value for the name of the class column.
- vali_pct
A numeric value for the proportion of training data to use as validation data: 0.15 (default).
- test_pct
A numeric value for the proportion of total data to use as test data: 0.15 (default).
- scale
A logical value for whether to scale the data: FALSE (default). Recommended if method = "ae" and if the user intends to train other models.
- center
Either a logical value or numeric-alike vector of length equal to the number of columns of data to scale in df, where ‘numeric-alike’ means that as.numeric(.) will be applied successfully if is.numeric(.) is not true: NULL (default). If scale = TRUE, this is set to TRUE and is used to subtract the mean.
- sd
Either a logical value or a numeric-alike vector of length equal to the number of columns of data to scale in df: NULL (default). If scale = TRUE, this is set to TRUE and is used to divide by the standard deviation.
- split_seed
A numeric value to set the seed and control the randomness of splitting the data: 123 (default).
- method
A character value for the dimensionality reduction method: "pca" (default), "ae", "none".
- pca_pct
If method = "pca", a numeric value for the cumulative proportion of variance that the retained principal components must explain: 0.95 (default).
- max_mem_size
If method = "ae", a character value for the maximum memory available to the H2O session: "15g" (default).
- port
A numeric value for the port number of the H2O server.
- train_seed
A numeric value to set the seed and control the randomness of creating resamples: 123 (default).
- hyper_params
A list of hyperparameters to search over. The default list is: list(missing_values_handling = "Skip", activation = c("Rectifier", "Tanh"), hidden = list(5, 25, 50, 100, 250, 500, nrow(df_h2o)), input_dropout_ratio = c(0, 0.1, 0.2, 0.3), rate = c(0, 0.01, 0.005, 0.001)).
- search
If method = "ae", a character value for the hyperparameter search strategy: "random" (default), "grid".
- tune_length
If method = "ae", a numeric value (integer) for either the maximum number of hyperparameter combinations ("random") or individual hyperparameter depth ("grid").
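
To make the interplay of test_pct, vali_pct, center, sd, and pca_pct more concrete, here is a minimal base R sketch. It is not the function's actual implementation. The toy data frame, the column names, the order of the splits, and the choice to compute the scaling statistics on the training rows only are all assumptions made for illustration.

```r
# Illustrative sketch only; every name below is hypothetical.
set.seed(123)  # plays the role of split_seed

# Toy data: an ids column, a class column, and three numeric columns.
df <- data.frame(
  id    = sprintf("s%03d", 1:200),
  class = rep(c("a", "b"), each = 100),
  x1    = rnorm(200),
  x2    = rnorm(200),
  x3    = rnorm(200)
)
num_cols <- c("x1", "x2", "x3")
test_pct <- 0.15
vali_pct <- 0.15
pca_pct  <- 0.95

# test_pct is taken from the total data; vali_pct from what remains.
n         <- nrow(df)
test_idx  <- sample(n, size = round(test_pct * n))
rest_idx  <- setdiff(seq_len(n), test_idx)
vali_idx  <- sample(rest_idx, size = round(vali_pct * length(rest_idx)))
train_idx <- setdiff(rest_idx, vali_idx)

# Assumption: centering and scaling statistics come from the training rows
# and are reused for the validation and test rows.
ctr <- colMeans(df[train_idx, num_cols])
sdv <- apply(df[train_idx, num_cols], 2, sd)
train_x <- scale(df[train_idx, num_cols], center = ctr, scale = sdv)
test_x  <- scale(df[test_idx,  num_cols], center = ctr, scale = sdv)

# Keep the smallest number of components whose cumulative share of
# variance reaches pca_pct.
pca     <- prcomp(train_x)
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
n_comp  <- which(cum_var >= pca_pct)[1]
train_pcs <- pca$x[, seq_len(n_comp), drop = FALSE]
test_pcs  <- predict(pca, newdata = test_x)[, seq_len(n_comp), drop = FALSE]
```

Passing the training means and standard deviations as center and sd, rather than TRUE, is one way to apply the same transformation to data the model has not seen.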