- df
 
Dataframe. Dataframe containing all your data, including
the dependent variable labeled as 'tag'. To use a different column
as the dependent variable, set the y parameter.
- y
 
Variable or Character. Name of the dependent variable.
- ignore
 
Character vector. Columns the model should ignore.
- train_test
 
Character. Optional. Name of the column in df that holds 'train'
and 'test' values, used to split the data.
- split
 
Numeric. Value between 0 and 1 used to split the data into train/test
datasets. The value is the share assigned to the training set. Set it to 1
to train with all available data and test with the same data
(cross-validation will still be used when training). If train_test is set,
this value is overwritten with the actual split rate found in that column.
- weight
 
Column with observation weights. Giving an observation a
weight of zero is equivalent to excluding it from the dataset; giving an
observation a relative weight of 2 is equivalent to repeating that
row twice. Negative weights are not allowed.
- target
 
Value. Which value is your positive target? If
set to 'auto', the target with the largest mean(score) will be
selected. Change the value to override it. Only used for binary
classification models.
- balance
 
Boolean. Auto-balance the training dataset with under-sampling?
- impute
 
Boolean. Fill NA values with MICE?
- no_outliers
 
Boolean/Numeric. Remove y's outliers from the dataset?
Removes the values that lie farther than n standard deviations from
the dependent variable's mean (Z-score). Set to TRUE for the default (3)
or to a numeric value to use a different multiplier.
- unique_train
 
Boolean. Keep only unique row observations for training data?
- center, scale
 
Boolean. Center and/or scale all numerical values,
using the base function scale?
- thresh
 
Integer. Threshold for selecting classification or regression
models: the number of unique values in 'tag' that separates the two
(more than: regression; less than: classification).
- seed
 
Integer. Set a seed for reproducibility. AutoML can only
guarantee reproducibility if max_models is used because max_time is
resource limited.
- nfolds
 
Number of folds for k-fold cross-validation. Must be >= 2 when cross-validation is enabled; defaults to 5. Use 0 to disable cross-validation;
this will also disable Stacked Ensemble (which may decrease overall model performance).
- max_models, max_time
 
Numeric. Maximum number of models to train and maximum number of
seconds to run. Note that max_models guarantees
reproducibility and max_time does not (because it depends entirely on your
machine's computational characteristics).
- start_clean
 
Boolean. Erase everything in the current h2o
instance before training starts? Set to FALSE to keep existing models.
To group results into a custom common AutoML project, you may
use the project_name argument.
- exclude_algos, include_algos
 
Vector of character strings. Algorithms
to skip or include during the model-building phase. Set to NULL to ignore.
When both are defined, only include_algos is used.
- plots
 
Boolean. Create plot objects?
- alarm
 
Boolean. Ping (sound) when done. Requires beepr.
- quiet
 
Boolean. Silence all messages, warnings, and recommendations?
- print
 
Boolean. Print a summary when the process ends?
- save
 
Boolean. Do you wish to save/export results into your
working directory?
- subdir
 
Character. In which directory do you wish to save
the results? Defaults to the working directory.
- project
 
Character. Your project's name.
- verbosity
 
Optional. Verbosity of the backend messages printed during training.
Must be one of NULL (live log disabled), "debug", "info", "warn", "error". Defaults to "warn".
- ...
 
Additional parameters passed to h2o::h2o.automl.
- x
 
h2o_automl object
- importance
 
Boolean. Print important variables?
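
The no_outliers and split descriptions above can be illustrated with base R alone. This is a minimal sketch under stated assumptions, not the function's actual implementation: the helper names drop_outliers and split_train_test are hypothetical, and the real function may filter or sample rows differently.

```r
# Hypothetical helpers illustrating `no_outliers` and `split`;
# these are NOT the package's own code.

# Keep only rows whose y value lies within n standard deviations of y's mean.
drop_outliers <- function(df, y = "tag", n = 3) {
  z <- (df[[y]] - mean(df[[y]], na.rm = TRUE)) / sd(df[[y]], na.rm = TRUE)
  df[!is.na(z) & abs(z) <= n, , drop = FALSE]
}

# Randomly assign a `split` share of rows to training and the rest to testing.
split_train_test <- function(df, split = 0.7, seed = 0) {
  set.seed(seed)
  rows <- seq_len(nrow(df))
  idx <- sample(rows, size = floor(split * nrow(df)))
  list(
    train = df[idx, , drop = FALSE],
    test  = df[setdiff(rows, idx), , drop = FALSE]
  )
}

# Toy data: twenty ordinary observations and one extreme value of 'tag'.
df <- data.frame(tag = c(rep(10, 20), 1000), x = seq_len(21))

clean <- drop_outliers(df, y = "tag", n = 3)        # drops the row with tag = 1000
parts <- split_train_test(clean, split = 0.75, seed = 1)
```

With no_outliers = TRUE the multiplier n would be 3, as in the call above; passing a number changes the multiplier. Note that this sketch does not reproduce the split = 1 behavior described above (testing on the same data used for training).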