This function lets the user create a robust and fast model using H2O's AutoML function. The result is a list with the best model, its parameters, datasets, performance metrics, variable importances, and other useful metrics.
h2o_automl(df, y = "tag", ignore = c(), train_test = NA,
  split = 0.7, weight = NULL, balance = FALSE, impute = FALSE,
  center = FALSE, scale = FALSE, seed = 0, nfolds = 5,
  thresh = 5, max_time = 5 * 60, max_models = 10,
  start_clean = TRUE,
  exclude_algos = c("StackedEnsemble", "DeepLearning"),
  plots = TRUE, alarm = TRUE, quiet = FALSE,
  save = FALSE, subdir = NA, project = "ML Project")
df: Dataframe. Dataframe containing all your data, including the dependent variable labeled as 'tag'. To use a different column as the dependent variable, set the y parameter.
y: Character. Name of the dependent variable.
ignore: Character vector. Columns the model must ignore.
train_test: Character. If needed, name of the df column containing 'test' and 'train' values, used to split the data.
split: Numeric. Value between 0 and 1 used to split the data into train/test datasets. The value is the proportion assigned to the training set.
weight: Column with observation weights. Giving an observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
balance: Boolean. Auto-balance the train dataset with under-sampling?
impute: Boolean. Fill NA values with MICE?
center, scale: Boolean. Using the base function scale, do you wish to center and/or scale all numerical values?
seed: Integer. Set a seed for reproducibility. AutoML can only guarantee reproducibility if max_models is used, because max_time is resource limited.
nfolds: Integer. Number of folds for k-fold cross-validation of the models. If set to 0, the test data is used for validation, and cross-validation and Stacked Ensembles are disabled.
thresh: Integer. Threshold for choosing between classification and regression models: the number of unique values in 'tag' is compared against it (more unique values than thresh: regression; fewer: classification).
max_time: Numeric. Maximum number of seconds the function may keep iterating.
max_models: Numeric. Maximum number of models the function may create.
start_clean: Boolean. Erase everything in the current h2o instance before training models?
exclude_algos: Vector of character strings. Algorithms to skip during the model-building phase. Set to NULL to use all.
plots: Boolean. Create plot objects?
alarm: Boolean. Ping an alarm when ready?
quiet: Boolean. Quiet messages, warnings, recommendations?
save: Boolean. Do you wish to save/export results into your working directory?
subdir: Character. Directory in which to save the results. Defaults to the working directory.
project: Character. Your project's name.
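The following base R sketch illustrates what three of the arguments above are described as doing: thresh, split and weight. The helper names guess_model_type and split_rows are hypothetical and are not part of the package; the handling of the exact boundary for thresh (<= here) and the use of random row sampling for split are assumptions, since the descriptions above do not specify them.

```r
# Hypothetical helpers, written only to illustrate the argument descriptions.

# thresh: few unique target values -> classification, many -> regression.
# ASSUMPTION: a count equal to thresh is treated as classification.
guess_model_type <- function(y, thresh = 5) {
  if (length(unique(y)) <= thresh) "classification" else "regression"
}

# split: proportion of rows assigned to the training set.
# ASSUMPTION: rows are sampled at random, reproducibly via the seed.
split_rows <- function(df, split = 0.7, seed = 0) {
  set.seed(seed)
  n_train <- floor(split * nrow(df))
  in_train <- seq_len(nrow(df)) %in% sample(seq_len(nrow(df)), size = n_train)
  list(train = df[in_train, , drop = FALSE],
       test  = df[!in_train, , drop = FALSE])
}

# weight: a weight of 2 acts like repeating the row twice, and a weight of 0
# acts like dropping the row. Shown here with a simple weighted mean.
x <- c(10, 20, 30)
w <- c(1, 2, 0)
weighted_result <- weighted.mean(x, w)
repeated_result <- mean(rep(x, times = w))

print(guess_model_type(c(0, 1, 1, 0)))
print(guess_model_type(seq_len(100)))
parts <- split_rows(data.frame(a = 1:10), split = 0.7, seed = 0)
print(c(train = nrow(parts$train), test = nrow(parts$test)))
print(all.equal(weighted_result, repeated_result))
```

The weighted mean is only a stand-in for how a model would use the weights; it is meant to make the "repeat the row" equivalence concrete, not to reproduce the function's internals.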
Full list of algorithms: "DRF" (Distributed Random Forest, including Random Forest (RF) and Extremely-Randomized Trees (XRT)), "GLM" (Generalized Linear Model), "XGBoost" (eXtreme Gradient Boosting), "GBM" (Gradient Boosting Machine), "DeepLearning" (Fully-connected multi-layer artificial neural network) and "StackedEnsemble". Read more: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html
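As a small illustration of how an exclude_algos value relates to the list above, the sketch below computes which algorithms remain once the default exclusions are removed. The variable names are hypothetical; only the algorithm labels and the default exclusions come from this page.

```r
# Algorithm labels as listed above.
all_algos <- c("DRF", "GLM", "XGBoost", "GBM", "DeepLearning", "StackedEnsemble")

# Default value of exclude_algos in the usage signature.
to_exclude <- c("StackedEnsemble", "DeepLearning")

# Algorithms left available for the model-building phase.
remaining <- setdiff(all_algos, to_exclude)
print(remaining)

# The vector could then be passed on, for example:
# h2o_automl(df, y = "tag", exclude_algos = to_exclude)
```

Setting exclude_algos to NULL instead would leave all six entries available.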
Other Machine Learning: ROC, clusterKmeans, conf_mat, export_results, gain_lift, h2o_predict_API, h2o_predict_MOJO, h2o_predict_binary, h2o_predict_model, h2o_selectmodel, impute, iter_seeds, model_metrics, mplot_conf, mplot_cuts_error, mplot_cuts, mplot_density, mplot_full, mplot_gain, mplot_importance, mplot_lineal, mplot_metrics, mplot_response, mplot_roc, mplot_splits, msplit