Perform hyper-parameter tuning using either k-fold cross validation or a train-validation split.
ml_cross_validator(x, estimator, estimator_param_maps, evaluator,
  num_folds = 3L, seed = NULL, uid = random_string("cross_validator_"),
  ...)

ml_train_validation_split(x, estimator, estimator_param_maps, evaluator,
  train_ratio = 0.75, seed = NULL,
  uid = random_string("train_validation_split_"), ...)
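x: A spark_connection, ml_pipeline, or a tbl_spark.
estimator: An ml_estimator object.
estimator_param_maps: A named list of stages and hyper-parameter sets to tune. Each element is named after a stage of the estimator and holds a list of candidate values for that stage's parameters.
evaluator: An ml_evaluator object; see ml_evaluator.
num_folds: Number of folds for cross validation (ml_cross_validator() only). Must be >= 2. Default: 3
seed: A random seed. Set this value if you need your results to be reproducible across repeated calls.
uid: A character string used to uniquely identify the ML estimator.
...: Optional arguments; currently unused.
train_ratio: Fraction of the data used for training (ml_train_validation_split() only); the remainder is used for validation. Must be between 0 and 1. Default: 0.75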
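As an illustration of estimator_param_maps, the sketch below writes a grid for a pipeline containing a random forest classifier and counts the candidate settings with base R's expand.grid(). It assumes the grid is expanded into the Cartesian product of the listed values (the behaviour of Spark's ParamGridBuilder); the stage name random_forest and the parameter values are example choices, and expand.grid() is used here only to count combinations, not something the tuning functions call.

```r
# Example tuning grid: one named entry per pipeline stage,
# each holding candidate values for that stage's parameters.
grid <- list(
  random_forest = list(
    num_trees = c(5, 10),
    max_depth = c(5, 10),
    impurity = c("entropy", "gini")
  )
)

# Assumption: candidates are the Cartesian product of the listed values.
candidates <- expand.grid(grid$random_forest, stringsAsFactors = FALSE)
print(candidates)
nrow(candidates)  # 2 * 2 * 2 = 8 parameter combinations
```

Every one of these combinations is trained and scored by the evaluator, so the grid size directly drives tuning cost.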
The object returned depends on the class of x.
spark_connection: When x is a spark_connection, the function returns an instance of an ml_cross_validator or ml_train_validation_split object.
ml_pipeline: When x is an ml_pipeline, the function returns an ml_pipeline with the tuning estimator appended to the pipeline.
tbl_spark: When x is a tbl_spark, a tuning estimator is constructed then immediately fit with the input tbl_spark, returning an ml_cross_validation_model or an ml_train_validation_split_model object.
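The sketch below shows two of these calling modes. It assumes a local Spark installation (for example one set up with spark_install()) and uses the built-in iris data; the pipeline, grid values, and seed are arbitrary example choices rather than recommended settings.

```r
library(sparklyr)

# Assumes Spark is installed locally.
sc <- spark_connect(master = "local")
iris_tbl <- sdf_copy_to(sc, iris, name = "iris_tbl", overwrite = TRUE)

# Estimator to tune: an R formula stage followed by a random forest.
pipeline <- ml_pipeline(sc) %>%
  ft_r_formula(Species ~ .) %>%
  ml_random_forest_classifier()

# Example grid; names refer to pipeline stages.
grid <- list(
  random_forest = list(num_trees = c(5, 10), max_depth = c(5, 10))
)

# x = spark_connection: returns an unfitted tuning estimator.
cv <- ml_cross_validator(
  sc, estimator = pipeline, estimator_param_maps = grid,
  evaluator = ml_multiclass_classification_evaluator(sc),
  num_folds = 3, seed = 1
)
cv_model <- ml_fit(cv, iris_tbl)

# x = tbl_spark: the tuning estimator is built and fit immediately.
tvs_model <- ml_train_validation_split(
  iris_tbl, estimator = pipeline, estimator_param_maps = grid,
  evaluator = ml_multiclass_classification_evaluator(sc),
  train_ratio = 0.75, seed = 1
)

spark_disconnect(sc)
```

Passing an ml_pipeline as x instead would append the tuning estimator to that pipeline without fitting anything.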
ml_cross_validator() performs k-fold cross validation, while ml_train_validation_split() performs tuning on a single pair of train and validation datasets.
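To make the difference concrete, here is a base-R sketch of how the two strategies partition 100 rows and how many models each trains for an 8-candidate grid. It is a simplified illustration, not the Spark implementation: Spark assigns folds and the train/validation split by random sampling, so its partition sizes are only approximately equal to the ones below, and after tuning Spark refits the best setting once more on the full dataset (not counted here).

```r
set.seed(1)  # fixed seed so the partitions are reproducible
n <- 100
num_folds <- 3
train_ratio <- 0.75
n_candidates <- 8  # e.g. 2 * 2 * 2 settings from a parameter grid

# k-fold: every row lands in exactly one fold; each fold is held out once.
fold <- sample(rep(seq_len(num_folds), length.out = n))
cv_splits <- lapply(seq_len(num_folds), function(k) {
  list(train = which(fold != k), validation = which(fold == k))
})

# Train-validation split: a single random partition of the rows.
train_idx <- sample(n, size = round(train_ratio * n))
tvs_split <- list(
  train = train_idx,
  validation = setdiff(seq_len(n), train_idx)
)

# Models trained during tuning, excluding the final refit.
cv_fits <- num_folds * n_candidates  # 24
tvs_fits <- n_candidates             # 8
```

Cross validation uses every row for validation exactly once, at roughly num_folds times the cost of a single train-validation split.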