show_best: Investigate best tuning parameters

Description

show_best() displays the top sub-models and their performance estimates.

Usage

show_best(x, metric = NULL, n = 5, ...)
select_best(x, metric = NULL, ...)
select_by_pct_loss(x, ..., metric = NULL, limit = 2)
select_by_one_std_err(x, ..., metric = NULL)

Arguments

x

The results of tune_grid() or tune_bayes().

metric

A character value for the metric that will be used to sort the models (see https://yardstick.tidymodels.org/articles/metric-types.html for more details). Not required if a single metric exists in x. If there are multiple metrics and none is given, the first in the metric set is used (and a warning is issued).

n

An integer for the number of top results/rows to return.

...

For select_by_one_std_err() and select_by_pct_loss(), this argument is passed directly to dplyr::arrange() so that the user can sort the models from simplest to most complex. See the examples below. At least one term is required for these two functions.

limit

The limit of loss of performance that is acceptable (in percent units). See details below.

Value

A tibble with columns for the parameters. show_best() also includes columns for performance metrics.

Details

select_best() finds the tuning parameter combination with the best performance values.

select_by_one_std_err() uses the "one-standard error rule" (Breiman et al., 1984), which selects the simplest model that is within one standard error of the numerically optimal results.
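
The rule can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the data frame, its column names (K, mean, std_err), and its values are hypothetical, the metric is assumed to be one that is minimized (such as RMSE), and the bound is assumed to be the best mean plus that model's standard error.

```r
# Hypothetical resampling summary for a nearest-neighbors model.
# All values are made up for illustration.
results <- data.frame(
  K       = c(1, 5, 10, 20, 40),
  mean    = c(0.95, 0.80, 0.75, 0.78, 0.90),  # mean RMSE (lower is better)
  std_err = c(0.05, 0.04, 0.04, 0.04, 0.05)
)

# Numerically optimal row and the one-standard-error bound around it
# (assumption: bound = best mean + standard error of the best model).
best  <- results[which.min(results$mean), ]
bound <- best$mean + best$std_err

# Keep the candidates within the bound, then order them from simplest to
# most complex (for nearest neighbors, a larger K is treated as simpler).
candidates <- results[results$mean <= bound, ]
candidates <- candidates[order(-candidates$K), ]

# The first remaining row is the simplest acceptable model.
chosen <- candidates[1, ]
chosen$K
```

Ordering by decreasing K plays the same role here as passing desc(K) through ... in the functions above.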

select_by_pct_loss() selects the simplest model whose loss of performance is within some acceptable limit.

For percent loss, suppose the best model has an RMSE of 0.75 and a simpler model has an RMSE of 1. The percent loss would be (1.00 - 0.75)/1.00 * 100, or 25 percent. Note that loss will always be non-negative.
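
The arithmetic in this example can be reproduced with a few lines of base R. This sketch simply follows the formula as written in the paragraph above; the variable names are hypothetical, and the function's internal computation may differ in detail.

```r
# Values taken from the worked example in the text.
best_rmse    <- 0.75
simpler_rmse <- 1.00

# Percent loss, following the formula as stated above.
pct_loss <- (simpler_rmse - best_rmse) / simpler_rmse * 100
pct_loss

# Compare against an acceptable limit in percent units
# (2 is the default value of `limit` shown in the usage section).
limit <- 2
acceptable <- pct_loss <= limit
acceptable
```

With these numbers the simpler model would not be acceptable under the default limit, so a larger limit would be needed before it could be selected.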

References

Breiman, Leo; Friedman, J. H.; Olshen, R. A.; Stone, C. J. (1984). Classification and Regression Trees. Monterey, CA: Wadsworth.

Examples

# NOT RUN {
data("example_ames_knn")

show_best(ames_iter_search, metric = "rmse")

select_best(ames_iter_search, metric = "rsq")

# To find the least complex model within one std error of the numerically
# optimal model, the models are sorted by the number of nearest neighbors,
# from the largest number of neighbors (the least complex class boundary)
# to the smallest (corresponding to the most complex model).

select_by_one_std_err(ames_grid_search, metric = "rmse", desc(K))

# Now find the least complex model that has no more than a 5% loss of RMSE:
select_by_pct_loss(ames_grid_search, metric = "rmse",
                   limit = 5, desc(K))
# }
