The technical details of this method are described in Kuhn (2014).
Racing methods are efficient approaches to grid search. Initially, the
function evaluates all tuning parameters on a small initial set of
resamples. The burn_in argument of control_race() sets the number of
initial resamples.
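As a minimal sketch of setting the burn-in, a control object might be built as shown here. The value 5 is an arbitrary choice for illustration, and the commented call assumes the svm_wflow, rs, and svm_grid objects created in the benchmarking section.

```r
library(finetune)

# Evaluate every candidate on 5 resamples before any elimination starts.
# (5 is an arbitrary illustrative value, not a recommendation.)
race_ctrl <- control_race(burn_in = 5)

# The control object is then passed to the racing function, e.g.:
# tune_race_anova(svm_wflow, resamples = rs, grid = svm_grid, control = race_ctrl)
```

A larger burn-in delays the first elimination and so costs more model fits, but gives the first statistical comparison more data to work with.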
The performance statistics from these resamples are analyzed to determine
which tuning parameters are not statistically different from the current
best setting. If a parameter combination is statistically different from
(i.e., worse than) the current best, it is excluded from further resampling.
The next resample is used with the remaining parameter combinations and the
statistical analysis is updated. More candidate parameters may be excluded
with each new resample that is processed.
This function determines statistical significance using a repeated measures ANOVA
model where the performance statistic (e.g., RMSE, accuracy, etc.) is the
outcome data and the random effect is due to resamples. The
control_race() function contains a parameter for the significance cutoff
applied to the ANOVA results as well as other relevant arguments.
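The elimination step can be illustrated with a simplified base R sketch. It is not the package's internal code: the data are simulated, the candidate names are hypothetical, and resamples enter lm() as a fixed blocking factor, standing in for the random effect described above.

```r
set.seed(101)

# Simulated RMSE values: 4 hypothetical candidates evaluated on 5 resamples.
n_resamples <- 5
perf <- expand.grid(
  config   = paste0("cand_", 1:4),
  resample = paste0("r", 1:n_resamples)
)
true_rmse       <- c(1.00, 1.05, 1.50, 2.00)    # made-up candidate means
resample_effect <- rnorm(n_resamples, sd = 0.2) # shared within a resample
perf$rmse <- true_rmse[as.integer(perf$config)] +
  resample_effect[as.integer(perf$resample)] +
  rnorm(nrow(perf), sd = 0.05)

# Use the candidate with the lowest mean RMSE as the reference level.
means <- tapply(perf$rmse, perf$config, mean)
best  <- names(which.min(means))
perf$config <- relevel(perf$config, ref = best)

# Resamples act as a blocking factor (fixed-effect stand-in for the
# random effect of a repeated measures model).
fit <- lm(rmse ~ config + resample, data = perf)

# Each config coefficient estimates (candidate RMSE - best RMSE).
alpha <- 0.05
ci    <- confint(fit, level = 1 - alpha)
rows  <- grep("^config", rownames(ci))
lower <- ci[rows, 1]

# Drop a candidate when even the lower bound says it is worse than the best.
eliminated <- sub("^config", "", names(lower)[lower > 0])
retained   <- setdiff(levels(perf$config), eliminated)
```

In a race, only the retained candidates would be fit on the next resample, and the model would then be refit with the enlarged set of results.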
There is a benefit to using racing methods in conjunction with parallel
processing. The benchmarking section below shows results for one
dataset and model.
Censored regression models
With dynamic performance metrics (e.g., Brier scores or ROC curves),
performance is calculated for every value of eval_time, but only the first
evaluation time given by the user (e.g., eval_time[1]) is analyzed during
racing.
Also, values of eval_time should be less than the largest observed event
time in the training data. For many non-parametric models, the results beyond
the largest time corresponding to an event are constant (or NA).
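A quick check of candidate evaluation times against the training data might look like the following sketch. The times, event indicators, and evaluation times are invented for illustration.

```r
# Hypothetical training data: follow-up times and event indicators
# (1 = event observed, 0 = censored).
time  <- c(5, 8, 12, 20, 25)
event <- c(1, 1, 0, 1, 0)

# Largest time at which an event (not a censoring) was observed.
max_event_time <- max(time[event == 1])

# Candidate evaluation times; keep only those below the largest event time.
eval_time    <- c(5, 10, 30)
eval_time_ok <- eval_time[eval_time < max_event_time]

# The first remaining value is the one that would be analyzed during racing.
eval_time_ok[1]
```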
Benchmarking results
To demonstrate, we use an SVM model with the kernlab package.
library(kernlab)
library(tidymodels)
library(finetune)
library(doParallel)
## -----------------------------------------------------------------------------
data(cells, package = "modeldata")
cells <- cells %>% select(-case)
## -----------------------------------------------------------------------------
set.seed(6376)
rs <- bootstraps(cells, times = 25)
We’ll only tune the model parameters (i.e., not recipe tuning):
## -----------------------------------------------------------------------------
svm_spec <-
  svm_rbf(cost = tune(), rbf_sigma = tune()) %>%
  set_engine("kernlab") %>%
  set_mode("classification")
svm_rec <-
  recipe(class ~ ., data = cells) %>%
  step_YeoJohnson(all_predictors()) %>%
  step_normalize(all_predictors())
svm_wflow <-
  workflow() %>%
  add_model(svm_spec) %>%
  add_recipe(svm_rec)
set.seed(1)
svm_grid <-
  svm_spec %>%
  parameters() %>%
  grid_latin_hypercube(size = 25)
We’ll get the times for grid search and ANOVA racing with and without
parallel processing:
## -----------------------------------------------------------------------------
## Regular grid search
system.time({
  set.seed(2)
  svm_wflow %>% tune_grid(resamples = rs, grid = svm_grid)
})
## user system elapsed
## 741.660 19.654 761.357
## -----------------------------------------------------------------------------
## With racing
system.time({
  set.seed(2)
  svm_wflow %>% tune_race_anova(resamples = rs, grid = svm_grid)
})
## user system elapsed
## 133.143 3.675 136.822
Racing was 5.56-fold faster than regular grid search.
## -----------------------------------------------------------------------------
## Parallel processing setup
cores <- parallel::detectCores(logical = FALSE)
cores
## [1] 10
cl <- makePSOCKcluster(cores)
registerDoParallel(cl)
## -----------------------------------------------------------------------------
## Parallel grid search
system.time({
  set.seed(2)
  svm_wflow %>% tune_grid(resamples = rs, grid = svm_grid)
})
## user system elapsed
## 1.112 0.190 126.650
Parallel processing with grid search was 6.01-fold faster than
sequential grid search.
## -----------------------------------------------------------------------------
## Parallel racing
system.time({
  set.seed(2)
  svm_wflow %>% tune_race_anova(resamples = rs, grid = svm_grid)
})
## user system elapsed
## 1.908 0.261 21.442
Parallel processing with racing was 35.51-fold faster than sequential
grid search.
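The fold changes quoted above can be reproduced from the elapsed times in the system.time() output. The vector names below are labels chosen for this sketch.

```r
# Elapsed times (seconds) copied from the system.time() output above.
elapsed <- c(
  grid_sequential = 761.357,
  race_sequential = 136.822,
  grid_parallel   = 126.650,
  race_parallel   = 21.442
)

# Fold speed-up of each run relative to sequential grid search.
speed_up <- round(elapsed[["grid_sequential"]] / elapsed, 2)
speed_up
```

The product of the two separate gains (5.56 x 6.01, roughly 33.4) is close to the 35.51-fold gain observed when both are used together, so in this benchmark the two effects combine approximately multiplicatively.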
There is a compounding effect of racing and parallel processing, but its
magnitude depends on the type of model, number of resamples, number of
tuning parameters, and so on.