Various functions for setting tuning parameters
best(x, metric, maximize)
oneSE(x, metric, num, maximize)
tolerance(x, metric, tol = 1.5, maximize)
x: a data frame of tuning parameters and model results, sorted from the least complex model to the most complex
metric: a string that specifies which summary metric will be used to select the optimal model. By default, possible values are "RMSE" and "Rsquared" for regression and "Accuracy" and "Kappa" for classification. If custom performance metrics are used (via the summaryFunction argument in trainControl), the value of metric should match one of the names of the values returned by that function. If it does not, a warning is issued and the first metric given by the summaryFunction is used.
num: the number of resamples (for oneSE only)
maximize: a logical: should the metric be maximized or minimized?
tol: the acceptable percent tolerance (for tolerance only)

Each function returns a row index of x identifying the selected model.
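To illustrate the custom-metric case, here is a minimal sketch of a summary function. It assumes caret's convention that such a function receives a data frame with obs and pred columns (plus lev and model arguments) and returns a named numeric vector; the names medaeSummary and MedAE are hypothetical. Because the returned vector is named, metric = "MedAE" would match one of those names.

```r
# Hypothetical custom summary function for a regression model.
# Assumes caret passes a data frame with columns `obs` (observed) and
# `pred` (predicted); `lev` and `model` are accepted but unused here.
medaeSummary <- function(data, lev = NULL, model = NULL) {
  err <- data$obs - data$pred
  # The names of this vector are the values `metric` can take
  c(RMSE  = sqrt(mean(err^2)),
    MedAE = median(abs(err)))
}

# Quick check on made-up predictions
toy <- data.frame(obs = c(1, 2, 3, 4), pred = c(1.5, 2, 2, 4))
res <- medaeSummary(toy)
print(res)
```

With caret it would be supplied as trainControl(summaryFunction = medaeSummary), together with metric = "MedAE" and maximize = FALSE in the call to train, since a smaller error is better.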
These functions can be used by train to select the "optimal" model from a series of models. Each requires the user to select a metric that will be used to judge performance. For regression models, "RMSE" and "Rsquared" are applicable. Classification models use either "Accuracy" or "Kappa" (the latter is useful for unbalanced class distributions).
More details on these functions can be found at http://topepo.github.io/caret/model-training-and-tuning.html#custom.
By default, train uses best.
best simply chooses the tuning parameters associated with the best performance value: the largest, or the smallest when the metric is minimized (as with "RMSE").
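A minimal sketch of this rule, assuming it reduces to which.max or which.min on the metric column; best_sketch is a hypothetical name used for illustration and is not caret's own function.

```r
# Hypothetical re-implementation of the "best" rule: return the row index
# of the largest (maximize = TRUE) or smallest (maximize = FALSE) metric value.
best_sketch <- function(x, metric, maximize) {
  if (maximize) which.max(x[[metric]]) else which.min(x[[metric]])
}

test <- data.frame(ncomp = 1:5, RMSE = c(3, 1.1, 1.02, 1, 2))
idx <- best_sketch(test, "RMSE", maximize = FALSE)
test[idx, ]   # the ncomp = 4 row, which has the lowest RMSE (1)
```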
oneSE is a rule in the spirit of the "one standard error" rule of Breiman et al. (1984), who suggest that the tuning parameter associated with the best performance may overfit. They suggest that the simplest model within one standard error of the empirically optimal model is the better choice. This assumes that the models can be easily ordered from simplest to most complex, which is why the rows of x must be sorted that way.
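The rule can be sketched in R as follows. This is not caret's source: it assumes the standard error is estimated as the resampled standard deviation (stored in a column named after the metric with an "SD" suffix, such as RMSESD) divided by sqrt(num), and one_se_sketch is a hypothetical name.

```r
# Hypothetical sketch of the one-standard-error rule.
# Assumes the rows of `x` run from simplest to most complex and that the
# resampled SD of `metric` is in the column paste0(metric, "SD").
one_se_sketch <- function(x, metric, num, maximize) {
  vals <- x[[metric]]
  best_row <- if (maximize) which.max(vals) else which.min(vals)
  se <- x[[paste0(metric, "SD")]][best_row] / sqrt(num)
  # Models whose metric is no worse than the best value by more than one SE
  ok <- if (maximize) vals >= vals[best_row] - se else vals <= vals[best_row] + se
  min(which(ok))   # the first, i.e. simplest, qualifying row
}

test <- data.frame(ncomp = 1:5, RMSE = c(3, 1.1, 1.02, 1, 2), RMSESD = 0.4)
sel <- one_se_sketch(test, "RMSE", num = 10, maximize = FALSE)
sel   # 2: RMSE 1.1 is below 1 + 0.4 / sqrt(10), about 1.126
```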
tolerance takes the simplest model that is within a percent tolerance of the empirically optimal model. For example, if the largest Kappa value is 0.5 and a simpler model within 3 percent is acceptable, we score the other models using (x - 0.5)/0.5 * 100. The simplest model whose score is greater than -3 is chosen (in this case, since 0.5 * (1 - 0.03) = 0.485, any model with a Kappa value above 0.485 is acceptable).
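The same calculation as a short sketch, assuming the percent loss is measured relative to the best value and compared with tol using a strict inequality; tolerance_sketch and the Kappa values below are hypothetical, chosen to reproduce the example in the text.

```r
# Hypothetical sketch of the percent-tolerance rule.
# Rows of `x` are assumed to be ordered from simplest to most complex.
tolerance_sketch <- function(x, metric, tol = 1.5, maximize) {
  vals <- x[[metric]]
  best_val <- if (maximize) max(vals) else min(vals)
  # Percent loss relative to the best value: 0 for the best model, positive otherwise
  loss <- if (maximize) (best_val - vals) / best_val * 100 else (vals - best_val) / best_val * 100
  min(which(loss < tol))   # simplest model losing less than `tol` percent
}

# Kappa example: best Kappa 0.5, 3 percent tolerance, so the cutoff is 0.485
kappa_res <- data.frame(size = 1:4, Kappa = c(0.35, 0.484, 0.49, 0.5))
kappa_sel <- tolerance_sketch(kappa_res, "Kappa", tol = 3, maximize = TRUE)
kappa_sel   # 3: Kappa 0.49 is within 3 percent, 0.484 is not

# RMSE example (minimized): best RMSE 1, so rows with RMSE below 1.03 qualify
test <- data.frame(ncomp = 1:5, RMSE = c(3, 1.1, 1.02, 1, 2))
rmse_sel <- tolerance_sketch(test, "RMSE", tol = 3, maximize = FALSE)
rmse_sel   # 3
```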
User-defined functions can also be used. The selectionFunction argument of trainControl accepts either the function itself or its name as a string.
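As a hedged sketch of a user-defined rule, the function below picks the simplest model whose metric is within a fixed absolute margin of the best value. It assumes train calls the function with the arguments x, metric and maximize (the same arguments best takes) and expects a single row index back; the name within_abs and its margin argument are hypothetical.

```r
# Hypothetical user-defined selection rule: the simplest model whose metric
# is within an absolute `margin` of the best value. Rows of `x` are assumed
# to be ordered from simplest to most complex.
within_abs <- function(x, metric, maximize, margin = 0.05) {
  vals <- x[[metric]]
  ok <- if (maximize) vals >= max(vals) - margin else vals <= min(vals) + margin
  min(which(ok))
}

test <- data.frame(ncomp = 1:5, RMSE = c(3, 1.1, 1.02, 1, 2))
sel <- within_abs(test, "RMSE", maximize = FALSE)
sel   # 3: only RMSE values up to 1.05 qualify
```

Under that assumption it could be supplied as trainControl(selectionFunction = within_abs), in which case the default margin would be used.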
Breiman, Friedman, Olshen, and Stone (1984). Classification and Regression Trees. Wadsworth.
library(caret)

# Simulate the resampled results of a PLS regression model
test <- data.frame(ncomp = 1:5,
                   RMSE = c(3, 1.1, 1.02, 1, 2),
                   RMSESD = .4)

# Each call returns the row index of the selected model
best(test, "RMSE", maximize = FALSE)
oneSE(test, "RMSE", maximize = FALSE, num = 10)
tolerance(test, "RMSE", tol = 3, maximize = FALSE)

### Usage example (method = "earth" requires the earth package)
data(BloodBrain)
marsGrid <- data.frame(degree = 1, nprune = (1:10) * 3)

set.seed(1)
marsFit <- train(bbbDescr, logBBB,
                 method = "earth",
                 tuneGrid = marsGrid,
                 trControl = trainControl(method = "cv",
                                          number = 10,
                                          selectionFunction = "tolerance"))

# around 18 terms should yield the smallest CV RMSE