
best(x, metric, maximize)
oneSE(x, metric, num, maximize)
tolerance(x, metric, tol = 1.5, maximize)
The num argument is used by oneSE only; the tol argument is used by tolerance only.

These functions can be used by train to select the "optimal" model from a series of models. Each requires the user to select a metric that will be used to judge performance. For regression models, values of "RMSE" and "Rsquared" are applicable. Classification models use either "Accuracy" or "Kappa" (for unbalanced class distributions). By default, train uses best.
best simply chooses the tuning parameter associated with the largest (or lowest for "RMSE") performance.
oneSE is a rule in the spirit of the "one standard error" rule of Breiman et al. (1984), who suggest that the tuning parameter associated with the best performance may overfit. They suggest that the simplest model within one standard error of the empirically optimal model is the better choice. This assumes that the models can be easily ordered from simplest to most complex (see the Details section below).
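As a rough illustration of the rule, the sketch below reimplements the selection step in base R. The name oneSESketch is hypothetical, and this is not necessarily how the packaged function computes its result. It assumes x has one row per candidate model, ordered from simplest to most complex, with a column holding the metric and a column named with the metric plus "SD" holding its standard deviation across num resamples.

```r
# Hypothetical helper (not part of caret): a sketch of the one-standard-error rule.
# Assumes rows of x are ordered from simplest to most complex.
oneSESketch <- function(x, metric, num, maximize) {
  perf <- x[[metric]]
  # standard error of the resampled metric, from its SD over `num` resamples
  se <- x[[paste0(metric, "SD")]] / sqrt(num)
  if (maximize) {
    bestRow <- which.max(perf)
    candidates <- which(perf >= perf[bestRow] - se[bestRow])
  } else {
    bestRow <- which.min(perf)
    candidates <- which(perf <= perf[bestRow] + se[bestRow])
  }
  # simplest candidate = first row, given the assumed ordering
  min(candidates)
}

sim <- data.frame(ncomp = 1:5,
                  RMSE = c(3, 1.1, 1.02, 1, 2),
                  RMSESD = 0.4)
chosen <- oneSESketch(sim, "RMSE", num = 10, maximize = FALSE)
chosen  # 2: RMSE 1.1 is within 0.4 / sqrt(10) (about 0.126) of the best value, 1
```

The empirically best model is row 4, but row 2 is simpler and its RMSE is statistically hard to distinguish from the best, so the rule prefers it.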
tolerance takes the simplest model that is within a percent tolerance of the empirically optimal model. For example, if the largest Kappa value is 0.5 and a simpler model within 3 percent is acceptable, we score the other models using (x - 0.5)/0.5 * 100. The simplest model whose score is not less than -3 is chosen (in this case, a model with a Kappa value of at least 0.485 is acceptable).
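The percent-loss arithmetic can be worked through in a few lines of base R. This is only a sketch for a metric where larger is better; the Kappa values are made up for illustration, and the models are assumed to be ordered from simplest to most complex.

```r
# Kappa values for four candidate models, ordered simplest to most complex
# (illustrative numbers, not from any real fit)
kappa <- c(0.40, 0.47, 0.49, 0.50)
tol <- 3  # acceptable loss, in percent

# percent difference from the best value: 0 for the best model, negative otherwise
score <- (kappa - max(kappa)) / max(kappa) * 100
round(score, 2)  # -20 -6 -2 0

# simplest model that loses no more than `tol` percent relative to the best
chosen <- min(which(score >= -tol))
chosen  # 3
```

Here the third model gives up 2 percent of the best Kappa, which is inside the 3 percent allowance, so it is chosen over the more complex fourth model.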
User-defined functions can also be used. The argument selectionFunction in trainControl can be used to pass the function directly or to pass the function by name.
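A minimal sketch of a user-defined rule follows. It assumes the function receives the same x, metric, and maximize arguments as best and returns the row index of the chosen model; the name withinAbs and its 0.05 cutoff are made up for illustration.

```r
# Hypothetical selection rule (not part of caret): pick the simplest model whose
# metric is within an absolute distance `delta` of the best value.
# Assumes rows of x are ordered from simplest to most complex.
withinAbs <- function(x, metric, maximize, delta = 0.05) {
  perf <- x[[metric]]
  if (maximize) {
    candidates <- which(perf >= max(perf) - delta)
  } else {
    candidates <- which(perf <= min(perf) + delta)
  }
  min(candidates)
}

sim <- data.frame(ncomp = 1:5,
                  RMSE = c(3, 1.1, 1.02, 1, 2),
                  RMSESD = 0.4)
chosen <- withinAbs(sim, "RMSE", maximize = FALSE)
chosen  # 3: RMSE 1.02 is within 0.05 of the best value, 1

# With caret loaded, the function itself would be passed like this:
# ctrl <- trainControl(method = "cv", selectionFunction = withinAbs)
```

Testing the rule on a small hand-built data frame, as above, is a cheap way to check its behavior before using it in a full resampling run.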
See also: train, trainControl
library(caret)

# simulate the resampling results of a PLS regression model
test <- data.frame(
  ncomp = 1:5,
  RMSE = c(3, 1.1, 1.02, 1, 2),
  RMSESD = .4)

# each call returns the row index of the selected model
best(test, "RMSE", maximize = FALSE)
oneSE(test, "RMSE", maximize = FALSE, num = 10)
tolerance(test, "RMSE", tol = 3, maximize = FALSE)

### usage example

data(BloodBrain)

marsGrid <- data.frame(
  .degree = 1,
  .nprune = (1:10) * 3)

set.seed(1)
marsFit <- train(
  bbbDescr, logBBB,
  method = "earth",
  tuneGrid = marsGrid,
  trControl = trainControl(
    method = "cv",
    number = 10,
    selectionFunction = "tolerance"))

# around 18 terms should yield the smallest CV RMSE