oneSE: Selecting Tuning Parameters

Description

Various functions for setting tuning parameters

Usage

oneSE(x, metric, num, maximize)
tolerance(x, metric, tol = 1.5, maximize)

Arguments

x

a data frame of tuning parameters and model results, sorted from the least complex models to the most complex

metric

a string that specifies which summary metric will be used to select the optimal model. By default, possible values are "RMSE" and "Rsquared" for regression and "Accuracy" and "Kappa" for classification. If custom performance metrics are used (via the summaryFunction argument in trainControl), the value of metric should match one of the metric names that function returns. If it does not, a warning is issued and the first metric given by the summaryFunction is used.

num

the number of resamples (for oneSE only)

maximize

a logical: should the metric be maximized or minimized?

tol

the acceptable percent tolerance (for tolerance only)

Value

a row index

Details

These functions can be used by train to select the "optimal" model from a series of models. Each requires the user to select a metric that will be used to judge performance. For regression models, values of "RMSE" and "Rsquared" are applicable. Classification models use either "Accuracy" or "Kappa" (for unbalanced class distributions).

More details on these functions can be found at http://topepo.github.io/caret/model-training-and-tuning.html#custom.

By default, train uses best.

best simply chooses the tuning parameter associated with the largest (or lowest for "RMSE") performance.
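A minimal sketch of that rule in base R may make the behaviour concrete. The helper name pick_best and the small results table are hypothetical, introduced only for illustration; this is not caret's source code.

```r
# Hypothetical stand-in for the idea behind best(): return the row index
# with the largest metric value, or the smallest when maximize is FALSE.
pick_best <- function(x, metric, maximize) {
  if (maximize) which.max(x[, metric]) else which.min(x[, metric])
}

# Made-up resampling results, one row per candidate model
results <- data.frame(ncomp = 1:5, RMSE = c(3, 1.1, 1.02, 1, 2))

pick_best(results, "RMSE", maximize = FALSE)  # row 4, where RMSE is 1
```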

oneSE is a rule in the spirit of the "one standard error" rule of Breiman et al. (1984), who suggest that the tuning parameter associated with the best performance may overfit. They suggest that the simplest model within one standard error of the empirically optimal model is the better choice. This assumes that the models can be easily ordered from simplest to most complex, which is why x must be sorted that way (see Arguments).
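The rule can be sketched in a few lines of base R. The helper one_se_sketch is hypothetical and is not caret's implementation. It assumes the rows are already sorted simplest-first and that the standard deviation of the metric across resamples sits in a column named after the metric with an "SD" suffix, as in the RMSESD column of the Examples section.

```r
# Hypothetical helper illustrating a one-standard-error rule.
# Assumes rows are sorted simplest-first and that the resampling standard
# deviation of the metric is stored in a column named <metric>SD.
one_se_sketch <- function(x, metric, num, maximize) {
  values <- x[, metric]
  best_row <- if (maximize) which.max(values) else which.min(values)
  # standard error of the best model's resampled performance
  se <- x[best_row, paste0(metric, "SD")] / sqrt(num)
  ok <- if (maximize) {
    values >= values[best_row] - se
  } else {
    values <= values[best_row] + se
  }
  min(which(ok))  # first (simplest) row within one standard error
}

results <- data.frame(ncomp = 1:5,
                      RMSE = c(3, 1.1, 1.02, 1, 2),
                      RMSESD = 0.4)

# The threshold is 1 + 0.4 / sqrt(10), about 1.126, so row 2 (RMSE 1.1)
# is the simplest model that qualifies.
one_se_sketch(results, "RMSE", num = 10, maximize = FALSE)
```

Here the empirically best model is row 4, but the sketch returns row 2, trading a small loss in RMSE for a simpler model.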

tolerance takes the simplest model that is within a percent tolerance of the empirically optimal model. For example, if the largest Kappa value is 0.5 and a simpler model within 3 percent is acceptable, we score the other models using (x - 0.5)/0.5 * 100. The simplest model whose score is not less than -3 is chosen (in this case, a model with a Kappa value of about 0.485 or higher is acceptable, whereas a Kappa value of 0.35 scores -30 and is not).
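Working the scoring formula through on a few made-up Kappa values shows how the cut-off behaves. The values below are hypothetical and the snippet only reproduces the arithmetic of the example; it is not caret's code.

```r
# Made-up Kappa values for five models, ordered simplest-first
kappa <- c(0.35, 0.47, 0.49, 0.50, 0.48)
tol <- 3  # acceptable percent tolerance

# Percent difference from the best value (0.5); zero for the best model,
# negative for every model that performs worse
score <- (kappa - max(kappa)) / max(kappa) * 100
round(score, 1)  # -30 -6 -2 0 -4

# Simplest model whose score is not below -tol
chosen <- min(which(score >= -tol))
chosen  # row 3: Kappa 0.49 is within 3 percent of 0.5
```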

User-defined functions can also be used. The argument selectionFunction in trainControl can be used to pass the function directly or to pass the function by name.
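As a rough illustration, a user-defined rule might look like the sketch below. The function within_margin and its 0.05 margin are hypothetical. The sketch assumes that a custom function receives the same x, metric, and maximize arguments as the built-in functions and returns a single row index.

```r
# Hypothetical rule: pick the simplest model whose metric is within a fixed
# absolute margin of the best value. Assumes rows are sorted simplest-first.
within_margin <- function(x, metric, maximize) {
  margin <- 0.05  # assumed margin, chosen only for illustration
  values <- x[, metric]
  ok <- if (maximize) {
    values >= max(values) - margin
  } else {
    values <= min(values) + margin
  }
  min(which(ok))
}

results <- data.frame(ncomp = 1:5, RMSE = c(3, 1.1, 1.02, 1, 2))
within_margin(results, "RMSE", maximize = FALSE)  # row 3, since 1.02 <= 1.05

# With caret loaded, the function could then be passed directly:
# ctrl <- trainControl(method = "cv", selectionFunction = within_margin)
```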

References

Breiman, Friedman, Olshen, and Stone. (1984) Classification and Regression Trees. Wadsworth.

Examples

# NOT RUN {
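library(caret)

# simulate resampling results for a PLS regression model:
# one row per number of components, ordered from simplest to most complex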
test <- data.frame(ncomp = 1:5,
                   RMSE = c(3, 1.1, 1.02, 1, 2),
                   RMSESD = .4)

best(test, "RMSE", maximize = FALSE)
oneSE(test, "RMSE", maximize = FALSE, num = 10)
tolerance(test, "RMSE", tol = 3, maximize = FALSE)

### usage example (method = "earth" requires the earth package)

data(BloodBrain)

marsGrid <- data.frame(degree = 1, nprune = (1:10) * 3)

set.seed(1)
marsFit <- train(bbbDescr, logBBB,
                 method = "earth",
                 tuneGrid = marsGrid,
                 trControl = trainControl(method = "cv",
                                          number = 10,
                                          selectionFunction = "tolerance"))

# around 18 terms should yield the smallest CV RMSE
# }