trainControl(
  method = "boot",
  number = ifelse(grepl("cv", method), 10, 25),
  repeats = ifelse(grepl("cv", method), 1, number),
  p = 0.75,
  search = "grid",
  initialWindow = NULL,
  horizon = 1,
  fixedWindow = TRUE,
  verboseIter = FALSE,
  returnData = TRUE,
  returnResamp = "final",
  savePredictions = FALSE,
  classProbs = FALSE,
  summaryFunction = defaultSummary,
  selectionFunction = "best",
  preProcOptions = list(thresh = 0.95, ICAcomp = 3, k = 5),
  sampling = NULL,
  index = NULL,
  indexOut = NULL,
  indexFinal = NULL,
  timingSamps = 0,
  predictionBounds = rep(FALSE, 2),
  seeds = NA,
  adaptive = list(min = 5, alpha = 0.05, method = "gls", complete = TRUE),
  trim = FALSE,
  allowParallel = TRUE
)
"LGOCV" (for repeated training/test splits),
"none" (only fits one model to the entire training set),
"oob" (only for random forest, bagged trees, bagged earth, bagged flexible discriminant analysis, or conditional tree forest models),
"none". A logical value can also be used, which converts to
"all" (for TRUE) or
"final" saves the predictions for the optimal tuning parameters.
"random", describing how the tuning parameter grid is determined. See details below.
best for details and other options.
"rose". The latter two values require the DMwR and ROSE packages, respectively. This argument can also be a list to facilitate custom sampling and these details can be found on the caret package website for sampling (link below).
index) that dictates which data are held-out for each resample (as integers). If
NULL, then the unique set of samples not contained in
NULL, then the entire data set is used.
c(TRUE, FALSE) would only constrain the lower end of predictions. If numeric, specific bounds can be used. For example, if
c(10, NA), values below 10 would be predicted as 10 (with no constraint on the upper side).
NA will stop the seed from being set within the worker processes while a value of
NULL will set the seeds using a random set of integers. Alternatively, a list can be used. The list should have
B is the number of resamples, unless
"boot632" in which case
B is the number of resamples plus 1. The first
B elements of the list should be vectors of integers of length
M is the number of models being evaluated. The last element of the list only needs to be a single integer (for the final model). See the Examples section below and the Details section.
"adaptive_LGOCV". See Details below.
TRUE the final model in
object$finalModel may have some components of the object removed to reduce the size of the saved object. The
predict method will still work, but some other features of the model may not work.
Trimming will occur only for models where this feature has been implemented.
train does some optimizations for certain models. For example, when tuning over a PLS model, the only model that is fit is the one with the largest number of components. So if the model is being tuned over
comp in 1:10, the only model fit is
ncomp = 10. However, if the vector of integers used in the
seeds argument is longer than actually needed, no error is thrown.
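As a rough illustration of the seeds layout described above, the following base R sketch builds a list with B + 1 elements: B integer vectors of length M, followed by a single integer for the final model. The helper name make_seeds and its arguments are hypothetical and are not part of caret.

```r
# Hypothetical helper (not part of caret): build a list that could be
# passed to trainControl(seeds = ...), given B resamples and M models.
make_seeds <- function(B, M, max_seed = 1000L) {
  seeds <- vector(mode = "list", length = B + 1)
  # One integer vector of length M for each resample
  for (i in seq_len(B)) seeds[[i]] <- sample.int(max_seed, M)
  # A single integer for the final model fit
  seeds[[B + 1]] <- sample.int(max_seed, 1)
  seeds
}

set.seed(123)
# 5 repeats of 10-fold CV give B = 50 resamples; 12 models are evaluated
seeds <- make_seeds(B = 50, M = 12)
```

Because vectors longer than M do not raise an error, M can be treated as a lower bound when the exact number of fitted models is uncertain.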
method = "none" and specifying more than one model in
tuneLength arguments will result in an error.
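A minimal sketch of fitting without resampling, assuming the caret package is installed: with method = "none", exactly one candidate model is supplied, here through a one-row tuneGrid. The object names ctrl_none and fit_none are illustrative only.

```r
library(caret)

# No resampling: a single model is fit to the entire training set
ctrl_none <- trainControl(method = "none")

# Exactly one candidate (k = 5) is specified, so no error is raised
fit_none <- train(Species ~ ., data = iris,
                  method = "knn",
                  tuneGrid = data.frame(k = 5),
                  trControl = ctrl_none)
```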
Using adaptive resampling when
method is either
"adaptive_LGOCV", the full set of resamples is not run for each model. As resampling continues, a futility analysis is conducted and models with a low probability of being optimal are removed. These features are experimental. See Kuhn (2014) for more details. The options for this procedure are:
min: the minimum number of resamples used before models are removed
alpha: the confidence level of the one-sided intervals used to measure futility
method: either generalized least squares (
method = "gls") or a Bradley-Terry model (
method = "BT")
complete: if a single parameter value is found before the end of resampling, should the full set of resamples be computed for that parameter?
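The options listed above can be spelled out explicitly when building the control object. The following is a sketch that assumes the caret package is installed; the values shown simply repeat the documented defaults, and the name ctrl_adapt is illustrative.

```r
library(caret)

# Adaptive resampling with the futility-analysis options set explicitly
ctrl_adapt <- trainControl(
  method = "adaptive_cv",
  number = 10,
  repeats = 5,
  adaptive = list(min = 5,         # resamples run before any model is dropped
                  alpha = 0.05,    # confidence level for the futility intervals
                  method = "gls",  # or "BT" for a Bradley-Terry model
                  complete = TRUE) # finish all resamples for the surviving value
)
```

The resulting object would then be passed to train through its trControl argument.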
search = "grid" uses the default grid search routine. When
search = "random", a random search procedure is used (Bergstra and Bengio, 2012). See http://topepo.github.io/caret/random.html for details and an example.
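A small sketch of requesting random search, assuming the caret package is installed. The choice of a KNN model on the iris data and the object names ctrl_rand and fit_rand are illustrative assumptions, not taken from the linked page.

```r
library(caret)

# 5-fold CV with randomly generated tuning parameter values
ctrl_rand <- trainControl(method = "cv", number = 5, search = "random")

set.seed(1)
fit_rand <- train(Species ~ ., data = iris,
                  method = "knn",
                  tuneLength = 8,  # upper bound on the number of random candidates
                  trControl = ctrl_rand)
```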
"boot632" method uses the 0.632 estimator presented in Efron (1983), not to be confused with the 0.632+ estimator proposed later by the same author.
Bergstra and Bengio (2012), "Random Search for Hyper-Parameter Optimization", Journal of Machine Learning Research, 13(Feb):281-305
Kuhn (2014), "Futility Analysis in the Cross-Validation of Machine Learning Models", http://arxiv.org/abs/1405.6974
Package website for subsampling: http://topepo.github.io/caret/sampling.html
## Not run:

## Do 5 repeats of 10-Fold CV for the iris data. We will fit
## a KNN model that evaluates 12 values of k and set the seed
## at each iteration.

set.seed(123)
seeds <- vector(mode = "list", length = 51)
for(i in 1:50) seeds[[i]] <- sample.int(1000, 22)

## For the last model:
seeds[[51]] <- sample.int(1000, 1)

ctrl <- trainControl(method = "repeatedcv",
                     repeats = 5,
                     seeds = seeds)

set.seed(1)
mod <- train(Species ~ ., data = iris,
             method = "knn",
             tuneLength = 12,
             trControl = ctrl)


ctrl2 <- trainControl(method = "adaptive_cv",
                      repeats = 5,
                      verboseIter = TRUE,
                      seeds = seeds)

set.seed(1)
mod2 <- train(Species ~ ., data = iris,
              method = "knn",
              tuneLength = 12,
              trControl = ctrl2)

## End(Not run)