trainControl
Control parameters for train
Control the computational nuances of the train function
 Keywords
 utilities
Usage
trainControl(method = "boot",
             number = ifelse(grepl("cv", method), 10, 25),
             repeats = ifelse(grepl("cv", method), 1, number),
             p = 0.75,
             search = "grid",
             initialWindow = NULL,
             horizon = 1,
             fixedWindow = TRUE,
             verboseIter = FALSE,
             returnData = TRUE,
             returnResamp = "final",
             savePredictions = FALSE,
             classProbs = FALSE,
             summaryFunction = defaultSummary,
             selectionFunction = "best",
             preProcOptions = list(thresh = 0.95, ICAcomp = 3, k = 5),
             sampling = NULL,
             index = NULL,
             indexOut = NULL,
             indexFinal = NULL,
             timingSamps = 0,
             predictionBounds = rep(FALSE, 2),
             seeds = NA,
             adaptive = list(min = 5, alpha = 0.05, method = "gls", complete = TRUE),
             trim = FALSE,
             allowParallel = TRUE)
Arguments
 method
 The resampling method: "boot", "boot632", "cv", "repeatedcv", "LOOCV", "LGOCV" (for repeated training/test splits), "none" (only fits one model to the entire training set), "oob" (only for random forest, bagged trees, bagged earth, bagged flexible discriminant analysis, or conditional tree forest models), "adaptive_cv", "adaptive_boot" or "adaptive_LGOCV"
 number
 Either the number of folds or number of resampling iterations
 repeats
 For repeated k-fold cross-validation only: the number of complete sets of folds to compute
 verboseIter
 A logical for printing a training log.
 returnData
 A logical for saving the data
 returnResamp
 A character string indicating how much of the resampled summary metrics should be saved. Values can be "final", "all" or "none"
 savePredictions
 an indicator of how much of the hold-out predictions for each resample should be saved. Values can be either "all", "final", or "none". A logical value can also be used, which is converted to "all" (for TRUE) or "none" (for FALSE). "final" saves the predictions for the optimal tuning parameters.
 p
 For leave-group-out cross-validation: the training percentage
 search
 Either "grid" or "random", describing how the tuning parameter grid is determined. See details below.
 initialWindow, horizon, fixedWindow
 possible arguments to createTimeSlices
 classProbs
 a logical; should class probabilities be computed for classification models (along with predicted values) in each resample?
 summaryFunction
 a function to compute performance metrics across resamples. The arguments to the function should be the same as those in defaultSummary.
 selectionFunction
 the function used to select the optimal tuning parameter. This can be the name of the function or the function itself. See best for details and other options.
 preProcOptions
 A list of options to pass to preProcess. The type of pre-processing (e.g. centering, scaling, etc.) is passed in via the preProc option in train.
 sampling
 a single character value describing the type of additional sampling that is conducted after resampling (usually to resolve class imbalances). Values are "none", "down", "up", "smote", or "rose". The latter two values require the DMwR and ROSE packages, respectively. This argument can also be a list to facilitate custom sampling; the details can be found on the caret package website for sampling (link below).
 index
 a list with elements for each resampling iteration. Each list element is a vector of integers corresponding to the rows used for training at that iteration.
 indexOut
 a list (the same length as index) that dictates which data are held out for each resample (as integers). If NULL, then the unique set of samples not contained in index is used.
 indexFinal
 an optional vector of integers indicating which samples are used to fit the final model after resampling. If NULL, then the entire data set is used.
 timingSamps
 the number of training set samples that will be used to measure the time for predicting samples (zero indicates that the prediction time should not be estimated).
 predictionBounds
 a logical or numeric vector of length 2 (regression only). If logical, the predictions can be constrained to be within the limits of the training set outcomes. For example, a value of c(TRUE, FALSE) would only constrain the lower end of predictions. If numeric, specific bounds can be used. For example, if c(10, NA) is used, values below 10 would be predicted as 10 (with no constraint on the upper side).
 seeds
 an optional set of integers that will be used to set the seed at each resampling iteration. This is useful when the models are run in parallel. A value of NA will stop the seed from being set within the worker processes while a value of NULL will set the seeds using a random set of integers. Alternatively, a list can be used. The list should have B+1 elements where B is the number of resamples, unless method is "boot632", in which case B is the number of resamples plus 1. The first B elements of the list should be vectors of integers of length M where M is the number of models being evaluated. The last element of the list only needs to be a single integer (for the final model). See the Examples section below and the Details section.
 adaptive
 a list used when method is "adaptive_cv", "adaptive_boot" or "adaptive_LGOCV". See Details below.
 trim
 a logical. If TRUE, the final model in object$finalModel may have some components of the object removed to reduce the size of the saved object. The predict method will still work, but some other features of the model may not work. Trimming will occur only for models where this feature has been implemented.
 allowParallel
 if a parallel backend is loaded and available, should the function use it?
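As an illustration of the index and indexOut arguments, the sketch below builds three training/hold-out splits by hand and passes them to trainControl. The fold scheme (random assignment of rows to three groups) and the names fold_id, index and index_out are assumptions made for this example only; the iris data set is used because it ships with R.

```r
library(caret)

set.seed(42)
n <- nrow(iris)

# Assign every row to one of three folds at random (illustrative scheme)
fold_id <- sample(rep(1:3, length.out = n))

# Rows used for training in each resample: everything outside the fold
index <- lapply(1:3, function(f) which(fold_id != f))
names(index) <- paste0("Fold", 1:3)

# Rows held out in each resample: the fold itself
index_out <- lapply(1:3, function(f) which(fold_id == f))
names(index_out) <- names(index)

# Hand the explicit splits to trainControl
ctrl <- trainControl(method = "cv", number = 3,
                     index = index, indexOut = index_out)
```

Leaving indexOut as NULL should give the same hold-out sets here, since the rows not listed in index are used by default.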
Details
When setting the seeds manually, the number of models being evaluated is required. This may not be obvious as train does some optimizations for certain models. For example, when tuning over a PLS model, the only model that is fit is the one with the largest number of components. So if the model is being tuned over comp in 1:10, the only model fit is ncomp = 10. However, if the vector of integers used in the seeds argument is longer than actually needed, no error is thrown.
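The shape of a manual seeds list can be sketched in base R. The helper make_seeds and its max_seed argument are hypothetical names introduced for this illustration; B and M follow the definitions given for the seeds argument (number of resamples and number of models evaluated).

```r
# Hypothetical helper: build a seeds list for B resamples and M models
make_seeds <- function(B, M, max_seed = 10000L) {
  seeds <- vector(mode = "list", length = B + 1)
  # One integer per candidate model for each resample
  for (i in seq_len(B)) seeds[[i]] <- sample.int(max_seed, M)
  # A single integer for the final model fit
  seeds[[B + 1]] <- sample.int(max_seed, 1)
  seeds
}

set.seed(123)
# 10-fold CV repeated 5 times gives B = 50; 12 tuning values gives M = 12
seeds <- make_seeds(B = 50, M = 12)

length(seeds)         # number of list elements: B + 1
lengths(seeds[1:50])  # each of the first B elements holds M integers
```

When the number of models actually fit is uncertain, choosing a larger M than needed is the safer option, since extra seeds do not cause an error.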
Using method = "none" and specifying more than one model in train's tuneGrid or tuneLength arguments will result in an error.
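A minimal sketch of a valid call with method = "none" follows: the tuning grid has exactly one row, so only one model is specified. The choice of a KNN model with k = 5 on the iris data is an assumption made for the example.

```r
library(caret)

# No resampling: fit a single model to the whole training set
ctrl_none <- trainControl(method = "none")

# Exactly one candidate model: a one-row tuning grid
fit <- train(Species ~ ., data = iris,
             method = "knn",
             tuneGrid = data.frame(k = 5),
             trControl = ctrl_none)

# The fitted object can be used for prediction as usual
predict(fit, head(iris))
```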
When adaptive resampling is used (method is either "adaptive_cv", "adaptive_boot" or "adaptive_LGOCV"), the full set of resamples is not run for each model. As resampling continues, a futility analysis is conducted and models with a low probability of being optimal are removed. These features are experimental. See Kuhn (2014) for more details. The options for this procedure are:

min: the minimum number of resamples used before models are removed
alpha: the confidence level of the one-sided intervals used to measure futility
method: either generalized least squares (method = "gls") or a Bradley-Terry model (method = "BT")
complete: if a single parameter value is found before the end of resampling, should the full set of resamples be computed for that parameter?
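The adaptive options can be spelled out in a control object as in the sketch below. The specific values (min = 10 instead of the default 5, 10 folds, 5 repeats) are arbitrary choices for illustration, not recommendations.

```r
library(caret)

ctrl_adapt <- trainControl(
  method = "adaptive_cv",
  number = 10,
  repeats = 5,
  adaptive = list(
    min = 10,        # run at least 10 resamples before removing any model
    alpha = 0.05,    # confidence level for the one-sided futility intervals
    method = "gls",  # generalized least squares; "BT" is the alternative
    complete = TRUE  # finish all resamples once a single value remains
  )
)

ctrl_adapt$adaptive
```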
The option search = "grid" uses the default grid search routine. When search = "random", a random search procedure is used (Bergstra and Bengio, 2012). See http://topepo.github.io/caret/random.html for details and an example.
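A short sketch of random search follows. The model (KNN on iris), the 5-fold CV setting and tuneLength = 8 are assumptions for illustration; with search = "random", tuneLength is understood here as the maximum number of random parameter combinations to try.

```r
library(caret)

# Ask train() to sample tuning parameter values instead of using a grid
ctrl_rand <- trainControl(method = "cv", number = 5, search = "random")

set.seed(1)
fit_rand <- train(Species ~ ., data = iris,
                  method = "knn",
                  tuneLength = 8,
                  trControl = ctrl_rand)

# Tuning parameter values that were actually evaluated
fit_rand$results$k
```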
The "boot632" method uses the 0.632 estimator presented in Efron (1983), not to be confused with the 0.632+ estimator proposed later by the same author.
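For orientation, the textbook form of the 0.632 estimator is a weighted average of the apparent (resubstitution) error and the out-of-bag bootstrap error. The helper est_632 below is a hypothetical name, and the sketch only illustrates the weighting; it is not taken from caret's source and does not show how train applies it to each performance metric.

```r
# Hypothetical helper illustrating the 0.632 weighting:
#   apparent_err: error when predicting the same data used for fitting
#   oob_err:      average error on samples left out of the bootstrap draws
est_632 <- function(apparent_err, oob_err) {
  0.368 * apparent_err + 0.632 * oob_err
}

est_632(apparent_err = 0.10, oob_err = 0.30)  # 0.2264
```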
Value

An echo of the parameters specified
References
Efron (1983), "Estimating the error rate of a prediction rule: improvement on cross-validation", Journal of the American Statistical Association, 78(382):316-331
Bergstra and Bengio (2012), "Random Search for Hyper-Parameter Optimization", Journal of Machine Learning Research, 13(Feb):281-305
Kuhn (2014), "Futility Analysis in the Cross-Validation of Machine Learning Models", http://arxiv.org/abs/1405.6974
Package website for subsampling: http://topepo.github.io/caret/sampling.html
Examples
## Not run:

## Do 5 repeats of 10-fold CV for the iris data. We will fit
## a KNN model that evaluates 12 values of k and set the seed
## at each iteration.

library(caret)

set.seed(123)
## 10 folds x 5 repeats = 50 resamples, plus one element for the final model
seeds <- vector(mode = "list", length = 51)
for(i in 1:50) seeds[[i]] <- sample.int(1000, 22)

## For the last model:
seeds[[51]] <- sample.int(1000, 1)

ctrl <- trainControl(method = "repeatedcv",
                     repeats = 5,
                     seeds = seeds)

set.seed(1)
mod <- train(Species ~ ., data = iris,
             method = "knn",
             tuneLength = 12,
             trControl = ctrl)


## The same model, using adaptive resampling
ctrl2 <- trainControl(method = "adaptive_cv",
                      repeats = 5,
                      verboseIter = TRUE,
                      seeds = seeds)

set.seed(1)
mod2 <- train(Species ~ ., data = iris,
              method = "knn",
              tuneLength = 12,
              trControl = ctrl2)

## End(Not run)
Community examples
I am getting the error below when running this code:
x = trainControl(method = "repeatedcv", number = numbers, repeats = repeats, classProbs = TRUE, summaryFunction = twoClassSummary)
Error in trainControl(method = "repeatedcv", number = numbers, repeats = repeats, : could not find function "trainControl"
Please suggest a fix.
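A "could not find function" error usually means the package that defines the function has not been attached in the current session. A possible fix, assuming caret is installed, is to load it before the call. The values given to numbers and repeats below are placeholders, since the question does not say what they were.

```r
# install.packages("caret")  # uncomment if the package is not installed
library(caret)

# Placeholder values; the question does not say what these were
numbers <- 10
repeats <- 3

x <- trainControl(method = "repeatedcv",
                  number = numbers,
                  repeats = repeats,
                  classProbs = TRUE,
                  summaryFunction = twoClassSummary)
```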