Control the computational nuances of the train function.
trainControl(method = "boot",
             number = ifelse(method %in% c("cv", "repeatedcv"), 10, 25),
             repeats = ifelse(method %in% c("cv", "repeatedcv"), 1, number),
             p = 0.75,
             initialWindow = NULL,
             horizon = 1,
             fixedWindow = TRUE,
             verboseIter = FALSE,
             returnData = TRUE,
             returnResamp = "final",
             savePredictions = FALSE,
             classProbs = FALSE,
             summaryFunction = defaultSummary,
             selectionFunction = "best",
             preProcOptions = list(thresh = 0.95, ICAcomp = 3, k = 5),
             index = NULL,
             indexOut = NULL,
             timingSamps = 0,
             predictionBounds = rep(FALSE, 2),
             seeds = NA,
             adaptive = list(min = 5, alpha = 0.05, method = "gls"),
             allowParallel = TRUE)
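As a quick orientation, the sketch below builds a control object for repeated cross-validation and inspects a few of its settings. It assumes the caret package is installed; the specific values (10 folds, 3 repeats) are illustrative choices, not recommendations from this page.

```r
library(caret)

## 10-fold cross-validation, repeated 3 times, keeping hold-out predictions
ctrl <- trainControl(method = "repeatedcv",
                     number = 10,
                     repeats = 3,
                     savePredictions = TRUE)

## The result is a plain list, so individual settings can be inspected
ctrl$method
ctrl$number
ctrl$repeats
```

The object is then passed to train through its trControl argument.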
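method: the resampling method: boot, boot632, cv, repeatedcv, LOOCV, LGOCV (for repeated training/test splits), none (only fits one model to the entire training set)

initialWindow, horizon, fixedWindow: possible arguments to createTimeSlices

summaryFunction: a function to compute performance metrics across resamples; see defaultSummary

selectionFunction: the function used to select the optimal tuning parameter. See best for details and other options.

preProcOptions: a list of options to pass to preProcess. The type of pre-processing (e.g. center, scaling etc) is passed in via the preProc option in train.

indexOut: a list (the same length as index) that dictates which samples are held out for each resample. If NULL, then the unique set of samples not contained in index is used.

predictionBounds: a logical or numeric vector of length 2 (regression only). For example, a value of c(TRUE, FALSE) would only constrain the lower end of predictions.

seeds: an optional set of integers used to set the seed at each resampling iteration. A value of NA will stop the seed from being set within the worker processes while a value of NULL will set the seeds using a random set of integers.

adaptive: a list used when method is "adaptive_cv", "adaptive_boot" or "adaptive_LGOCV". See Details below.

train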
does some optimizations for certain models. For example, when tuning over a PLS model, the only model that is fit is the one with the largest number of components. So if the model is being tuned over ncomp in 1:10, the only model fit is ncomp = 10. However, if the vector of integers used in the seeds argument is longer than actually needed, no error is thrown. Using method = "none" and specifying more than one model in train's tuneGrid or tuneLength arguments will result in an error.
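The sizing of the seeds argument can be sketched with a small helper. make_seeds is a hypothetical function written for this illustration and is not part of caret. It assumes the layout used in the example on this page: one integer vector per resample, plus a single integer for the final model. Because over-long vectors are accepted without error, sizing each vector to the full number of candidate models is a safe upper bound even when train fits fewer models than that.

```r
## Hypothetical helper (not part of caret): build a seeds list for
## n_resamples resampling iterations and n_models candidate models.
make_seeds <- function(n_resamples, n_models, max_seed = 10000L) {
  seeds <- vector(mode = "list", length = n_resamples + 1)
  for (i in seq_len(n_resamples)) {
    ## One integer per candidate model within each resample
    seeds[[i]] <- sample.int(max_seed, n_models)
  }
  ## One extra integer for the final model fit
  seeds[[n_resamples + 1]] <- sample.int(max_seed, 1)
  seeds
}

set.seed(123)
## 5 repeats of 10-fold CV = 50 resamples; 12 tuning-parameter values
seeds <- make_seeds(n_resamples = 5 * 10, n_models = 12)
length(seeds)
lengths(seeds[1:3])
```

The resulting list can be supplied as trainControl(seeds = seeds) together with a matching resampling method.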
With adaptive resampling, i.e. when method is either "adaptive_cv", "adaptive_boot" or "adaptive_LGOCV", the full set of resamples is not run for each model. As resampling continues, a futility analysis is conducted and models with a low probability of being optimal are removed. These features are experimental. See Kuhn (2014) for more details. The options for this procedure are:
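min: the minimum number of resamples used before models are removed
alpha: the confidence level of the one-sided intervals used to measure futility
method: either generalized least squares (method = "gls") or a Bradley-Terry model (method = "BT")

## Do 5 repeats of 10-Fold CV for the iris data. We will fit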
## a KNN model that evaluates 12 values of k and set the seed
## at each iteration.
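library(caret)
set.seed(123)
## 50 resamples (5 repeats x 10 folds) plus one element for the final model
seeds <- vector(mode = "list", length = 51)
## Each resample needs at least one integer per candidate model (12 here);
## longer vectors such as these are accepted without error
for(i in 1:50) seeds[[i]] <- sample.int(1000, 22)
## For the last model:
seeds[[51]] <- sample.int(1000, 1)
ctrl <- trainControl(method = "repeatedcv",
                     repeats = 5,
                     seeds = seeds)
set.seed(1)
mod <- train(Species ~ ., data = iris,
             method = "knn",
             tuneLength = 12,
             trControl = ctrl)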
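The adaptive resampling options described in the details can be set in the same way. The sketch below only constructs the control object and does not fit a model. It assumes caret is installed, and the name adapt_ctrl is chosen for this illustration. The adaptive list uses the three elements shown in the usage section above; newer caret releases add a complete element, so check ?trainControl for the installed version before passing the object to train.

```r
library(caret)

## Adaptive 10-fold CV repeated 5 times: after at least 5 resamples,
## candidate models unlikely to be optimal are dropped (futility analysis)
adapt_ctrl <- trainControl(method = "adaptive_cv",
                           number = 10,
                           repeats = 5,
                           adaptive = list(min = 5,
                                           alpha = 0.05,
                                           method = "gls"))

## Inspect the stored settings
adapt_ctrl$method
adapt_ctrl$adaptive$min
adapt_ctrl$adaptive$method
```

Switching method to "BT" inside the adaptive list would select the Bradley-Terry model instead of generalized least squares.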