rfeControl
Controlling the Feature Selection Algorithms
This function generates a control object that can be used to specify the details of the feature selection algorithms used in this package.
 Keywords
 utilities
Usage
rfeControl(functions = NULL, rerank = FALSE, method = "boot", saveDetails = FALSE, number = ifelse(method %in% c("cv", "repeatedcv"), 10, 25), repeats = ifelse(method %in% c("cv", "repeatedcv"), 1, number), verbose = FALSE, returnResamp = "final", p = .75, index = NULL, indexOut = NULL, timingSamps = 0, seeds = NA, allowParallel = TRUE)
Arguments
 functions
 a list of functions for model fitting, prediction and variable importance (see Details below)
 rerank
 a logical: should variable importance be recalculated each time features are removed?
 method
 The external resampling method: boot, cv, repeatedcv, LOOCV or LGOCV (for repeated training/test splits)
 number
 Either the number of folds or number of resampling iterations
 repeats
 For repeated k-fold cross-validation only: the number of complete sets of folds to compute
 saveDetails
 a logical to save the predictions and variable importances from the selection process
 verbose
 a logical to print a log for each external resampling iteration
 returnResamp
 A character string indicating how much of the resampled summary metrics should be saved. Values can be "final", "all" or "none"
 p
 For leave-group out cross-validation: the training percentage
 index
 a list with elements for each external resampling iteration. Each list element contains the row numbers of the samples used for training at that iteration.
 indexOut
 a list (the same length as index) that dictates which samples are held out for each resample. If NULL, then the unique set of samples not contained in index is used.
 timingSamps
 the number of training set samples that will be used to measure the time for predicting samples (zero indicates that the prediction time should not be estimated).
 seeds
 an optional set of integers that will be used to set the seed at each resampling iteration. This is useful when the models are run in parallel. A value of NA will stop the seed from being set within the worker processes while a value of NULL will set the seeds using a random set of integers. Alternatively, a list can be used. The list should have B+1 elements where B is the number of resamples. The first B elements of the list should be vectors of integers of length P where P is the number of subsets being evaluated (including the full set). The last element of the list only needs to be a single integer (for the final model). See the Examples section below.
 allowParallel
 if a parallel backend is loaded and available, should the function use it?
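The seeds list layout described for the seeds argument (B vectors of length P, then a single integer) can be built in base R. A minimal sketch, assuming 10-fold cross-validation repeated 5 times (so B = 50) and four candidate subset sizes; the helper name make_rfe_seeds is hypothetical and is not part of caret.

```r
# Hypothetical helper (not part of caret): builds a seeds list in the
# layout described above: B vectors of length P, then one integer.
make_rfe_seeds <- function(n_resamples, sizes, max_seed = 1000L) {
  n_subsets <- length(sizes) + 1  # P: candidate sizes plus the full set
  seeds <- vector(mode = "list", length = n_resamples + 1)
  for (i in seq_len(n_resamples)) {
    # one seed per subset evaluated within resample i
    seeds[[i]] <- sample.int(max_seed, n_subsets)
  }
  # the last element only needs a single integer (for the final model)
  seeds[[n_resamples + 1]] <- sample.int(max_seed, 1)
  seeds
}

set.seed(123)
# assumed setup: 10-fold CV repeated 5 times, so B = 10 * 5 = 50 resamples
seeds <- make_rfe_seeds(n_resamples = 10 * 5, sizes = c(2, 4, 6, 8))
```

Under those assumptions the list would be passed along as rfeControl(method = "repeatedcv", number = 10, repeats = 5, seeds = seeds, ...).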
Details
More details on this function can be found at http://topepo.github.io/caret/featureselection.html#rfe.
Backwards selection requires functions to be specified for some operations.
The fit function builds the model based on the current data set. The arguments for the function must be:
x
the current training set of predictor data with the appropriate subset of variables
y
the current outcome data (either a numeric or factor vector)
first
a single logical value for whether the current predictor set has all possible variables
last
similar to first, but TRUE when the last model is fit with the final subset size and predictors
...
optional arguments to pass to the fit function in the call to rfe
The function should return a model object that can be used to generate predictions.
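A minimal fit function sketch, assuming a numeric outcome and an ordinary least-squares model from lm(); the name my_fit is hypothetical. It ignores first and last, which a more elaborate function could use to treat the full-set or final model differently.

```r
# Hypothetical fit function for rfe-style backward selection.
# x: predictors for the current subset, y: numeric outcome.
my_fit <- function(x, y, first, last, ...) {
  dat <- as.data.frame(x)
  dat$.outcome <- y                  # append the outcome under a reserved name
  lm(.outcome ~ ., data = dat, ...)  # extra arguments are forwarded to lm()
}

mod <- my_fit(mtcars[, c("wt", "hp", "disp")], mtcars$mpg,
              first = TRUE, last = FALSE)
```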
The pred function returns a vector of predictions (numeric or factors) from the current model. The arguments are:
object
the model generated by the fit function
x
the current set of predictors for the held-back samples
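A matching pred sketch for a least-squares model; my_pred is a hypothetical name, and the model is fit directly with lm() so the example stands alone. For a classification model the function would instead return a factor.

```r
# Hypothetical pred function: numeric predictions for the held-back rows.
my_pred <- function(object, x) {
  predict(object, newdata = as.data.frame(x))
}

# Fit on the first 24 rows of mtcars and predict the remaining 8.
train_rows <- 1:24
fit <- lm(mpg ~ wt + hp, data = mtcars[train_rows, ])
preds <- my_pred(fit, mtcars[-train_rows, c("wt", "hp")])
```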
The rank function is used to return the predictors in the order of the most important to the least important. Inputs are:
object
the model generated by the fit function
x
the current set of predictors for the training samples
y
the current training outcomes
The function should return a data frame with a column called var that has the current variable names. The first row should be the most important predictor, etc. Other columns can be included in the output and will be returned in the final rfe object.
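One possible rank function, sketched under the assumption of a linear model with numeric predictors (so each coefficient name matches a variable name). It scores each predictor by its absolute t-statistic; the name my_rank and the extra importance column Overall are assumptions for illustration.

```r
# Hypothetical rank function: order predictors by absolute t-statistic
# from a fitted lm model, most important first.
my_rank <- function(object, x, y) {
  coefs <- summary(object)$coefficients
  coefs <- coefs[rownames(coefs) != "(Intercept)", , drop = FALSE]
  imp <- data.frame(
    Overall = abs(coefs[, "t value"]),  # extra column kept alongside var
    var = rownames(coefs),
    stringsAsFactors = FALSE
  )
  imp <- imp[order(imp$Overall, decreasing = TRUE), ]
  rownames(imp) <- NULL
  imp
}

fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
ranks <- my_rank(fit, mtcars[, c("wt", "hp", "qsec")], mtcars$mpg)
```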
The selectSize function determines the optimal number of predictors based on the resampling output. Inputs for the function are:
x
a matrix with columns for the performance metrics and the number of variables, called "Variables"
metric
a character string of the performance measure to optimize (e.g. "RMSE", "Rsquared", "Accuracy" or "Kappa")
maximize
a single logical for whether the metric should be maximized
This function should return an integer corresponding to the optimal subset size. caret comes with two example functions for this purpose: pickSizeBest and pickSizeTolerance.
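To illustrate the selectSize contract, here is a hypothetical function (my_select_size, not caret's pickSizeBest) that simply returns the subset size with the best resampled metric, breaking ties toward the smaller subset. The performance values are made up.

```r
# Hypothetical selectSize function: best value of `metric`,
# ties resolved in favour of the smallest subset size.
my_select_size <- function(x, metric, maximize) {
  x <- as.data.frame(x)
  x <- x[order(x$Variables), , drop = FALSE]  # smallest subsets first
  best <- if (maximize) which.max(x[[metric]]) else which.min(x[[metric]])
  as.integer(x$Variables[best])
}

# Made-up resampling summary for four subset sizes.
perf <- data.frame(
  Variables = c(2, 4, 6, 8),
  RMSE      = c(0.80, 0.62, 0.60, 0.61)
)
my_select_size(perf, metric = "RMSE", maximize = FALSE)  # returns 6
```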
After the optimal subset size is determined, the selectVar function will be used to calculate the best rankings for each variable across all the resampling iterations. Inputs for the function are:
y
a list of variable importances for each resampling iteration and each subset size (generated by the user-defined rank function). In the example, for each of the cross-validation groups, the output of the rank function is saved for each of the subset sizes (including the original subset). If the rankings are not recomputed at each iteration, the values will be the same within each cross-validation iteration.
size
the integer returned by the selectSize function
This function should return a character vector of predictor names (of length size) in the order of most important to least important.
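A hypothetical selectVar sketch (my_select_var) that averages each predictor's importance over all saved rankings and keeps the top size names. It assumes each ranking is a data frame with var and an importance column named Overall; it accepts either a list of such data frames or one stacked data frame. The rankings below are made up.

```r
# Hypothetical selectVar function: average importance per predictor
# across resamples/subset sizes, then keep the `size` best names.
my_select_var <- function(y, size) {
  imp <- if (is.data.frame(y)) y else do.call(rbind, y)
  avg <- aggregate(Overall ~ var, data = imp, FUN = mean)
  avg <- avg[order(avg$Overall, decreasing = TRUE), ]
  as.character(head(avg$var, size))
}

# Made-up rankings from two resampling iterations.
rankings <- list(
  data.frame(var = c("wt", "hp", "qsec"), Overall = c(5.1, 3.2, 1.0)),
  data.frame(var = c("hp", "wt", "qsec"), Overall = c(4.0, 3.9, 0.5))
)
my_select_var(rankings, size = 2)  # returns c("wt", "hp")
```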
Examples of these functions are included in the package: lmFuncs, rfFuncs, treebagFuncs and nbFuncs.
More details about these functions, including examples, are at http://topepo.github.io/caret/featureselection.html.
Value

A list of control settings that can be passed to the rfeControl argument of rfe.
See Also
rfe, lmFuncs, rfFuncs, treebagFuncs, nbFuncs, pickSizeBest, pickSizeTolerance
Examples
## Not run:
library(caret)
data(BloodBrain)  # provides bbbDescr and logBBB

subsetSizes <- c(2, 4, 6, 8)
set.seed(123)
# 50 bootstrap resamples, each evaluating length(subsetSizes) + 1 subsets
seeds <- vector(mode = "list", length = 51)
for(i in 1:50) seeds[[i]] <- sample.int(1000, length(subsetSizes) + 1)
# the final model needs only a single seed
seeds[[51]] <- sample.int(1000, 1)

set.seed(1)
rfMod <- rfe(bbbDescr, logBBB,
             sizes = subsetSizes,
             rfeControl = rfeControl(functions = rfFuncs,
                                     seeds = seeds,
                                     number = 50))
## End(Not run)