safsControl
Control parameters for GA and SA feature selection
Control the computational nuances of the gafs and safs functions
 Keywords
 utilities
Usage
gafsControl(functions = NULL, method = "repeatedcv", metric = NULL, maximize = NULL, number = ifelse(grepl("cv", method), 10, 25), repeats = ifelse(grepl("cv", method), 1, 5), verbose = FALSE, returnResamp = "final", p = 0.75, index = NULL, indexOut = NULL, seeds = NULL, holdout = 0, genParallel = FALSE, allowParallel = TRUE)
safsControl(functions = NULL, method = "repeatedcv", metric = NULL, maximize = NULL, number = ifelse(grepl("cv", method), 10, 25), repeats = ifelse(grepl("cv", method), 1, 5), verbose = FALSE, returnResamp = "final", p = 0.75, index = NULL, indexOut = NULL, seeds = NULL, holdout = 0, improve = Inf, allowParallel = TRUE)
Arguments
 functions
 a list of functions for model fitting, prediction etc (see Details below)
 method
 The resampling method: boot, boot632, cv, repeatedcv, LOOCV, LGOCV (for repeated training/test splits)
 metric
 a two-element character vector that specifies which summary metric will be used to select the optimal number of iterations from the external fitness value and which metric should guide subset selection. If specified, this vector should have names "internal" and "external". See gafs and/or safs for explanations of the difference.
 maximize
 a two-element logical vector: should the metrics be maximized or minimized? Like the metric argument, this vector should have names "internal" and "external".
 number
 Either the number of folds or number of resampling iterations
 repeats
 For repeated k-fold cross-validation only: the number of complete sets of folds to compute
 verbose
 a logical for printing results
 returnResamp
 A character string indicating how much of the resampled summary metrics should be saved. Values can be "final" (the default), "all" or "none"
 p
 For leave-group-out cross-validation: the training percentage
 index
 a list with elements for each resampling iteration. Each list element is the sample rows used for training at that iteration.
 indexOut
 a list (the same length as index) that dictates which samples are held out for each resample. If NULL, then the unique set of samples not contained in index is used.
 seeds
 a vector of integers that can be used to set the seed during each search. The number of seeds must be equal to the number of resamples plus one.
 holdout
 the proportion of data in [0, 1) to be held back from x and y to calculate the internal fitness values
 improve
 the number of iterations without improvement before safs reverts back to the previous optimal subset
 genParallel
 if a parallel backend is loaded and available, should gafs use it to parallelize the fitness calculations within a generation within a resample?
 allowParallel
 if a parallel backend is loaded and available, should the function use it?
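As a rough illustration of how several of these arguments fit together, the sketch below builds named metric and maximize vectors and a seeds vector of the documented length (number of resamples plus one). The metric name "Accuracy" and the choice of rfGA are assumptions made for the example, and the call to gafsControl only runs if caret is installed.

```r
# 10-fold CV with a single repeat gives 10 resamples
n_folds <- 10
n_repeats <- 1
n_resamples <- n_folds * n_repeats

# One seed per resample plus one extra, as described for `seeds`
set.seed(101)
ctrl_seeds <- sample.int(10000, n_resamples + 1)

# Named vectors for `metric` and `maximize`
# ("Accuracy" is an assumed metric name for a classification problem)
ctrl_metric <- c(internal = "Accuracy", external = "Accuracy")
ctrl_maximize <- c(internal = TRUE, external = TRUE)

# Build the control object only if caret is available
if (requireNamespace("caret", quietly = TRUE)) {
  ga_ctrl <- caret::gafsControl(
    functions = caret::rfGA,
    method = "repeatedcv",
    number = n_folds,
    repeats = n_repeats,
    metric = ctrl_metric,
    maximize = ctrl_maximize,
    seeds = ctrl_seeds
  )
}
```

The same pattern should apply to safsControl, with improve in place of genParallel.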
Details
Many of these options are the same as those described for trainControl. More extensive documentation and examples can be found on the caret website at http://topepo.github.io/caret/GA.html#syntax and http://topepo.github.io/caret/SA.html#syntax.
The functions component contains the information about how the model should be fit and summarized. It also contains the elements needed for the GA and SA modules (e.g. crossover).
The elements of functions that are the same for GAs and SAs are:

 fit, with arguments x, y, lev, last, and ..., is used to fit the classification or regression model
 pred, with arguments object and x, predicts new samples
 fitness_intern, with arguments object, x, y, maximize, and p, summarizes performance for the internal estimates of fitness
 fitness_extern, with arguments data, lev, and model, summarizes performance using the externally held-out samples
 selectIter, with arguments x, metric, and maximize, determines the best search iteration for feature selection
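To make the shape of one of these elements concrete, here is a minimal sketch of a replacement for fitness_extern with the documented arguments data, lev, and model. The name acc_extern is hypothetical, and the sketch assumes that data arrives as a data frame with factor columns obs and pred, which is not stated on this page.

```r
# Hypothetical external fitness function with the documented arguments.
# Assumption: `data` has factor columns `obs` (observed) and `pred` (predicted).
acc_extern <- function(data, lev = NULL, model = NULL) {
  # Return a named numeric vector so the metric can be looked up by name
  c(Accuracy = mean(data$pred == data$obs))
}

# Toy held-out predictions to exercise the function
toy <- data.frame(
  obs  = factor(c("a", "b", "a", "b"), levels = c("a", "b")),
  pred = factor(c("a", "b", "b", "b"), levels = c("a", "b"))
)
acc_extern(toy, lev = levels(toy$obs))
```

A function like this could be swapped into a copy of an existing functions list (for example, a copy of rfGA or rfSA) by assigning it to the fitness_extern element.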
The elements of functions specific to genetic algorithms are:

 initial, with arguments vars, popSize and ..., creates an initial population
 selection, with arguments population, fitness, r, q, and ..., conducts selection of individuals
 crossover, with arguments population, fitness, parents and ..., controls genetic reproduction
 mutation, with arguments population, parent and ..., adds mutations
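As an illustration of the initial element for GAs, the sketch below creates a random starting population. It assumes that vars is the number of candidate predictors and that a population is encoded as a popSize by vars matrix of 0/1 indicators; both the encoding and the name random_initial are assumptions for the example.

```r
# Hypothetical `initial` function for a GA.
# Assumption: a population is a popSize x vars matrix of 0/1 values,
# where 1 means the predictor is included in that individual's subset.
random_initial <- function(vars, popSize, ...) {
  matrix(
    rbinom(vars * popSize, size = 1, prob = 0.5),
    nrow = popSize,
    ncol = vars
  )
}

set.seed(42)
pop <- random_initial(vars = 8, popSize = 5)
dim(pop)       # rows are individuals, columns are predictors
rowSums(pop)   # subset size for each individual
```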
The elements of functions specific to simulated annealing are:

 initial, with arguments vars, prob, and ..., creates the initial subset
 perturb, with arguments x, vars, and number, makes incremental changes to the subsets
 prob, with arguments old, new, and iteration, computes the acceptance probabilities
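The prob element can be sketched as a function of the old fitness, the new fitness, and the iteration number. The version below is an illustration under stated assumptions, not necessarily the function shipped with caret: it assumes smaller fitness values are better and that old is nonzero, always accepts an improvement, and makes worse subsets less likely to be accepted as the iteration count grows. The name accept_prob is hypothetical.

```r
# Hypothetical acceptance probability for simulated annealing.
# Assumptions: smaller fitness is better; `old` is nonzero.
accept_prob <- function(old, new, iteration = 1) {
  # An improvement is always accepted
  if (new < old) return(1)
  # Relative worsening (zero or negative), scaled by the iteration number
  rel_diff <- (old - new) / abs(old)
  exp(rel_diff * iteration)
}

accept_prob(old = 1.0, new = 0.9)                  # improvement
accept_prob(old = 1.0, new = 1.2, iteration = 1)   # worse, early in the search
accept_prob(old = 1.0, new = 1.2, iteration = 10)  # worse, late in the search
```

With this form, the same relative loss in fitness is tolerated early in the search and almost never accepted later on.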
The pages http://topepo.github.io/caret/GA.html and http://topepo.github.io/caret/SA.html have more details about each of these functions.
holdout can be used to hold out samples for computing the internal fitness value. Note that this is independent of the external resampling step. Suppose 10-fold CV is being used. Within a resampling iteration, holdout can be used to sample an additional proportion of the 90% resampled data to use for estimating fitness. This may not be a good idea unless you have a very large training set and want to avoid an internal resampling procedure to estimate fitness.
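The arithmetic behind this can be worked through for a made-up case. The sample size of 1000 and the holdout value of 0.2 below are illustrative assumptions only.

```r
# Illustrative numbers: 1000 samples, 10-fold CV, holdout = 0.2
n_total <- 1000
n_folds <- 10
holdout <- 0.2

# External resampling: one fold is held out, the rest is the resampled data
n_external <- n_total / n_folds             # externally held-out samples
n_resampled <- n_total - n_external         # the 90% used inside the search

# Internal holdout: a proportion of the resampled data estimates fitness
n_internal <- round(holdout * n_resampled)  # used for internal fitness
n_fit <- n_resampled - n_internal           # left for fitting the model

c(external = n_external, internal = n_internal, fit = n_fit)
```

In this case only 720 of the 1000 samples are used to fit the model within each resample, which is why the approach suits large training sets.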
The search algorithms can be parallelized in several places:
 each externally resampled GA or SA can be run independently (controlled by the allowParallel option)
 within a GA, the fitness calculations at a particular generation can be run in parallel over the current set of individuals (see the genParallel option)
 if inner resampling is used, these can be run in parallel (controls depend on the function used; see, for example, trainControl)
 any parallelization of the individual model fits; this is also specific to the modeling function
It is probably best to pick one of these areas for parallelization. The first is likely to produce the largest decrease in run time since it is the least likely to incur multiple restarts of the worker processes. Keep in mind that if multiple levels of parallelization occur, this can affect the number of workers and the amount of memory required exponentially.
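One way to follow this advice for a GA is to enable only the outer level and leave the within-generation level off. The sketch below assumes rfGA as the functions list and 10-fold CV; it only builds the control object if caret is installed, and a parallel backend still has to be registered separately for either flag to have an effect.

```r
# Parallelize across external resamples only; keep the within-generation
# fitness calculations sequential
par_settings <- list(allowParallel = TRUE, genParallel = FALSE)

if (requireNamespace("caret", quietly = TRUE)) {
  ga_ctrl <- do.call(
    caret::gafsControl,
    c(list(functions = caret::rfGA, method = "cv", number = 10), par_settings)
  )
}
```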
Value
References
http://topepo.github.io/caret/GA.html, http://topepo.github.io/caret/SA.html
See Also
gafs, safs, caretGA, rfGA, treebagGA, caretSA, rfSA, treebagSA