Control the computational nuances of the gafs and safs functions.

gafsControl(functions = NULL,
            method = "repeatedcv",
            metric = NULL,
            maximize = NULL,
            number = ifelse(grepl("cv", method), 10, 25),
            repeats = ifelse(grepl("cv", method), 1, 5),
            verbose = FALSE,
            returnResamp = "final",
            p = 0.75,
            index = NULL,
            indexOut = NULL,
            seeds = NULL,
            holdout = 0,
            genParallel = FALSE,
            allowParallel = TRUE)

safsControl(functions = NULL,
            method = "repeatedcv",
            metric = NULL,
            maximize = NULL,
            number = ifelse(grepl("cv", method), 10, 25),
            repeats = ifelse(grepl("cv", method), 1, 5),
            verbose = FALSE,
            returnResamp = "final",
            p = 0.75,
            index = NULL,
            indexOut = NULL,
            seeds = NULL,
            holdout = 0,
            improve = Inf,
            allowParallel = TRUE)
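The defaults for number and repeats are computed from method when the control object is created. The base-R snippet below evaluates those same two default expressions for a few methods so their effect is visible; the helper name default_resampling is hypothetical and is not part of caret.

```r
# Evaluate the same default expressions that gafsControl() and
# safsControl() use for `number` and `repeats`.
# grepl() is case-sensitive, so "LOOCV" and "LGOCV" do NOT match "cv".
default_resampling <- function(method) {   # hypothetical helper, not a caret function
  is_cv <- grepl("cv", method)
  c(number  = ifelse(is_cv, 10, 25),
    repeats = ifelse(is_cv, 1, 5))
}

resampling_methods <- c("boot", "cv", "repeatedcv", "LGOCV")
defaults <- sapply(resampling_methods, default_resampling)  # 2 x 4 matrix
print(defaults)
```

Because repeats defaults to 1 even when method = "repeatedcv", it has to be set explicitly to obtain more than one repeat of the folds.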
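method: the resampling method: "boot", "boot632", "cv", "repeatedcv", "LOOCV", or "LGOCV" (for repeated training/test splits).
metric: a two-element character vector that specifies which summary metric is used to select the optimal number of iterations from the external fitness values and which metric guides subset selection (the internal fitness). If specified, this vector should have names "internal" and "external".
maximize: a two-element logical vector: should each metric be maximized or minimized? Like the metric argument, this vector should have names "internal" and "external".
indexOut: a list (the same length as index) that dictates which samples are held out for each resample. If NULL, then the unique set of samples not contained in index is used.
holdout: the proportion of data in [0, 1) to be held back from x and y to calculate the internal fitness values.
improve: the number of iterations without improvement before safs reverts back to the previous optimal subset.
genParallel: if a parallel backend is loaded and available, should gafs use it to parallelize the fitness calculations within a generation within a resample?

Many of these options are the same as those described for trainControl.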
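To make the index and indexOut arguments concrete, the sketch below builds 5-fold training-row lists by hand and then derives the held-out rows the way the indexOut = NULL rule describes: for each resample, the unique samples not contained in that index element. The fold construction and the Fold1 to Fold5 names are illustrative assumptions; only base R is used.

```r
# Build `index` (training rows per resample) for 5-fold CV on 20 samples.
set.seed(42)
n <- 20
fold_id <- sample(rep(1:5, length.out = n))   # exactly 4 rows per fold

index <- lapply(1:5, function(k) which(fold_id != k))
names(index) <- paste0("Fold", 1:5)

# Mirror the indexOut = NULL rule: held-out rows are the samples
# not contained in the corresponding `index` element.
index_out <- lapply(index, function(train_rows) setdiff(seq_len(n), train_rows))

# With caret installed, such lists could be passed on, e.g.
# gafsControl(index = index, indexOut = index_out)
```

Every sample appears in exactly one held-out set here, which is what plain k-fold CV implies; custom index lists (for example, grouped or time-ordered splits) need not have that property.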
More extensive documentation and examples can be found on the caret package website.

The functions component contains the information about how the model should be fit and summarized. It also contains the elements needed for the GA and SA modules (e.g., crossover and mutation).
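A functions list is an ordinary named list of R functions. The sketch below fills in only the model-related members shared by GAs and SAs (fit, pred, fitness_intern, fitness_extern, selectIter) for a plain linear model; a complete list would also need the GA- or SA-specific members. The argument names follow this page, but the return shapes (a named metric vector, a data argument with obs and pred columns, a data frame of per-iteration results for selectIter) are assumptions modelled on caret's summary conventions, and lm_funcs is a hypothetical name.

```r
# Hypothetical model-related members of a `functions` list for a linear model.
lm_funcs <- list(
  fit = function(x, y, lev = NULL, last = FALSE, ...) {
    dat <- as.data.frame(x)
    dat$.outcome <- y
    lm(.outcome ~ ., data = dat)
  },
  pred = function(object, x) {
    predict(object, newdata = as.data.frame(x))
  },
  fitness_intern = function(object, x, y, maximize, p) {
    # Internal fitness: RMSE of the model's predictions on the supplied data
    c(RMSE = sqrt(mean((y - predict(object, newdata = as.data.frame(x)))^2)))
  },
  fitness_extern = function(data, lev = NULL, model = NULL) {
    # Assumes `data` holds observed (`obs`) and predicted (`pred`) values
    c(RMSE = sqrt(mean((data$obs - data$pred)^2)))
  },
  selectIter = function(x, metric, maximize) {
    # Assumes `x` is a data frame with one row per search iteration
    if (maximize) which.max(x[[metric]]) else which.min(x[[metric]])
  }
)

# Small usage example on simulated data
set.seed(1)
x <- data.frame(a = rnorm(30), b = rnorm(30))
y <- 2 * x$a + rnorm(30, sd = 0.1)
mod <- lm_funcs$fit(x, y)
internal <- lm_funcs$fitness_intern(mod, x, y, maximize = FALSE, p = ncol(x))
ext <- lm_funcs$fitness_extern(data.frame(obs = c(1, 2), pred = c(1, 4)))
iters <- data.frame(Iter = 1:3, RMSE = c(0.9, 0.4, 0.6))
best <- lm_funcs$selectIter(iters, metric = "RMSE", maximize = FALSE)
```

Since RMSE is minimized, selectIter is called with maximize = FALSE and picks the second iteration in this toy table.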
The elements of functions that are the same for GAs and SAs are:

fit, with arguments x, y, lev, last, and ..., is used to fit the classification or regression model.
pred, with arguments object and x, predicts new samples.
fitness_intern, with arguments object, x, y, maximize, and p, summarizes performance for the internal estimates of fitness.
fitness_extern, with arguments data, lev, and model, summarizes performance using the externally held-out samples.
selectIter, with arguments x, metric, and maximize, determines the best search iteration for feature selection.

The elements of functions specific to genetic algorithms are:

initial, with arguments vars, popSize, and ..., creates an initial population.
selection, with arguments population, fitness, r, q, and ..., conducts selection of individuals.
crossover, with arguments population, fitness, parents, and ..., controls genetic reproduction.
mutation, with arguments population, parent, and ..., adds mutations.

The elements of functions specific to simulated annealing are:

initial, with arguments vars, prob, and ..., creates the initial subset.
perturb, with arguments x, vars, and number, makes incremental changes to the subsets.
prob, with arguments old, new, and iteration, computes the acceptance probabilities.
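The prob element decides whether a worse subset is accepted, which is what lets simulated annealing escape local optima. The function below is a minimal Metropolis-style sketch with the documented (old, new, iteration) signature; it assumes larger fitness is better and scales the loss by the old fitness. It is not necessarily caret's default rule, and accept_prob is a hypothetical name.

```r
# Hypothetical acceptance-probability function with the documented
# signature (old, new, iteration). Assumes larger fitness is better.
accept_prob <- function(old, new, iteration = 1) {
  if (new >= old) return(1)                       # improvements are always accepted
  # Relative loss in fitness; guard against division by zero
  rel_drop <- (old - new) / max(abs(old), .Machine$double.eps)
  exp(-rel_drop * iteration)                      # worse moves get rarer as iterations grow
}

# The same 5% relative drop is likely to be accepted early in the
# search but unlikely to be accepted late in it.
p_early <- accept_prob(old = 0.80, new = 0.76, iteration = 1)
p_late  <- accept_prob(old = 0.80, new = 0.76, iteration = 50)

# A search would then accept the perturbed subset with this probability:
set.seed(3)
accepted <- runif(1) < p_early
```

Multiplying by iteration plays the role of a decreasing temperature: the longer the search runs, the more it behaves like greedy hill climbing.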
The holdout argument can be used to hold out samples for computing the internal fitness value. Note that this is independent of the external resampling step. Suppose 10-fold CV is being used: within each resampling iteration, holdout can be used to sample an additional proportion of the 90% resampled data to use for estimating fitness. This may not be a good idea unless you have a very large training set and want to avoid an internal resampling procedure to estimate fitness.
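The data flow for holdout can be sketched in base R for a single resample of 10-fold CV with holdout = 0.2. Simple random sampling of the held-back rows is an assumption made for illustration; caret may draw them differently.

```r
# One resample of 10-fold CV on 100 rows, with holdout = 0.2.
set.seed(2024)
n <- 100
fold_id <- sample(rep(1:10, length.out = n))   # exactly 10 rows per fold

test_rows  <- which(fold_id == 1)   # external fitness: never seen by the search
train_rows <- which(fold_id != 1)   # the 90% resampled data

holdout <- 0.2                                     # proportion in [0, 1)
n_hold <- floor(holdout * length(train_rows))
hold_rows <- sample(train_rows, n_hold)            # used only for internal fitness
fit_rows  <- setdiff(train_rows, hold_rows)        # used to fit models during the search
```

With these numbers, models in the search are fit on 72 rows, scored internally on 18, and the final subset is judged externally on the 10 rows of the outer fold, which shows why holdout only makes sense for large training sets.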
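The search algorithms can be parallelized in several places:

1. each externally resampled GA or SA can be run independently (controlled by the allowParallel option)
2. within a GA, the fitness calculations at a particular generation can be run in parallel over the current set of individuals (see genParallel)
3. if inner resampling is used, these resamples can be run in parallel (the controls depend on the function used; see, for example, trainControl)

It is probably best to pick one of these areas for parallelization. The first is likely to produce the largest decrease in run-time since it is the least likely to incur repeated restarting of the worker processes. Keep in mind that if multiple levels of parallelization occur, the number of workers and the amount of memory required multiply with each additional level.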
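The genParallel level amounts to evaluating every individual of one generation concurrently. caret itself parallelizes through a registered foreach backend; the sketch below instead uses base R's parallel package purely to illustrate that idea, with a hypothetical toy_fitness function. It then shows the worker arithmetic behind the warning about nesting levels.

```r
library(parallel)

# Hypothetical fitness: rewards subsets containing the first three
# (assumed informative) features and penalizes subset size.
toy_fitness <- function(subset) {
  sum(subset[1:3]) - 0.1 * sum(subset)
}

set.seed(7)
pop_size <- 8
n_vars <- 10
# Each row is one individual: a 0/1 inclusion vector over the features
population <- matrix(rbinom(pop_size * n_vars, 1, 0.5), nrow = pop_size)

# Evaluate the whole generation in parallel. mc.cores = 1 keeps this
# portable, because forking (mc.cores > 1) is not available on Windows.
fitness <- unlist(mclapply(seq_len(pop_size),
                           function(i) toy_fitness(population[i, ]),
                           mc.cores = 1L))

# Nesting levels multiplies process counts: 10 external resamples run in
# parallel, each evaluating a generation with 8 workers, could need 80.
outer_workers <- 10
inner_workers <- 8
total_workers <- outer_workers * inner_workers
```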
See also: gafs, safs, caretGA, rfGA, treebagGA, caretSA, rfSA, treebagSA