sbfControl
Control Object for Selection By Filtering (SBF)
Controls the execution of models with simple filters for feature selection
 Keywords
 utilities
Usage
sbfControl(functions = NULL, method = "boot", saveDetails = FALSE,
  number = ifelse(method %in% c("cv", "repeatedcv"), 10, 25),
  repeats = ifelse(method %in% c("cv", "repeatedcv"), 1, number),
  verbose = FALSE, returnResamp = "final", p = 0.75, index = NULL,
  indexOut = NULL, timingSamps = 0, seeds = NA, allowParallel = TRUE,
  multivariate = FALSE)
Arguments
 functions
 a list of functions for model fitting, prediction and variable filtering (see Details below)
 method
 The external resampling method: boot, cv, LOOCV or LGOCV (for repeated training/test splits)
 number
 Either the number of folds or number of resampling iterations
 repeats
 For repeated k-fold cross-validation only: the number of complete sets of folds to compute
 saveDetails
 a logical to save the predictions and variable importances from the selection process
 verbose
 a logical to print a log for each external resampling iteration
 returnResamp
 A character string indicating how much of the resampled summary metrics should be saved. Values can be "final" or "none"
 p
 For leave-group-out cross-validation: the training percentage
 index
 a list with elements for each external resampling iteration. Each list element is the sample rows used for training at that iteration.
 indexOut
 a list (the same length as index) that dictates which samples are held out for each resample. If NULL, then the unique set of samples not contained in index is used.
 timingSamps
 the number of training set samples that will be used to measure the time for predicting samples (zero indicates that the prediction time should not be estimated).
 seeds
 an optional set of integers that will be used to set the seed at each resampling iteration. This is useful when the models are run in parallel. A value of NA will stop the seed from being set within the worker processes, while a value of NULL will set the seeds using a random set of integers. Alternatively, a vector of integers can be used. The vector should have B+1 elements, where B is the number of resamples. See the Examples section below.
 allowParallel
 if a parallel backend is loaded and available, should the function use it?
 multivariate
 a logical; should all the columns of x be exposed to the score function at once?
Details
More details on this function can be found at http://topepo.github.io/caret/featureselection.html#filter.
Simple filter-based feature selection requires functions to be specified for some operations.
The fit function builds the model based on the current data set. The arguments for the function must be:
x
the current training set of predictor data with the appropriate subset of variables (i.e. after filtering)
y
the current outcome data (either a numeric or factor vector)
...
optional arguments to pass to the fit function in the call to sbf
The function should return a model object that can be used to generate predictions.
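As a minimal sketch of the fit contract described above, the following defines a hypothetical function, lmFit, that fits an ordinary linear model to whatever predictors survive filtering. The name lmFit, the .outcome column and the simulated data are illustrative assumptions, not part of the package.

```r
# Hypothetical fit function: ordinary least squares on the filtered predictors.
# Arguments follow the documented contract: x (predictors), y (outcome), ...
lmFit <- function(x, y, ...) {
  dat <- as.data.frame(x)
  dat$.outcome <- y                   # illustrative column name for the outcome
  lm(.outcome ~ ., data = dat, ...)   # returns a model object usable by predict()
}

# Simulated data, for illustration only
set.seed(1)
x <- data.frame(a = rnorm(20), b = rnorm(20))
y <- 2 * x$a + rnorm(20, sd = 0.1)
mod <- lmFit(x, y)
```

Any model object would do here, as long as the accompanying pred function knows how to generate predictions from it.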
The pred function returns a vector of predictions (numeric or factors) from the current model. The arguments are:
object
the model generated by the fit function
x
the current set of predictors for the held-back samples
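The pred contract can be sketched in the same spirit. The function lmPred below is a hypothetical example that assumes the model came from lm(); the simulated training and held-back data are for illustration only.

```r
# Hypothetical pred function for a model fitted with lm().
# object: the fitted model; x: predictors for the held-back samples.
lmPred <- function(object, x) {
  unname(predict(object, newdata = as.data.frame(x)))
}

# Simulated training data and a small held-back set, for illustration only
set.seed(2)
train <- data.frame(a = rnorm(30))
train$.outcome <- 3 * train$a + rnorm(30, sd = 0.1)
mod <- lm(.outcome ~ ., data = train)

heldOut <- data.frame(a = c(-1, 0, 1))
preds <- lmPred(mod, heldOut)   # numeric vector, one value per held-back row
```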
The score function is used to return scores with names for each predictor (such as a p-value). Inputs are:
x
the predictors for the training samples. If sbfControl()$multivariate is TRUE, this will be the full predictor matrix. Otherwise it is a vector for a specific predictor.
y
the current training outcomes
When sbfControl()$multivariate is TRUE, the score function should return a named vector where length(scores) == ncol(x). Otherwise, the function's output should be a single value. Univariate examples are given by anovaScores for classification and gamScores for regression and the example below.
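To illustrate the difference between the two scoring modes, here is a sketch using a two-sample t-test p-value as the score. Both tTestScore and tTestScoreAll are hypothetical names, and the two-class simulated data are an assumption made for the example; they are not the package's built-in scoring functions.

```r
# Hypothetical univariate score: x is a single numeric predictor,
# y is a two-level factor. Returns a single p-value.
tTestScore <- function(x, y) {
  t.test(x ~ y)$p.value
}

# Hypothetical multivariate score: x is the full set of predictors.
# Returns a named vector with one score per column of x.
tTestScoreAll <- function(x, y) {
  apply(as.matrix(x), 2, tTestScore, y = y)
}

# Simulated data: one informative predictor and one noise predictor
set.seed(3)
y <- factor(rep(c("a", "b"), each = 15))
x <- data.frame(signal = rnorm(30, mean = ifelse(y == "a", 0, 3)),
                noise  = rnorm(30))

single <- tTestScore(x$signal, y)   # univariate mode: one value
scores <- tTestScoreAll(x, y)       # multivariate mode: length(scores) == ncol(x)
```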
The filter function is used to return a logical vector with names for each predictor (TRUE indicates that the predictor should be retained). Inputs are:
score
the output of the score function
x
the predictors for the training samples
y
the current training outcomes
The function should return a named logical vector.
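A filter along these lines might look like the sketch below. The function pFilter and the 0.05 cutoff are illustrative assumptions; x and y are accepted to match the documented signature but are not used by this particular rule.

```r
# Hypothetical filter: retain predictors whose p-value is at most 0.05.
# x and y are part of the documented signature but unused in this rule.
pFilter <- function(score, x, y) {
  keep <- score <= 0.05
  names(keep) <- names(score)   # make sure the result is a named logical vector
  keep
}

# Made-up scores, for illustration only
scores <- c(signal = 0.001, noise = 0.40, borderline = 0.05)
keep <- pFilter(scores, x = NULL, y = NULL)
```

Here signal and borderline are retained and noise is dropped.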
Examples of these functions are included in the package: caretSBF, lmSBF, rfSBF, treebagSBF, ldaSBF and nbSBF.
The web page http://topepo.github.io/caret/ has more details and examples related to this function.
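The index, indexOut and seeds arguments described under Arguments can also be built by hand. The sketch below uses only base R and assumes a hypothetical data set of n = 50 samples with B = 5 bootstrap resamples; the resample names are likewise illustrative.

```r
set.seed(4)
n <- 50   # hypothetical number of samples
B <- 5    # hypothetical number of resamples

# One list element per resample: the rows used for training
index <- lapply(seq_len(B), function(i) sample.int(n, n, replace = TRUE))
names(index) <- paste0("Resample", seq_len(B))

# Held-out rows: the unique samples not contained in the training rows
# (the behavior described for indexOut = NULL, written out explicitly)
indexOut <- lapply(index, function(rows) setdiff(seq_len(n), rows))

# A vector of B + 1 integer seeds
seeds <- sample.int(100000, B + 1)
```

These objects have the shapes the arguments describe, so they could then be supplied as sbfControl(index = index, indexOut = indexOut, seeds = seeds).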
Value

a list that echoes the specified arguments
See Also
Examples
## Not run:
# library(caret)
# data(BloodBrain)
#
# ## Use a GAM as the filter, then fit a random forest model
# set.seed(1)
# RFwithGAM <- sbf(bbbDescr, logBBB,
#                  sbfControl = sbfControl(functions = rfSBF,
#                                          verbose = FALSE,
#                                          seeds = sample.int(100000, 11),
#                                          method = "cv"))
# RFwithGAM
#
#
# ## A simple example for multivariate scoring
# rfSBF2 <- rfSBF
# rfSBF2$score <- function(x, y) apply(x, 2, rfSBF$score, y = y)
#
# set.seed(1)
# RFwithGAM2 <- sbf(bbbDescr, logBBB,
#                   sbfControl = sbfControl(functions = rfSBF2,
#                                           verbose = FALSE,
#                                           seeds = sample.int(100000, 11),
#                                           method = "cv",
#                                           multivariate = TRUE))
# RFwithGAM2
#
#
# ## End(Not run)