# sbfControl

##### Control Object for Selection By Filtering (SBF)

Controls the execution of models that use simple filters for feature selection.

- Keywords
- utilities

##### Usage

```
sbfControl(functions = NULL, method = "boot", saveDetails = FALSE,
           number = ifelse(method %in% c("cv", "repeatedcv"), 10, 25),
           repeats = ifelse(method %in% c("cv", "repeatedcv"), 1, number),
           verbose = FALSE, returnResamp = "final", p = 0.75, index = NULL,
           indexOut = NULL, timingSamps = 0, seeds = NA, allowParallel = TRUE,
           multivariate = FALSE)
```

##### Arguments

- functions
a list of functions for model fitting, prediction and variable filtering (see Details below)

- method
The external resampling method: `boot`, `cv`, `LOOCV` or `LGOCV` (for repeated training/test splits)

- saveDetails
a logical to save the predictions and variable importances from the selection process

- number
Either the number of folds or number of resampling iterations

- repeats
For repeated k-fold cross-validation only: the number of complete sets of folds to compute

- verbose
a logical to print a log for each external resampling iteration

- returnResamp
A character string indicating how much of the resampled summary metrics should be saved. Values can be `"final"` or `"none"`

- p
For leave-group out cross-validation: the training percentage

- index
a list with elements for each external resampling iteration. Each list element is the sample rows used for training at that iteration.

- indexOut
a list (the same length as `index`) that dictates which samples are held out for each resample. If `NULL`, then the unique set of samples not contained in `index` is used.

- timingSamps
the number of training set samples that will be used to measure the time for predicting samples (zero indicates that the prediction time should not be estimated).

- seeds
an optional set of integers that will be used to set the seed at each resampling iteration. This is useful when the models are run in parallel. A value of `NA` will stop the seed from being set within the worker processes, while a value of `NULL` will set the seeds using a random set of integers. Alternatively, a vector of integers can be used. The vector should have `B+1` elements, where `B` is the number of resamples. See the Examples section below.

- allowParallel
if a parallel backend is loaded and available, should the function use it?

- multivariate
a logical; should all the columns of `x` be exposed to the `score` function at once?
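
The `index`, `indexOut` and `seeds` arguments described above can be built by hand. The following is a minimal base R sketch, not code from the package: the sample size `n`, the number of resamples `B` and the training size of 15 are hypothetical values chosen for illustration.

```r
# Hypothetical sizes, chosen only for illustration
n <- 20   # number of samples
B <- 5    # number of external resamples

set.seed(42)

# One list element per resample: the row numbers used for training
index <- lapply(seq_len(B), function(i) sort(sample.int(n, size = 15)))
names(index) <- paste0("Resample", seq_len(B))

# What indexOut = NULL is documented to imply: the rows not in `index`
indexOut <- lapply(index, function(train) setdiff(seq_len(n), train))

# A seed vector with B + 1 elements, as described for `seeds`
seeds <- sample.int(100000, B + 1)
```

These objects would then be supplied as `sbfControl(index = index, indexOut = indexOut, seeds = seeds)`.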

##### Details

More details on this function can be found at http://topepo.github.io/caret/feature-selection-using-univariate-filters.html.

Simple filter-based feature selection requires functions to be specified for several operations.

The `fit` function builds the model based on the current data set. The arguments for the function must be:

- `x`: the current training set of predictor data with the appropriate subset of variables (i.e. after filtering)
- `y`: the current outcome data (either a numeric or factor vector)
- `...`: optional arguments to pass to the fit function in the call to `sbf`

The function should return a model object that can be used to generate predictions.

The `pred` function returns a vector of predictions (numeric or factors) from the current model. The arguments are:

- `object`: the model generated by the `fit` function
- `x`: the current set of predictors for the held-back samples
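
The `fit` and `pred` contracts above can be sketched with a linear model. This is an illustrative pair written in base R, not the package's own implementation: the names `fit_lm` and `pred_lm` and the `.outcome` column are hypothetical.

```r
# Hypothetical fit function: least squares on the already-filtered predictors
fit_lm <- function(x, y, ...) {
  dat <- as.data.frame(x)
  dat$.outcome <- y
  lm(.outcome ~ ., data = dat, ...)
}

# Hypothetical pred function: numeric predictions for the held-back samples
pred_lm <- function(object, x) {
  unname(predict(object, newdata = as.data.frame(x)))
}

# Quick check on a built-in data set
x <- mtcars[, c("wt", "hp")]
y <- mtcars$mpg
model <- fit_lm(x[1:24, ], y[1:24])
preds <- pred_lm(model, x[25:32, ])
```

Functions with these signatures would go in the `fit` and `pred` elements of the list passed to `functions`.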

The `score` function is used to return scores with names for each predictor (such as a p-value). Inputs are:

- `x`: the predictors for the training samples. If `sbfControl()$multivariate` is `TRUE`, this will be the full predictor matrix. Otherwise it is a vector for a specific predictor.
- `y`: the current training outcomes

When `sbfControl()$multivariate` is `TRUE`, the `score` function should return a named vector where `length(scores) == ncol(x)`. Otherwise, the function's output should be a single value. Univariate examples are given by `anovaScores` for classification and `gamScores` for regression, and in the Examples section below.
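
To make the two calling conventions concrete, here is a minimal base R sketch. It is an assumption for illustration only: the score is a correlation-test p-value rather than the ANOVA or GAM scores the package provides, and the function names are hypothetical.

```r
# Univariate score: called once per predictor, returns a single value
score_univariate <- function(x, y) {
  cor.test(x, y)$p.value
}

# Multivariate score: called once with all columns, returns a named vector
score_multivariate <- function(x, y) {
  scores <- apply(x, 2, score_univariate, y = y)
  stopifnot(length(scores) == ncol(x))
  scores
}

x <- as.matrix(mtcars[, c("wt", "hp", "qsec")])
y <- mtcars$mpg

single <- score_univariate(x[, "wt"], y)  # one value for one predictor
scores <- score_multivariate(x, y)        # one named value per column
```

The multivariate version takes its names from the column names of `x`, so the input needs column names for the result to be a named vector.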

The `filter` function is used to return a logical vector with names for each predictor (`TRUE` indicates that the predictor should be retained). Inputs are:

- `score`: the output of the `score` function
- `x`: the predictors for the training samples
- `y`: the current training outcomes

The function should return a named logical vector.
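
A filter with this signature can be as simple as a threshold on the scores. The sketch below assumes the scores are p-values and uses a hypothetical 0.05 cut-off; the function name and the example scores are made up for illustration.

```r
# Hypothetical filter: keep predictors whose p-value is at or below 0.05
filter_pvalue <- function(score, x, y) {
  keep <- score <= 0.05
  names(keep) <- names(score)
  keep
}

# Made-up scores for three predictors
score <- c(wt = 0.001, hp = 0.04, noise = 0.62)
keep <- filter_pvalue(score, x = NULL, y = NULL)
```

Here `x` and `y` are accepted but unused; a filter that depends on the data, for example one that also drops highly correlated predictors, could use them.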

Examples of these functions are included in the package: `caretSBF`, `lmSBF`, `rfSBF`, `treebagSBF`, `ldaSBF` and `nbSBF`.

The web page http://topepo.github.io/caret/ has more details and examples related to this function.

##### Value

a list that echoes the specified arguments

##### Examples

```
# NOT RUN {
library(caret)
data(BloodBrain)

## Use a GAM as the filter, then fit a random forest model.
## With 10-fold CV there are B = 10 resamples, so seeds needs B + 1 = 11 values.
set.seed(1)
RFwithGAM <- sbf(bbbDescr, logBBB,
                 sbfControl = sbfControl(functions = rfSBF,
                                         verbose = FALSE,
                                         seeds = sample.int(100000, 11),
                                         method = "cv"))
RFwithGAM

## A simple example for multivariate scoring:
## apply the univariate score to every column at once
rfSBF2 <- rfSBF
rfSBF2$score <- function(x, y) apply(x, 2, rfSBF$score, y = y)

set.seed(1)
RFwithGAM2 <- sbf(bbbDescr, logBBB,
                  sbfControl = sbfControl(functions = rfSBF2,
                                          verbose = FALSE,
                                          seeds = sample.int(100000, 11),
                                          method = "cv",
                                          multivariate = TRUE))
RFwithGAM2
# }
```

*Documentation reproduced from package caret, version 6.0-80, License: GPL (>= 2)*