`sbfControl(functions = NULL, method = "boot", saveDetails = FALSE, number = ifelse(method %in% c("cv", "repeatedcv"), 10, 25), repeats = ifelse(method %in% c("cv", "repeatedcv"), 1, number), verbose = FALSE, returnResamp = "final", p = 0.75, index = NULL, indexOut = NULL, timingSamps = 0, seeds = NA, allowParallel = TRUE, multivariate = FALSE)`

functions

a list of functions for model fitting, prediction and variable filtering (see Details below)

method

The external resampling method: `boot`, `cv`, `LOOCV` or `LGOCV` (for repeated training/test splits)

number

Either the number of folds or number of resampling iterations

repeats

For repeated k-fold cross-validation only: the number of complete sets of folds to compute

saveDetails

a logical to save the predictions and variable importances from the selection process

verbose

a logical to print a log for each external resampling iteration

returnResamp

A character string indicating how much of the resampled summary metrics should be saved. Values can be `"final"` or `"none"`

p

For leave-group-out cross-validation: the training percentage, given as a proportion (the default is 0.75)

index

a list with elements for each external resampling iteration. Each list element is the sample rows used for training at that iteration.

indexOut

a list (the same length as `index`) that dictates which samples are held out for each resample. If `NULL`, then the unique set of samples not contained in `index` is used.

timingSamps

the number of training set samples that will be used to measure the time for predicting samples (zero indicates that the prediction time should not be estimated).

seeds

an optional set of integers that will be used to set the seed at each resampling iteration. This is useful when the models are run in parallel. A value of `NA` will stop the seed from being set within the worker processes while a value of `NULL` will set the seeds using a random set of integers. Alternatively, a vector of integers can be used. The vector should have `B+1` elements where `B` is the number of resamples. See the Examples section below.

allowParallel

if a parallel backend is loaded and available, should the function use it?

multivariate

a logical; should all the columns of `x` be exposed to the `score` function at once?

The function returns a list that echoes the specified arguments.
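To make the `index`, `indexOut` and `seeds` arguments more concrete, the following is a minimal base R sketch of objects with the shapes described above. The sample size, the number of resamples and the object names are assumptions chosen only for illustration.

```r
# Hypothetical setup: 20 samples and B = 5 resampling iterations.
n <- 20
B <- 5

set.seed(42)

# One list element per resample: the row numbers used for training.
index <- lapply(seq_len(B), function(i) sort(sample.int(n, size = 15)))
names(index) <- paste0("Resample", seq_len(B))

# Mirrors the behaviour described for indexOut = NULL:
# the held-out rows are those not contained in `index`.
indexOut <- lapply(index, function(train_rows) setdiff(seq_len(n), train_rows))

# A seed vector with B + 1 integers, as described for `seeds`.
seeds <- sample.int(100000, B + 1)
```

Objects built this way could then be supplied as `sbfControl(index = index, indexOut = indexOut, seeds = seeds)`.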

Simple filter-based feature selection requires functions to be specified for some operations.

The `fit` function builds the model based on the current data set. The arguments for the function must be:

`x`

the current training set of predictor data with the appropriate subset of variables (i.e. after filtering)

`y`

the current outcome data (either a numeric or factor vector)

`...`

optional arguments to pass to the fit function in the call to `sbf`

The function should return a model object that can be used to generate predictions.
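As an illustration of this contract, here is a minimal sketch of a `fit` function built on `lm` from base R. The name `lm_fit`, the `.outcome` column and the simulated data are assumptions for the example; this is not the implementation used by any of the packaged function sets.

```r
# Hypothetical fit function following the (x, y, ...) contract.
lm_fit <- function(x, y, ...) {
  dat <- as.data.frame(x)
  dat$.outcome <- y
  # Regress the outcome on every retained predictor.
  lm(.outcome ~ ., data = dat, ...)
}

# Simulated training data (illustrative only).
set.seed(1)
x <- data.frame(a = rnorm(30), b = rnorm(30))
y <- 2 * x$a + rnorm(30, sd = 0.1)

model <- lm_fit(x, y)
```

The returned `lm` object can be used by `predict`, which is what the prediction step needs.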

The `pred` function returns a vector of predictions (numeric or factors) from the current model. The arguments are:

`object`

the model generated by the `fit` function

`x`

the current set of predictors for the held-back samples
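A `pred` function with this signature might look like the sketch below, which assumes the model is an `lm` object. The name `lm_pred` and the simulated data are hypothetical.

```r
# Hypothetical pred function following the (object, x) contract.
lm_pred <- function(object, x) {
  predict(object, newdata = as.data.frame(x))
}

# Simulated training data and a small held-back set (illustrative only).
set.seed(1)
train <- data.frame(a = rnorm(30), b = rnorm(30))
train$y <- 2 * train$a + rnorm(30, sd = 0.1)
model <- lm(y ~ a + b, data = train)

held_back <- data.frame(a = rnorm(5), b = rnorm(5))
preds <- lm_pred(model, held_back)
```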

The `score` function is used to return scores with names for each predictor (such as a p-value). Inputs are:

`x`

the predictors for the training samples. If `sbfControl()$multivariate` is `TRUE`, this will be the full predictor matrix. Otherwise it is a vector for a specific predictor.

`y`

the current training outcomes

When `sbfControl()$multivariate` is `TRUE`, the `score` function should return a named vector where `length(scores) == ncol(x)`. Otherwise, the function's output should be a single value. Univariate examples are given by `anovaScores` for classification and `gamScores` for regression, and by the Examples section below.
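The difference between the two calling conventions can be sketched in base R. Both functions below are hypothetical and use a correlation test p-value as the score; they are not the scoring functions shipped with the package.

```r
# Univariate convention: x is a single predictor, the result is one value.
cor_score <- function(x, y) {
  cor.test(x, y)$p.value
}

# Multivariate convention: x holds all predictors, the result is a
# named vector with one score per column.
cor_score_multi <- function(x, y) {
  apply(x, 2, cor_score, y = y)
}

# Simulated data (illustrative only): one informative column, one noise column.
set.seed(1)
x <- data.frame(signal = rnorm(50), noise = rnorm(50))
y <- 3 * x$signal + rnorm(50)

single_score <- cor_score(x$signal, y)
scores <- cor_score_multi(x, y)
```

Wrapping a univariate scorer with `apply` over columns is the same pattern used for `rfSBF2$score` in the Examples section.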

The `filter` function is used to return a logical vector with names for each predictor (`TRUE` indicates that the predictor should be retained). Inputs are:

`score`

the output of the `score` function

`x`

the predictors for the training samples

`y`

the current training outcomes

The function should return a named logical vector.
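A minimal sketch of a `filter` function, assuming the scores are p-values and that a cutoff of 0.05 is wanted, could be written as follows. The name `p_filter`, the cutoff and the example scores are assumptions for illustration.

```r
# Hypothetical filter: keep predictors whose p-value is at most 0.05.
# x and y are part of the required signature but are not used here.
p_filter <- function(score, x, y) {
  # The comparison preserves the names of `score`.
  score <= 0.05
}

# Made-up scores for three predictors.
scores <- c(signal = 0.001, noise = 0.40, weak = 0.05)
keep <- p_filter(scores, x = NULL, y = NULL)
```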

Examples of these functions are included in the package: `caretSBF`, `lmSBF`, `rfSBF`, `treebagSBF`, `ldaSBF` and `nbSBF`.

The web page http://topepo.github.io/caret/ has more details and examples related to this function.

See also: `sbf`, `caretSBF`, `lmSBF`, `rfSBF`, `treebagSBF`, `ldaSBF` and `nbSBF`

    ## Not run:
    library(caret)
    data(BloodBrain)

    ## Use a GAM as the filter, then fit a random forest model
    set.seed(1)
    RFwithGAM <- sbf(bbbDescr, logBBB,
                     sbfControl = sbfControl(functions = rfSBF,
                                             verbose = FALSE,
                                             seeds = sample.int(100000, 11),
                                             method = "cv"))
    RFwithGAM

    ## A simple example for multivariate scoring
    rfSBF2 <- rfSBF
    rfSBF2$score <- function(x, y) apply(x, 2, rfSBF$score, y = y)

    set.seed(1)
    RFwithGAM2 <- sbf(bbbDescr, logBBB,
                      sbfControl = sbfControl(functions = rfSBF2,
                                              verbose = FALSE,
                                              seeds = sample.int(100000, 11),
                                              method = "cv",
                                              multivariate = TRUE))
    RFwithGAM2

    ## End(Not run)