sbfControl(functions = NULL, method = "boot", saveDetails = FALSE,
           number = ifelse(method %in% c("cv", "repeatedcv"), 10, 25),
           repeats = ifelse(method %in% c("cv", "repeatedcv"), 1, number),
           verbose = FALSE, returnResamp = "final", p = 0.75,
           index = NULL, indexOut = NULL, timingSamps = 0, seeds = NA,
           allowParallel = TRUE, multivariate = FALSE)
method: boot, cv, LOOCV or LGOCV (for repeated training/test splits).
indexOut: a list (the same length as index) that dictates which samples are held out for each resample. If NULL, then the unique set of samples not contained in index is used.
seeds: a value of NA will stop the seed from being set within the worker processes, while a value of NULL will set the seeds using a random set of integers. Alternatively, a vector of integers can be used. The vector should have B+1 elements, where B is the number of resamples. See the Examples section below.
multivariate: should all the columns of x be exposed to the score function at once?

Simple filter-based feature selection requires functions to be specified for some operations.
The fit function builds the model based on the current data set. The arguments for the function must be:
x: the current training set of predictor data with the appropriate subset of variables (i.e. after filtering)
y: the current outcome data (either a numeric or factor vector)
...: optional arguments to pass to the fit function in the call to sbf
The function should return a model object that can be used to generate predictions.
The pred function returns a vector of predictions (numeric or factors) from the current model. The arguments are:
object: the model generated by the fit function
x: the current set of predictors for the held-back samples
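The fit and pred conventions above can be sketched as follows. This is a minimal illustration, assuming an ordinary linear model as the learner; the names myFit and myPred are hypothetical and are not part of caret. Only base R's lm and predict are used.

```r
# Hypothetical fit function: x holds the filtered predictors, y the outcome.
# Extra arguments given to sbf would arrive through `...`.
myFit <- function(x, y, ...) {
  dat <- as.data.frame(x)
  dat$.outcome <- y
  lm(.outcome ~ ., data = dat, ...)
}

# Hypothetical pred function: returns a plain vector of predictions
# for the held-back samples.
myPred <- function(object, x) {
  unname(predict(object, newdata = as.data.frame(x)))
}

# Small simulated data set (an assumption, for illustration only)
set.seed(1)
x <- data.frame(a = rnorm(20), b = rnorm(20))
y <- 2 * x$a + rnorm(20, sd = 0.1)

model <- myFit(x, y)
preds <- myPred(model, x[1:5, ])
```

A pair like this would be placed in the fit and pred elements of the list passed to sbfControl(functions = ...), alongside the score and filter functions.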
The score function is used to return scores with names for each predictor (such as a p-value). Inputs are:
x: the predictors for the training samples. If sbfControl()$multivariate is TRUE, this will be the full predictor matrix. Otherwise it is a vector for a specific predictor.
y: the current training outcomes
When sbfControl()$multivariate is TRUE, the score function should return a named vector where length(scores) == ncol(x). Otherwise, the function's output should be a single value. Univariate examples are given by anovaScores for classification and gamScores for regression and the example below.
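The two forms of the score function can be sketched as below. This assumes a numeric outcome and numeric predictors, and uses the p-value of a correlation test as the score; corScore and corScoreMulti are hypothetical names, not caret functions.

```r
# Univariate form: x is a single predictor vector; return one value.
corScore <- function(x, y) {
  cor.test(x, y)$p.value
}

# Multivariate form: x is the full predictor matrix or data frame;
# return a named vector with one score per column.
corScoreMulti <- function(x, y) {
  scores <- apply(as.matrix(x), 2, corScore, y = y)
  names(scores) <- colnames(x)
  scores
}

# Simulated data (an assumption, for illustration only)
set.seed(2)
x <- data.frame(signal = rnorm(50), noise = rnorm(50))
y <- 3 * x$signal + rnorm(50)

single <- corScore(x$signal, y)   # one score for one predictor
scores <- corScoreMulti(x, y)     # one score per column of x
```

The multivariate version here simply loops the univariate score over the columns, which mirrors the approach in the Examples section; a genuinely multivariate score could instead use all columns jointly.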
The filter function is used to return a logical vector with names for each predictor (TRUE indicates that the predictor should be retained). Inputs are:
score: the output of the score function
x: the predictors for the training samples
y: the current training outcomes
The function should return a named logical vector.
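As a minimal sketch of a filter function, assuming the scores are p-values and that a 0.05 cutoff is wanted (both are assumptions for illustration, and pFilter is a hypothetical name):

```r
# Hypothetical filter: keep predictors whose score is at most 0.05.
# x and y are accepted to match the expected signature but are unused here.
pFilter <- function(score, x, y) {
  keep <- score <= 0.05   # the comparison preserves the names of `score`
  keep
}

# Made-up scores for three predictors
scores <- c(a = 0.001, b = 0.40, c = 0.03)
keep <- pFilter(scores, x = NULL, y = NULL)
```

Because the comparison keeps the names of the score vector, the result is already a named logical vector of the kind described above.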
Examples of these functions are included in the package: caretSBF, lmSBF, rfSBF, treebagSBF, ldaSBF and nbSBF.
The web page http://topepo.github.io/caret/ has more details and examples related to this function.
See also: sbf, caretSBF, lmSBF, rfSBF, treebagSBF, ldaSBF and nbSBF
## Not run:
library(caret)
data(BloodBrain)

## Use a GAM as the filter, then fit a random forest model
set.seed(1)
RFwithGAM <- sbf(bbbDescr, logBBB,
                 sbfControl = sbfControl(functions = rfSBF,
                                         verbose = FALSE,
                                         seeds = sample.int(100000, 11),
                                         method = "cv"))
RFwithGAM


## A simple example for multivariate scoring
rfSBF2 <- rfSBF
rfSBF2$score <- function(x, y) apply(x, 2, rfSBF$score, y = y)

set.seed(1)
RFwithGAM2 <- sbf(bbbDescr, logBBB,
                  sbfControl = sbfControl(functions = rfSBF2,
                                          verbose = FALSE,
                                          seeds = sample.int(100000, 11),
                                          method = "cv",
                                          multivariate = TRUE))
RFwithGAM2

## End(Not run)
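The seeds argument expects B+1 integers, where B is the number of resamples. The sketch below builds such a vector for 10-fold cross-validation. The variable names B, seeds and ctrl are illustrative, and the call to sbfControl is guarded so that it runs only if caret is installed.

```r
B <- 10                               # number of resamples for 10-fold CV
set.seed(123)
seeds <- sample.int(100000, B + 1)    # B + 1 integer seeds

if (requireNamespace("caret", quietly = TRUE)) {
  ctrl <- caret::sbfControl(functions = caret::rfSBF,
                            method = "cv", number = B,
                            seeds = seeds)
}
```

Deriving the length from B keeps the seed vector in step with the resampling scheme if the number of folds changes later.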