Performance evaluation and parameter tuning use resampling methods to estimate the performance of models. These are defined by resampling schemes, which are data frames where each column corresponds to a division of the data set into mutually exclusive training and test sets. Repeated hold out and cross-validation are two methods to create such schemes.
resample(method, y, ..., subset = TRUE)
resample_holdout(y, test_fraction = 0.5, nfold = 5,
    balanced = is.factor(y), subset)
resample_crossvalidation(y, nfold = 5, nrepeat = 5,
    balanced = is.factor(y), subset)
resample_bootstrap(y, nfold = 10, fit_fraction = if (replace) 1 else 0.632,
    replace = TRUE, balanced = is.factor(y), subset)
method
The resampling method to use, e.g. "holdout" or "crossvalidation".
y
Observations to be divided.
...
Sent to the method specific function, e.g. "resample_holdout".
subset
Which objects in y are to be divided and which are to be left out of both sets. If subset is a resampling scheme, a list of inner cross-validation schemes will be returned.
test_fraction
Fraction of objects to hold out (0 < test_fraction < 1).
nfold
Number of folds.
balanced
Whether the sets should be balanced or not, i.e. whether the class ratio should be kept constant across the sets (as far as possible).
nrepeat
Number of fold sets to generate.
fit_fraction
The size of the training set relative to the entire data set.
replace
Whether to sample with replacement.
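To illustrate what balanced means in practice, the following base R sketch draws a single holdout fold in which each class contributes roughly the same fraction of test observations. The function name balanced_holdout_fold is hypothetical and the code is not the package's implementation; it only mirrors the description of balanced and test_fraction given above.

```r
# Hypothetical helper, not part of the package: one balanced holdout fold.
# Returns a logical vector where TRUE codes for training and FALSE for test.
balanced_holdout_fold <- function(y, test_fraction = 0.5) {
  stopifnot(is.factor(y), test_fraction > 0, test_fraction < 1)
  fold <- rep(TRUE, length(y))
  for (lev in levels(y)) {
    idx <- which(y == lev)            # observations of this class
    if (length(idx) == 0) next        # skip unused factor levels
    ntest <- round(length(idx) * test_fraction)
    # Hold out the same fraction within every class
    fold[idx[sample.int(length(idx), ntest)]] <- FALSE
  }
  fold
}

y <- factor(rep(c("a", "b"), c(30, 10)))
fold <- balanced_holdout_fold(y, test_fraction = 1/5)
table(class = y, training = fold)
```

Because sampling is done per class, the 3:1 class ratio of y is kept in both the training and the test set.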
A data frame defining a resampling scheme. TRUE or a positive integer codes for training set and FALSE or 0 codes for test set. Positive integers > 1 code for multiple copies of an observation in the training set. NA codes for neither training nor test set and is used to exclude observations from the analysis altogether.
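The coding can be made concrete with a small hand-built scheme. The sketch below uses only base R; the helpers training_rows and test_rows are hypothetical names introduced for illustration and are not functions of the package.

```r
# A hand-written scheme with five observations and two folds.
scheme <- data.frame(
  fold1 = c(TRUE, TRUE, FALSE, FALSE, NA),  # logical coding
  fold2 = c(2L, 0L, 1L, 0L, NA)             # integer coding with a duplicate
)

# Hypothetical helper: row indexes of the training set, with an observation
# repeated when its code is an integer > 1.
training_rows <- function(fold) {
  copies <- as.integer(fold)      # TRUE -> 1, FALSE -> 0
  copies[is.na(copies)] <- 0L     # NA: in neither set
  rep(seq_along(fold), times = copies)
}

# Hypothetical helper: row indexes of the test set (FALSE or 0).
test_rows <- function(fold) which(!is.na(fold) & fold == 0)

lapply(scheme, training_rows)  # fold1: 1 2; fold2: 1 1 3
lapply(scheme, test_rows)      # fold1: 3 4; fold2: 2 4
```

Observation 5 is NA in both folds and therefore never appears in either set.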
Note that when setting up analyses, the user should not call resample_holdout or resample_crossvalidation directly, as resample performs additional necessary processing of the scheme.
Resampling schemes can be visualized in a human-readable form with the image function.
Functions for generating custom resampling schemes should be implemented as follows and then called by resample("myMethod", ...):
resample_myMethod <- function(y, ..., subset)
y
Response vector.
...
Method specific attributes.
subset
Indexes of observations to be excluded from the resampling.
The function should return a list of the following elements:
folds
A data frame with the folds of the scheme that conforms to the description of a resampling scheme given above (TRUE or a positive integer for training set, FALSE or 0 for test set, NA for neither).
parameter
A list with the parameters necessary to generate such a resampling scheme. These are needed when creating subschemes for parameter tuning, see subresample.
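A minimal sketch of such a function is given below, assuming the interface described above. The method itself (repeated random splits with a fixed number of test observations) and its arguments ntest and nfold are hypothetical and chosen only for illustration; subset is treated as indexes to exclude, following the description above.

```r
# Hypothetical custom method: nfold random splits, each holding out
# exactly ntest observations. Excluded observations are coded as NA.
resample_myMethod <- function(y, ntest = 5, nfold = 3, subset = integer(0)) {
  n <- length(y)
  eligible <- setdiff(seq_len(n), subset)   # observations taking part
  stopifnot(ntest >= 1, ntest < length(eligible))
  fold_list <- lapply(seq_len(nfold), function(i) {
    fold <- rep(NA, n)                      # NA: neither training nor test
    fold[eligible] <- TRUE                  # TRUE: training set
    fold[sample(eligible, ntest)] <- FALSE  # FALSE: test set
    fold
  })
  names(fold_list) <- paste0("fold", seq_len(nfold))
  list(
    folds = as.data.frame(fold_list),
    parameter = list(ntest = ntest, nfold = nfold)
  )
}

res <- resample_myMethod(1:20, ntest = 4, nfold = 3, subset = c(1, 2))
str(res$parameter)
head(res$folds)
```

The sketch calls the function directly only to inspect its return value; in an actual analysis it would be invoked through resample("myMethod", ...), which performs the additional processing of the scheme.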
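library(emil)

# Repeated holdout on a numeric response, holding out one third of the observations
resample("holdout", 1:50, test_fraction=1/3)

# Repeated holdout on a factor response; balanced defaults to is.factor(y)
resample("holdout", factor(runif(60) >= .5))

# Cross-validation on a factor response, visualized with image
y <- factor(runif(60) >= .5)
cv <- resample("crossvalidation", y)
image(cv, main="Cross-validation scheme")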