
emil (version 1.1-6)

resample: Resampling schemes

Description

Performance evaluation and variable tuning use resampling methods to estimate model performance. These methods are defined by resampling schemes: data frames in which each column corresponds to one division of the data set into mutually exclusive training and test sets. Repeated hold-out and cross-validation are two methods for creating such schemes.
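To make the idea of a scheme concrete, the following is a minimal base R sketch of a repeated cross-validation scheme stored as a data frame. It is an illustration only, not emil's implementation: the helper make_cv_scheme and the column names (rep1fold1, ...) are assumptions made for this example.

```r
# Base R sketch of a repeated cross-validation scheme (not emil's own code).
# The function name and the column names are made up for this illustration.
make_cv_scheme <- function(n, nfold = 5, nrep = 2) {
  columns <- list()
  for (r in seq_len(nrep)) {
    # Randomly assign every observation to one of nfold folds.
    fold_id <- sample(rep(seq_len(nfold), length.out = n))
    for (f in seq_len(nfold)) {
      # TRUE = training set, FALSE = test set for this division.
      columns[[sprintf("rep%dfold%d", r, f)]] <- fold_id != f
    }
  }
  as.data.frame(columns)
}

set.seed(1)
scheme <- make_cv_scheme(n = 20, nfold = 5, nrep = 2)
dim(scheme)  # one row per observation, one column per division
```

Within each repetition every observation lands in the test set of exactly one column, which is what makes the divisions a cross-validation rather than independent hold-outs.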

Usage

resample(method, y, ..., subset = TRUE)

resample.holdout(y = NULL, frac = 0.5, nfold = 5, balanced = is.factor(y), subset)

resample.crossval(y, nfold = 5, nrep = 5, balanced = is.factor(y), subset)

Arguments

method
The resampling method to use, e.g. "holdout" or "crossval".
y
Observations to be divided. Can be supplied either as the response vector of the observations, or as a scalar, which is interpreted as the number of objects.
...
Passed on to the method-specific function, e.g. "resample.holdout".
nfold
Number of folds.
balanced
Whether the sets should be balanced or not, i.e. if the class ratio over the sets should be kept constant (as far as possible).
subset
Which objects in y are to be divided and which are to be left out of both sets. If subset is a resampling scheme, a list of inner cross-validation schemes will be returned.
frac
Fraction of objects to hold out (0 < frac < 1).
nrep
Number of fold sets to generate.
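The interaction between frac and balanced can be illustrated with a small base R sketch of a single hold-out division. This is an assumption-laden illustration rather than emil's code: holdout_split is a hypothetical helper, and rounding the per-class test size with round() is a choice made here for simplicity.

```r
# Base R sketch of one hold-out division (illustrative, not emil's code).
# holdout_split is a hypothetical helper defined only for this example.
holdout_split <- function(y, frac = 0.5, balanced = is.factor(y)) {
  n <- length(y)
  in_test <- logical(n)
  if (balanced) {
    # Hold out roughly `frac` of each class separately,
    # so the class ratio is similar in both sets.
    for (idx in split(seq_len(n), y)) {
      k <- round(frac * length(idx))
      in_test[idx[sample.int(length(idx), k)]] <- TRUE
    }
  } else {
    # Ignore class membership and hold out `frac` of all observations.
    in_test[sample.int(n, round(frac * n))] <- TRUE
  }
  !in_test  # TRUE = training set, FALSE = test set
}

set.seed(1)
y <- factor(rep(c("a", "b"), times = c(40, 20)))
train <- holdout_split(y, frac = 1/4)
table(y, train)
```

With 40 observations of class "a" and 20 of class "b", the balanced split holds out 10 and 5 observations respectively, so both sets keep the 2:1 class ratio.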

Value

  • A data frame defining a resampling scheme. TRUE or a positive integer codes for training set and FALSE or 0 codes for test set. Positive integers > 1 code for multiple copies of an observation in the training set. NA codes for neither training nor test set and is used to exclude observations from the analysis altogether.
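The coding described above can be decoded with a few lines of base R. The sketch below uses a made-up numeric column, fold, rather than output from resample, and the variable names are assumptions for this illustration; a logical column can be converted with as.integer() first.

```r
# Decode one column of a resampling scheme (base R sketch).
# `fold` is a made-up column using the coding described above.
fold <- c(1, 0, 2, NA, 1, 0)

# Training indices, repeated according to their multiplicity.
keep <- !is.na(fold)
train_idx <- rep(which(keep), times = fold[keep])

# Test indices are the observations coded 0 (or FALSE).
test_idx <- which(fold == 0)

# Observations coded NA belong to neither set.
excluded_idx <- which(is.na(fold))
```

Here observation 3 appears twice in train_idx because it is coded 2, and observation 4 appears in neither index vector because it is coded NA.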

Details

Note that when setting up analyses, the user should not call resample.holdout or resample.crossval directly, as resample performs additional necessary processing of the scheme.

A resampling scheme can be visualized in human-readable form with the image function.

Functions for generating custom resampling schemes should be implemented as follows and then called by resample("myMethod", ...):

resample.myMethod <- function(y, ..., subset)

[argument descriptions missing]

The function should return a list of the following elements: [element descriptions missing]

See Also

emil, subresample, image.resample, index.fit

Examples

resample("holdout", 50, frac=1/3)
resample("holdout", factor(runif(60) >= .5))
y <- factor(runif(60) >= .5)
cv <- resample("crossval", y)
image(cv, main="Cross-validation scheme")
