cv_partition: Randomly partitions data for cross-validation.

Description

For a vector of training labels, we return a list of cross-validation folds, where each fold has the indices of the observations to leave out in the fold. In terms of classification error rate estimation, one can think of a fold as a the observations to hold out as a test sample set. Either the hold_out size or the number of folds, num_folds, can be specified. The number of folds defaults to 10, but if the hold_out size is specified, then num_folds is ignored.

Usage

cv_partition(y, num_folds = 10, hold_out = NULL, seed = NULL)

Arguments

a vector of class labels

num_folds

the number of cross-validation folds. Ignored if hold_out is not NULL. See Details.

hold_out

the hold-out size for cross-validation. See Details.

seed

optional random number seed for splitting the data for cross-validation

Value

list the indices of the training and test observations for each fold.

Details

We partition the vector y based on its length, which we treat as the sample size, 'n'. If an object other than a vector is used in y, its length can yield unexpected results. For example, the output of length(diag(3)) is 9.

Examples

Run this code

# NOT RUN {
# The following three calls to `cv_partition` yield the same partitions.
set.seed(42)
cv_partition(iris$Species)
cv_partition(iris$Species, num_folds = 10, seed = 42)
cv_partition(iris$Species, hold_out = 15, seed = 42)
# }

Run the code above in your browser using DataLab