sortinghat (version 0.1)

cv_partition: Partitions data for cross-validation.

Description

For a vector of training labels, we return a list of cross-validation folds, where each fold contains the indices of the observations to leave out in that fold. In terms of classification error rate estimation, one can think of a fold as the observations to hold out as a test sample set.

Usage

cv_partition(y, num_folds = 10, hold_out = NULL,
    seed = NULL)

Arguments

y
a vector of class labels to partition
num_folds
the number of cross-validation folds. Ignored if hold_out is not NULL. See Details.
hold_out
the hold-out size for cross-validation. See Details.
seed
optional random number seed for splitting the data for cross-validation

Value

  • A list containing the indices of the training and test observations for each fold.

Details

Either the hold_out size or num_folds can be specified. The number of folds defaults to 10, but if the hold_out size is specified, then num_folds is ignored.
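
As a rough illustration of how the two arguments interact, the base-R sketch below builds folds without calling the package. The helper name sketch_partition, the rule num_folds = ceiling(n / hold_out), and the training/test element names are assumptions made for illustration; they are not confirmed package internals. The rule is at least consistent with the Examples, where hold_out = 15 on the 150 iris labels matches num_folds = 10.

```r
# Hypothetical helper for illustration; not part of sortinghat.
sketch_partition <- function(y, num_folds = 10, hold_out = NULL, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- length(y)
  # Assumption: a hold-out size, when given, determines the number of folds.
  if (!is.null(hold_out)) num_folds <- ceiling(n / hold_out)
  # Shuffle the indices, then deal them into folds round-robin.
  fold_id <- rep(seq_len(num_folds), length.out = n)
  test_sets <- split(sample(seq_len(n)), fold_id)
  # Each fold records the held-out (test) indices and the remaining ones.
  lapply(test_sets, function(test) {
    list(training = setdiff(seq_len(n), test), test = test)
  })
}

y <- iris$Species
by_folds <- sketch_partition(y, num_folds = 10, seed = 42)
by_size  <- sketch_partition(y, hold_out = 15, seed = 42)

length(by_size)                          # number of folds
lengths(lapply(by_size, `[[`, "test"))   # test-set size per fold
identical(by_folds, by_size)             # same seed, same fold count
```

In this sketch every index lands in exactly one test set, so the folds form a partition of seq_len(n).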

We partition the vector y based on its length, which we treat as the sample size, n. If an object other than a vector is passed as y, its length can yield unexpected results. For example, length(diag(3)) is 9, not 3.
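
The length pitfall can be checked directly in base R. The snippet below only uses base functions; the final guard is a suggestion for calling code, not something the package is known to do.

```r
m <- diag(3)             # 3 x 3 identity matrix
length(m)                # counts every cell of the matrix
nrow(m)                  # number of rows, the likely intended sample size

length(iris)             # for a data frame, length() is the number of columns
length(iris$Species)     # for a label vector, one entry per observation

# Suggested guard before partitioning: require a dimensionless vector or factor.
labels <- iris$Species
stopifnot(is.atomic(labels), is.null(dim(labels)))
```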

Examples

library(MASS)
# The following three calls to cv_partition yield the same partitions.
set.seed(42)
cv_partition(iris$Species)
cv_partition(iris$Species, num_folds = 10, seed = 42)
cv_partition(iris$Species, hold_out = 15, seed = 42)
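
The Description treats each fold as a held-out test set for error rate estimation. A minimal sketch of that use follows. It builds the folds by hand in base R rather than calling cv_partition, and it uses a simple nearest-centroid classifier chosen only to keep the example self-contained; the fold layout (one integer vector of test indices per fold) and the fold_error helper are assumptions for illustration.

```r
# Build ten folds of test indices by hand (assumed layout, not package output).
set.seed(42)
n <- nrow(iris)
folds <- split(sample(seq_len(n)), rep(seq_len(10), length.out = n))

x <- as.matrix(iris[, 1:4])
y <- iris$Species
classes <- levels(y)

# Hypothetical helper: error rate of a nearest-centroid rule on one fold.
fold_error <- function(test) {
  train <- setdiff(seq_len(n), test)
  # Class centroids estimated from the training observations only.
  centroids <- t(sapply(classes, function(cl) {
    colMeans(x[train[y[train] == cl], , drop = FALSE])
  }))
  # Squared Euclidean distance from each test observation to each centroid.
  d <- sapply(seq_along(classes), function(k) {
    rowSums(sweep(x[test, , drop = FALSE], 2, centroids[k, ])^2)
  })
  predicted <- classes[max.col(-d, ties.method = "first")]
  mean(predicted != as.character(y[test]))
}

errors <- vapply(folds, fold_error, numeric(1))
mean(errors)   # cross-validated estimate of the error rate
```

Each observation is predicted exactly once, by a classifier that never saw it during training, which is what makes the averaged fold error a usable estimate.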