CrossValidate (version 2.3.4)

balancedSplit: Split a dataset into training and testing sets, balancing a factor

Description

Cross-validation often requires splitting a dataset into training and test sets that are balanced with respect to a factor. This function implements such a balanced split.

Usage

balancedSplit(fac, size)

Arguments

fac

A factor that should be balanced between the two subsets.

size

A number between 0 and 1 indicating the fraction of the dataset to be used for training.

Value

Returns a logical vector with length equal to the length of fac. TRUE values designate samples selected for the training set.
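
A logical vector of this shape can be used directly to subset the data. The snippet below is a minimal sketch: inTrain is a hand-built stand-in for the return value (so the code runs without the package installed), and the samples-in-columns layout is assumed from the Examples section.

```r
# Stand-in for the value returned by balancedSplit(); built by hand here,
# so the pattern of TRUE/FALSE is hypothetical and not random.
set.seed(42)
dataset <- matrix(rnorm(40 * 20), ncol = 20)    # 40 features x 20 samples
inTrain <- rep(c(TRUE, FALSE), times = 10)      # one entry per sample

# TRUE entries pick the training samples; negation picks the test samples.
trainData <- dataset[, inTrain, drop = FALSE]
testData  <- dataset[, !inTrain, drop = FALSE]

dim(trainData)
dim(testData)
```

Using drop = FALSE keeps both results as matrices even if one subset ends up with a single sample.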

Details

This function randomly samples the same fraction of items from each level of a factor to include in a training set. In most cases, the factor will be binary (and may even be the outcome that one wants to predict). However, the implementation works for factors with an arbitrary number of levels.
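
The per-level sampling described above might look roughly like the following. This is an illustrative sketch, not the package source: sketchBalancedSplit is a hypothetical name, and the rounding rule for the number of items taken per level is an assumption.

```r
# Hypothetical re-implementation for illustration; not the package's code.
sketchBalancedSplit <- function(fac, size) {
  fac <- as.factor(fac)
  inTrain <- rep(FALSE, length(fac))
  for (lev in levels(fac)) {
    idx <- which(fac == lev)              # positions of this level
    # Assumption: round to the nearest whole item per level.
    nPick <- round(size * length(idx))
    # Sample positions within idx; sample.int avoids the surprising
    # behaviour of sample() when idx has length one.
    picked <- idx[sample.int(length(idx), nPick)]
    inTrain[picked] <- TRUE
  }
  inTrain
}

set.seed(1)
groups <- factor(rep(c("A", "B", "C"), times = c(10, 20, 30)))
inTrain <- sketchBalancedSplit(groups, 0.5)
table(groups, inTrain)
```

Sampling within each level separately is what keeps the training fraction the same across levels, whatever the level sizes are.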

See Also

CrossValidate, CrossValidate-class.

Examples

# NOT RUN {
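nFeatures <- 40
nSamples <- 2*10
# Simulated data: one row per feature, one column per sample
dataset <- matrix(rnorm(nSamples*nFeatures), ncol=nSamples)
groups <- factor(rep(c("A", "B"), each=10))
# Balance on the grouping factor, using half of each group for training
balancedSplit(groups, 0.5)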
# }
