balancedSplit: Split a dataset into training and testing sets, balancing a factor
Description
When performing cross-validation on a dataset, it is often
necessary to split the data into training and test sets in which
every level of a factor is represented in the same proportion.
This function implements such a balanced split.
Usage
balancedSplit(fac, size)
Arguments
fac
A factor that should be balanced between the two subsets.
size
A number between 0 and 1 indicating the fraction of the dataset
to be used for training.
Value
Returns a logical vector with length equal to the length of
fac. TRUE values designate samples selected for the training
set.
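As a minimal sketch of how such a logical vector is typically used, the
example below splits a small data frame into training and test rows. The
data frame dat and the mask inTrain are made up for illustration; in
practice the mask would come from a call such as
balancedSplit(dat$group, size).

```r
# Toy data (hypothetical); 'group' plays the role of the factor 'fac'.
dat <- data.frame(
  x     = c(1.2, 0.7, 3.1, 2.4, 1.9, 0.3),
  group = factor(c("a", "a", "a", "b", "b", "b"))
)

# Hand-written mask standing in for the value returned by the function.
inTrain <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)

train <- dat[inTrain, ]    # rows where the mask is TRUE
test  <- dat[!inTrain, ]   # the complement forms the test set
```

Because the mask has one entry per sample, negating it yields the test
set directly, and the two subsets never overlap.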
Details
This function randomly samples the same fraction of items from each
level of a factor to include in a training set. In most cases, the
factor will be binary (and might even be the outcome that one wants to
predict). However, the implementation works for factors with an
arbitrary number of levels.
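The per-level sampling described above can be sketched as follows. This
is an illustrative re-implementation, not the package source: the name
balancedSplitSketch is hypothetical, and rounding the per-level count
down with floor() is an assumption, since the rounding rule is not
stated here.

```r
# Hypothetical sketch of a balanced split; not the package's own code.
balancedSplitSketch <- function(fac, size) {
  fac <- as.factor(fac)
  inTrain <- logical(length(fac))        # starts as all FALSE
  for (lev in levels(fac)) {
    idx <- which(fac == lev)             # positions belonging to this level
    if (length(idx) == 0) next           # skip levels with no samples
    n <- floor(length(idx) * size)       # assumption: round down
    # Sample positions within idx; sample(idx, n) would misbehave when
    # idx has length one, because sample() then treats it as 1:idx.
    chosen <- idx[sample.int(length(idx), n)]
    inTrain[chosen] <- TRUE
  }
  inTrain
}

set.seed(1)
fac <- factor(rep(c("case", "control"), times = c(20, 60)))
inTrain <- balancedSplitSketch(fac, 0.75)
table(fac, inTrain)                      # counts per level and subset
```

With 20 cases and 60 controls and size = 0.75, the sketch places 15
cases and 45 controls in the training set, so both subsets keep the
original 1:3 ratio between the levels.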