balancedSplit: Split a dataset into training and testing sets, balancing a factor
Description
When performing cross-validation, it is often necessary to split the
data into training and test sets that are balanced for a factor, so
that each level of the factor is represented in the same proportion in
both sets. This function performs such a balanced split.
Usage
balancedSplit(fac, size)
Value
Returns a logical vector with the same length as fac. TRUE values
designate samples selected for the training set; FALSE values
designate samples in the test set.
Arguments
fac
A factor that should be balanced between the two subsets.
size
A number between 0 and 1 giving the fraction of the samples in each
level of fac to be used for training.
Author
Kevin R. Coombes <krc@silicovore.com>
Details
This function randomly selects the same fraction, size, of the samples
within each level of fac to form the training set; the remaining
samples form the test set. In most cases, fac will be a binary factor,
often the outcome that one wants to predict. However, the
implementation works for factors with an arbitrary number of levels.
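The per-level sampling described above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the package's actual source: the name balancedSplitSketch is hypothetical, rounding the per-level training count with round() is an assumption (the real function may round differently), and samples whose factor value is NA are simply left out of the training set.

```r
# Hypothetical sketch of a balanced split; not the package's implementation.
balancedSplitSketch <- function(fac, size) {
  fac <- as.factor(fac)
  stopifnot(is.numeric(size), length(size) == 1, size > 0, size < 1)
  selected <- rep(FALSE, length(fac))
  for (lev in levels(fac)) {
    idx <- which(fac == lev)               # positions in this level (NAs excluded)
    nTrain <- round(size * length(idx))    # assumed rounding rule
    if (nTrain > 0) {
      # sample.int avoids sample()'s special case when idx has length 1
      selected[idx[sample.int(length(idx), nTrain)]] <- TRUE
    }
  }
  selected
}

# Usage: split a data frame whose outcome has 20 cases and 40 controls.
set.seed(42)
outcome <- factor(rep(c("case", "control"), times = c(20, 40)))
dat <- data.frame(outcome = outcome, x = rnorm(60))
train <- balancedSplitSketch(dat$outcome, 0.5)
trainSet <- dat[train, ]
testSet <- dat[!train, ]
tapply(train, outcome, sum)   # case 10, control 20
```

Because the number of training samples is fixed separately within each level, the class proportions in the training set match those in the full dataset regardless of the random seed; only which samples are chosen varies.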