Partitions the data set into folds. Stratification, if requested, is done by the
best algorithm, i.e. the one with the best performance. The distribution of the
best algorithms in each fold will be approximately the same. The folds are
assembled into training and test sets by combining $n-1$ folds for training and
using the remaining fold for testing. The sets are added to the original data
set and returned.