mlr3resampling (version 2025.11.19)
Resampling Algorithms for 'mlr3' Framework
Description
A supervised learning algorithm takes a train set as input
and outputs a prediction function, which can be used on a test set.
If each data point belongs to a subset
(such as a geographic region, year, etc.), then
how do we know whether the subsets are similar enough that
we can get accurate predictions on one subset
after training on Other subsets?
And how do we know if training on All subsets would improve
prediction accuracy, relative to training on the Same subset?
SOAK, Same/Other/All K-fold cross-validation,
can be used to answer these questions, by fixing a test subset,
training models on Same/Other/All subsets, and then
comparing test error rates (Same versus Other and Same versus All).
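
The Same/Other/All comparison above can be illustrated with a minimal sketch in plain Python. It is not the package's R API: the names `assign_folds` and `soak_splits` are hypothetical, and the sketch assumes that fold ids are assigned within each subset and that every train set excludes the rows sharing the test fold id.

```python
import random
from collections import defaultdict


def assign_folds(subsets, n_folds, seed=0):
    """Give every row a fold id in 0..n_folds-1, balanced within each subset."""
    rng = random.Random(seed)
    rows_by_subset = defaultdict(list)
    for row, subset in enumerate(subsets):
        rows_by_subset[subset].append(row)
    folds = [None] * len(subsets)
    for rows in rows_by_subset.values():
        rng.shuffle(rows)
        for position, row in enumerate(rows):
            folds[row] = position % n_folds
    return folds


def soak_splits(subsets, n_folds=3, seed=0):
    """Enumerate train/test index sets for each (test subset, test fold, train kind)."""
    folds = assign_folds(subsets, n_folds, seed)
    rows = range(len(subsets))
    splits = []
    for test_subset in sorted(set(subsets)):
        for test_fold in range(n_folds):
            # The test set is fixed: one fold of one subset.
            test = [i for i in rows
                    if subsets[i] == test_subset and folds[i] == test_fold]
            # Same: remaining folds of the test subset.
            same = [i for i in rows
                    if subsets[i] == test_subset and folds[i] != test_fold]
            # Other: remaining folds of every other subset (an assumption here).
            other = [i for i in rows
                     if subsets[i] != test_subset and folds[i] != test_fold]
            # All: union of Same and Other.
            for kind, train in (("same", same), ("other", other),
                                ("all", sorted(same + other))):
                splits.append({
                    "test_subset": test_subset,
                    "test_fold": test_fold,
                    "train_kind": kind,
                    "train": train,
                    "test": test,
                })
    return splits


if __name__ == "__main__":
    # Hypothetical data: 6 rows from each of two regions.
    regions = ["north"] * 6 + ["south"] * 6
    for split in soak_splits(regions, n_folds=3):
        print(split["test_subset"], split["test_fold"], split["train_kind"],
              len(split["train"]), len(split["test"]))
```

Each test set appears three times, once per train kind, so a learner's test error under Same can be compared directly with its error under Other and under All on identical test rows.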
The package also provides code for estimating how many train samples
are required to get accurate predictions on a test set.
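
One common way to approach the train-size question is a learning curve: train on nested subsamples of increasing size and record the test error for each size. The sketch below assumes that approach and is not the package's implementation. The names `train_size_grid` and `learning_curve`, the geometric size grid, and the mean-predictor learner in the usage example are all hypothetical choices made for illustration.

```python
import random


def train_size_grid(n_train, min_size, n_sizes):
    """Train sizes from min_size up to n_train, evenly spaced on a log scale."""
    if n_sizes < 2:
        raise ValueError("n_sizes must be at least 2")
    ratio = (n_train / min_size) ** (1 / (n_sizes - 1))
    sizes = {min(n_train, round(min_size * ratio ** i)) for i in range(n_sizes)}
    return sorted(sizes)


def learning_curve(train_rows, test_rows, sizes, fit, error, seed=0):
    """Return (train size, test error) pairs using nested random subsamples."""
    rng = random.Random(seed)
    order = list(train_rows)
    rng.shuffle(order)
    curve = []
    for size in sizes:
        # Nested subsamples: each smaller train set is a prefix of the larger ones.
        model = fit(order[:size])
        curve.append((size, error(model, test_rows)))
    return curve


if __name__ == "__main__":
    # Hypothetical regression labels; the "learner" just predicts the train mean.
    rng = random.Random(1)
    y = [rng.gauss(5.0, 1.0) for _ in range(150)]
    train_rows, test_rows = range(100), range(100, 150)

    def fit(rows):
        return sum(y[i] for i in rows) / len(rows)

    def error(model, rows):
        return sum((y[i] - model) ** 2 for i in rows) / len(rows)

    sizes = train_size_grid(n_train=100, min_size=10, n_sizes=3)
    for size, err in learning_curve(train_rows, test_rows, sizes, fit, error):
        print(size, round(err, 3))
```

The smallest train size after which the test error stops decreasing noticeably gives a rough estimate of how many samples are needed.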