This function powers grouped resampling by splitting the data based upon a grouping variable and returning the assessment set indices for each split.
make_groups(
data,
group,
v,
balance = c("groups", "observations", "prop"),
strata = NULL,
...
)

data
A data frame.

group
A variable in data (single character or name) used for
grouping observations with the same value to either the analysis or
assessment set within a fold.

v
The number of partitions of the data set.

balance
If v is less than the number of unique groups, how should
groups be combined into folds? Should be one of
"groups", "observations", "prop".

strata
A variable in data (single character or name) used to conduct
stratified sampling. When not NULL, each resample is created within the
stratification variable. Numeric strata are binned into quartiles.

...
Arguments passed to balance functions.

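
To make the idea of splitting on a grouping variable concrete, here is a minimal base R sketch. It is not the rsample implementation: sketch_group_folds() is a hypothetical helper written for illustration, and it only mimics the general idea of dealing whole groups out to v folds and returning the assessment set row indices for each fold.

```r
# Hypothetical helper, not part of rsample: assign whole groups to v folds
# and return the assessment-set row indices for each fold.
sketch_group_folds <- function(data, group, v) {
  group_values <- data[[group]]
  unique_groups <- unique(group_values)
  stopifnot(v <= length(unique_groups))

  # Shuffle the groups, then deal them out to folds in turn so that each
  # fold receives roughly the same number of groups.
  shuffled <- unique_groups[sample.int(length(unique_groups))]
  fold_of_group <- rep_len(seq_len(v), length(shuffled))

  # For each fold, collect the rows whose group was dealt to that fold.
  lapply(seq_len(v), function(fold) {
    which(group_values %in% shuffled[fold_of_group == fold])
  })
}

set.seed(1)
# Toy data (made up for this example): six groups of unequal size.
toy <- data.frame(
  id = rep(c("a", "b", "c", "d", "e", "f"), times = c(1, 2, 3, 4, 5, 6)),
  x = seq_len(21)
)
folds <- sketch_group_folds(toy, group = "id", v = 3)
```

Because groups are dealt out whole, no group is ever split between a fold's analysis and assessment sets, and every row appears in exactly one assessment set. The folds can still differ a lot in row count when group sizes are uneven, which is the kind of trade-off the balance argument is meant to control.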
Not all balance options are accepted -- or make sense -- for all resampling
functions. For instance, balance = "prop" assigns groups to folds at
random, meaning that any given observation is not guaranteed to be in one
(and only one) assessment set. That means balance = "prop" can't
be used with group_vfold_cv(), and so isn't an option available for that
function.
Similarly, group_mc_cv() and its derivatives don't assign data to one (and
only one) assessment set, but rather allow each observation to be in an
assessment set zero or more times. As a result, those functions don't have
a balance argument, and under the hood always specify balance = "prop"
when they call make_groups().
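
The difference between fold-style and proportion-style assignment can be illustrated with a small base R simulation. This is a sketch under stated assumptions, not rsample's code: the group labels, the number of resamples (times), and the choice of drawing two groups per resample (n_assess) are all made up for the example.

```r
set.seed(2)
groups <- c("a", "b", "c", "d", "e", "f")
times <- 4      # assumed number of resamples for the proportion-style draw
n_assess <- 2   # assumed number of groups held out per resample

# Fold-style assignment: shuffle fold labels across the groups once, so the
# three folds partition the groups.
fold_id <- sample(rep_len(seq_len(3), length(groups)))
fold_assessment <- split(groups, fold_id)

# Proportion-style assignment: every resample independently draws its own
# assessment groups, with no memory of earlier draws.
prop_assessment <- replicate(
  times,
  sample(groups, size = n_assess),
  simplify = FALSE
)

# Count how often each group lands in an assessment set under each scheme.
fold_counts <- table(factor(unlist(fold_assessment), levels = groups))
prop_counts <- table(factor(unlist(prop_assessment), levels = groups))
```

Under the fold-style scheme every entry of fold_counts is exactly 1. Under the proportion-style scheme the entries of prop_counts can be anything from 0 up to times, which is why that kind of random assignment suits Monte Carlo style resampling but not V-fold cross-validation.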