Parse experimental design
extract_experimental_setup(
experimental_design,
file_dir,
message_indent = 0L,
verbose = TRUE
)
data.table with subsampler information at different levels of the experimental design.
experimental_design: (required) Defines what the experiment looks
like, e.g. cv(bt(fs,20)+mb,3,2)+ev for 2 times repeated 3-fold
cross-validation with nested feature selection on 20 bootstraps and
model-building, and external validation. The basic workflow components are:
fs: (required) feature selection step.
mb: (required) model building step.
ev: (optional) external validation. Note that internal validation due
to subsampling will always be conducted if the subsampling methods create
any validation data sets.
The different components are linked using +.
Different subsampling methods can be used in conjunction with the basic workflow components:
bs(x,n): (stratified) .632 bootstrap, with n the number of
bootstraps. In contrast to bt, feature pre-processing parameters are
determined and hyperparameters are optimised on each individual bootstrap.
bt(x,n): (stratified) .632 bootstrap, with n the number of
bootstraps. Unlike bs and other subsampling methods, no separate
pre-processing parameters or optimised hyperparameters will be determined
for each bootstrap.
cv(x,n,p): (stratified) n-fold cross-validation, repeated p times.
Pre-processing parameters are determined for each iteration.
lv(x): leave-one-out cross-validation. Pre-processing parameters are
determined for each iteration.
ip(x): imbalance partitioning for addressing class imbalances in the
data set. Pre-processing parameters are determined for each partition. The
number of partitions generated depends on the imbalance correction method
(see the imbalance_correction_method parameter). Imbalance partitioning
does not generate validation sets.
As shown in the example above, sampling algorithms can be nested.
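The nesting in a design string such as cv(bt(fs,20)+mb,3,2)+ev can be made explicit with a small recursive parser. The sketch below is illustrative only and is not the parser used by this package: the helper names split_top_level, parse_term and parse_design are hypothetical, and the grammar is inferred from the component descriptions above.

```r
# Illustrative sketch, not the package's own parser. All function names here
# are hypothetical helpers written in base R.

# Split a string on a single-character separator, ignoring separators that
# sit inside parentheses.
split_top_level <- function(x, sep) {
  chars <- strsplit(x, "", fixed = TRUE)[[1]]
  # Nesting depth after each character: 0 means outside all parentheses.
  depth <- cumsum(chars == "(") - cumsum(chars == ")")
  cut_at <- which(chars == sep & depth == 0L)
  starts <- c(1L, cut_at + 1L)
  ends <- c(cut_at - 1L, length(chars))
  substring(x, starts, ends)
}

# Parse one component: either a basic workflow component (fs, mb, ev) or a
# subsampling method wrapping a nested design.
parse_term <- function(term) {
  if (term %in% c("fs", "mb", "ev")) {
    return(list(type = term))
  }
  m <- regmatches(term, regexec("^(bs|bt|cv|lv|ip)\\((.*)\\)$", term))[[1]]
  if (length(m) == 0L) {
    stop("Unrecognised component: ", term)
  }
  # First argument is the nested design; the rest are integer settings.
  args <- split_top_level(m[3], ",")
  list(
    type = m[2],
    n = as.integer(args[-1]),
    inner = parse_design(args[1])
  )
}

# Parse a full design: components linked by "+" at the top level.
parse_design <- function(x) {
  x <- gsub("\\s+", "", x)
  lapply(split_top_level(x, "+"), parse_term)
}

design <- parse_design("cv(bt(fs,20)+mb,3,2)+ev")
design[[1]]$type              # outer subsampling method
design[[1]]$n                 # number of folds and repetitions
design[[1]]$inner[[1]]$type   # subsampling method nested inside it
```

Each parsed component is a list holding its type, any integer settings, and the nested design it wraps, so the depth of the list mirrors the depth of nesting in the string.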
The simplest valid experimental design is fs+mb, which corresponds to a
TRIPOD type 1a analysis. Type 1b analyses are only possible using
bootstraps, e.g. bt(fs+mb,100). Type 2a analyses can be conducted using
cross-validation, e.g. cv(bt(fs,100)+mb,10,1). Depending on the origin of
the external validation data, designs such as fs+mb+ev or
cv(bt(fs,100)+mb,10,1)+ev constitute type 2b or type 3 analyses. Type 4
analyses can be done by obtaining one or more familiarModel objects from
others and applying them to your own data set.
Alternatively, the experimental_design parameter may be used to provide a
path to a file containing iterations, which is named ####_iterations.RDS
by convention. This path can be relative to the directory of the current
experiment (experiment_dir), or an absolute path. The absolute path may
thus also point to a file from a different experiment.
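The relative-or-absolute rule can be sketched in base R as follows. This is an assumption about how such a path might be resolved, not the package's actual implementation: resolve_iterations_path is a hypothetical helper, the absolute-path test is a simple heuristic, and the file and directory names are made up for illustration.

```r
# Hypothetical helper, written for illustration only.
# Resolves an iterations-file path against the experiment directory.
resolve_iterations_path <- function(path, experiment_dir) {
  # Heuristic: paths starting with "/", "~", or a Windows drive letter
  # (e.g. "C:/") are treated as absolute and returned unchanged.
  is_absolute <- grepl("^(/|~|[A-Za-z]:[/\\\\])", path)
  if (is_absolute) path else file.path(experiment_dir, path)
}

# A relative path is interpreted relative to the current experiment.
resolve_iterations_path("run_iterations.RDS", "/data/experiment_a")

# An absolute path may point to a file from a different experiment.
resolve_iterations_path("/data/experiment_b/run_iterations.RDS", "/data/experiment_a")
```

A file resolved this way could then be read with base R's readRDS().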
message_indent: Spacing inserted before messages.
verbose: Sets verbosity.
This function converts the experimental_design string