A Backend for a 'nextflow' Pipeline that Performs
Machine-Learning-Based Modeling of Biomedical Data
Description
Provides functionality to perform machine-learning-based modeling in a computation pipeline.
Its functions contain the basic steps of machine-learning-based knowledge discovery workflows,
including model training and optimization, model evaluation, and model testing.
To perform these tasks, the package builds heavily on existing machine-learning packages,
such as 'caret' and associated packages.
The package can train multiple models, optimize model hyperparameters by performing a grid search
or a random search, and evaluate model performance using different metrics.
Models can be validated either on a test data set or, in the case of a small sample size,
by k-fold cross-validation or repeated bootstrapping.
It also allows for the generation of null hypotheses by performing permutation experiments.
Additionally, it offers methods for model interpretation and item categorization
to identify the most informative features in a high-dimensional data space.
The functions of this package can easily be integrated into computation pipelines
(e.g. 'nextflow') and thereby improve scalability,
standardization, and reproducibility in the context of machine learning.
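
The permutation experiments and k-fold cross-validation mentioned above can be illustrated with a minimal, language-neutral sketch. It is written in Python using only the standard library and does not use or reproduce this package's API. The function names (`nearest_centroid_accuracy`, `cv_accuracy`, `permutation_test`), the toy data, and the nearest-centroid classifier are all assumptions chosen for illustration. The idea is to compare the cross-validated accuracy on the true labels against the accuracies obtained after repeatedly shuffling the labels, which approximates a null distribution.

```python
import random
from statistics import mean


def nearest_centroid_accuracy(train, test):
    """Fit one centroid per class on `train`, return accuracy on `test`."""
    centroids = {}
    for label in {y for _, y in train}:
        centroids[label] = mean(x for x, y in train if y == label)
    correct = 0
    for x, y in test:
        # Predict the class whose centroid is closest to x.
        predicted = min(centroids, key=lambda c: abs(x - centroids[c]))
        correct += predicted == y
    return correct / len(test)


def cv_accuracy(xs, ys, k, rng):
    """Mean accuracy over k folds built from a shuffled index list."""
    indices = list(range(len(xs)))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [(xs[i], ys[i]) for i in indices if i not in held_out]
        test = [(xs[i], ys[i]) for i in fold]
        scores.append(nearest_centroid_accuracy(train, test))
    return mean(scores)


def permutation_test(xs, ys, k=5, n_permutations=200, seed=0):
    """Compare the observed CV accuracy with a label-shuffled null distribution."""
    rng = random.Random(seed)
    observed = cv_accuracy(xs, ys, k, rng)
    null_scores = []
    for _ in range(n_permutations):
        shuffled = ys[:]
        rng.shuffle(shuffled)  # break the link between features and labels
        null_scores.append(cv_accuracy(xs, shuffled, k, rng))
    # Empirical p-value with the usual +1 correction.
    exceed = sum(score >= observed for score in null_scores)
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, null_scores, p_value


# Toy data (hypothetical): two well-separated one-dimensional classes.
xs = [i / 10 for i in range(10)] + [5 + i / 10 for i in range(10)]
ys = [0] * 10 + [1] * 10

observed, null_scores, p_value = permutation_test(xs, ys)
print(f"observed accuracy: {observed:.2f}")
print(f"mean null accuracy: {mean(null_scores):.2f}")
print(f"empirical p-value: {p_value:.4f}")
```

In a pipeline setting, each permutation run is independent of the others, which is one reason such experiments lend themselves to parallel execution by a workflow manager.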