The package provides functions to read performance data, build performance models that enable algorithm selection (using external machine learning functions), and evaluate those models.
Data is read using input() and can then be used to learn performance models. There are currently four main ways to create models. Classification (classify()) creates a single machine learning model that predicts the algorithm to use as a label. Classification of pairs of algorithms (classifyPairs()) creates a classification model for each pair of algorithms that predicts which one is better, then aggregates these predictions to determine the best overall algorithm. Clustering (cluster()) clusters the problems to solve and assigns the best algorithm to each cluster. Regression (regression()) trains a separate model for each available algorithm, predicts each algorithm's performance on a problem independently, and chooses the algorithm with the best predicted performance. Regression of pairs of algorithms (regressionPairs()) is similar to classifyPairs(), but predicts the performance difference between each pair of algorithms.
Various functions are provided to split the data into training and test set(s) and to evaluate the performance of the learned models.
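As a minimal sketch of this workflow (input(), cvFolds(), classify(), parscores(), and misclassificationPenalties() are LLAMA functions; the features and performances data frames and the choice of the "classif.rpart" mlr learner are assumptions for illustration):

```r
library(llama)
library(mlr)

# Read raw data: one row per problem instance, with instance features and
# the performance of each algorithm on that instance (assumed data frames
# that share instance identifier columns).
data <- input(features, performances)

# Split the data into cross-validation folds for training and testing.
folds <- cvFolds(data)

# Learn a classification model that predicts the best algorithm as a label,
# using a decision tree learner from mlr.
model <- classify(classifier = makeLearner("classif.rpart"), data = folds)

# Evaluate the learned selector on the test partitions.
mean(parscores(folds, model))
mean(misclassificationPenalties(folds, model))
```

The same folds can be passed to any of the other model-building functions, so the approaches above can be compared under identical splits.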
LLAMA uses the mlr package to access the implementation of machine learning
algorithms in R.
The model-building functions use the parallelMap package
(https://github.com/berndbischl/parallelMap) to parallelize across the
data partitions (e.g. cross-validation folds) at level "llama.fold", and
at level "llama.tune" for tuning. By default, everything runs sequentially.
By starting a suitable backend (e.g. with parallelStartSocket(2) for
parallelization across 2 CPUs using sockets), model building is
parallelized automatically and transparently. Note that this does not
mean that all machine learning algorithms used for building models can be
parallelized safely. For functions that are not thread-safe, use
parallelStartSocket() to run them in separate processes.
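For example, parallelizing model building across cross-validation folds might look like this (a sketch; parallelStartSocket() and parallelStop() are from parallelMap, and the commented model-building call stands in for any of the functions described above):

```r
library(parallelMap)
library(llama)
library(mlr)

# Start a socket backend with 2 worker processes; restricting it to the
# "llama.fold" level parallelizes only across data partitions, leaving
# other parallelMap levels sequential.
parallelStartSocket(2, level = "llama.fold")

# Any LLAMA model-building call is now parallelized transparently, e.g.:
# model <- classify(classifier = makeLearner("classif.rpart"), data = folds)

# Shut down the worker processes when done.
parallelStop()
```

Because socket workers are separate R processes, this backend is also the safe choice for learners that are not thread-safe.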