mlr (version 2.13)

batchmark: Run machine learning benchmarks as distributed experiments.

Description

This function is a highly parallel version of [benchmark] built on batchtools. It creates experiments in the provided registry for each combination of learners, tasks and resamplings; the runs can then be started via [batchtools::submitJobs]. A job is one train/test split of the outer resampling. In the case of nested resampling (e.g. with [makeTuneWrapper]), each job is a full run of the inner resampling, which can be parallelized in a second step with parallelMap. For details on usage and supported backends, have a look at the batchtools tutorial page: <https://github.com/mllg/batchtools>.

The general workflow with `batchmark` looks like this:

  1. Create an ExperimentRegistry using [batchtools::makeExperimentRegistry].

  2. Call `batchmark(...)`, which defines jobs for all learners and tasks in a [base::expand.grid] fashion.

  3. Submit jobs using [batchtools::submitJobs].

  4. Babysit the computation and wait for all jobs to finish using [batchtools::waitForJobs].

  5. Call `reduceBatchmarkResults()` to reduce the results into a [BenchmarkResult].
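
The five steps above can be sketched as follows. This is a minimal illustration, not a recommended setup: it assumes the rpart and MASS packages are installed, uses a temporary registry (`file.dir = NA`), and relies on whatever cluster functions batchtools picks up by default. The choice of learners, tasks and resampling is arbitrary.

```r
library(mlr)
library(batchtools)

# 1. Create a temporary registry (assumption: nothing needs to persist).
reg = makeExperimentRegistry(file.dir = NA, seed = 1)

# 2. Define jobs for every learner/task/split combination.
learners = list(makeLearner("classif.rpart"), makeLearner("classif.lda"))
tasks = list(iris.task, sonar.task)
rdesc = makeResampleDesc("CV", iters = 3)
ids = batchmark(learners, tasks, rdesc, measures = list(mmce), reg = reg)

# 3. Submit all jobs of the registry.
submitJobs(reg = reg)

# 4. Block until every job has terminated.
waitForJobs(reg = reg)

# 5. Collect the results into a BenchmarkResult.
bmr = reduceBatchmarkResults(reg = reg)
perf = getBMRAggrPerformances(bmr, as.df = TRUE)
print(perf)
```

The resulting `bmr` object can be passed to the usual `getBMR*` and `plotBMR*` functions, just like the result of [benchmark].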

If you want to use this with OpenML datasets, you can easily generate tasks from a vector of dataset IDs with `tasks = lapply(data.ids, function(x) convertOMLDataSetToMlr(getOMLDataSet(x)))`.

Usage

batchmark(learners, tasks, resamplings, measures, models = TRUE,
  reg = batchtools::getDefaultRegistry())

Arguments

learners

(list of Learner | character) Learning algorithms which should be compared; can also be a single learner. If you pass strings, the learners will be created via makeLearner.

tasks

(list of Task) Tasks that learners should be run on.

resamplings

((list of) [ResampleDesc]) Resampling strategy for each task. If only one is provided, it will be replicated to match the number of tasks. If missing, 10-fold cross-validation is used.

measures

(list of Measure) Performance measures for all tasks. If missing, the default measure of the first task is used.

models

(logical(1)) Should all fitted models be stored in the ResampleResult? Default is TRUE.

reg

([batchtools::Registry]) Registry, created by [batchtools::makeExperimentRegistry]. If not explicitly passed, uses the last created registry.
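
As an illustration of the argument forms described above, the following sketch mixes a learner given as a string with a tuned learner built via [makeTuneWrapper], passes a single resampling description that is replicated for both tasks, and disables model storage. The parameter grid for `cp` and the choice of tasks are arbitrary assumptions made for this example; the rpart and MASS packages are assumed to be installed. No jobs are submitted here, the call only defines them.

```r
library(mlr)
library(batchtools)

reg = makeExperimentRegistry(file.dir = NA, seed = 1)

# Nested resampling: the inner holdout split tunes rpart's cp parameter.
# The two candidate values are placeholders chosen for illustration.
ps = makeParamSet(makeDiscreteParam("cp", values = c(0.01, 0.1)))
tuned = makeTuneWrapper("classif.rpart",
  resampling = makeResampleDesc("Holdout"),
  par.set = ps,
  control = makeTuneControlGrid(),
  show.info = FALSE)

ids = batchmark(
  learners = list("classif.lda", tuned),     # string is turned into a Learner
  tasks = list(iris.task, sonar.task),
  resamplings = makeResampleDesc("CV", iters = 2),  # replicated for each task
  measures = list(mmce, acc),
  models = FALSE,                            # do not keep fitted models
  reg = reg
)
print(ids)
```

Each job defined for the tuned learner runs the complete inner resampling for one outer train/test split.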

Value

([data.table]). Generated job ids are stored in the column “job.id”.
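
The returned table can be handed back to batchtools to act on a subset of the jobs. The sketch below, which assumes the rpart package is installed and uses a temporary registry, submits a single job as a smoke test before submitting the remainder; the particular split into "first job" and "rest" is only one possible strategy.

```r
library(mlr)
library(batchtools)

reg = makeExperimentRegistry(file.dir = NA, seed = 1)
ids = batchmark("classif.rpart", list(iris.task),
  makeResampleDesc("CV", iters = 3), reg = reg)

# Submit only the first job to check that everything runs.
submitJobs(ids[1], reg = reg)
waitForJobs(reg = reg)

# Submit whatever has not been submitted yet, then wait again.
submitJobs(findNotSubmitted(reg = reg), reg = reg)
waitForJobs(reg = reg)

done = findDone(reg = reg)
print(getStatus(reg = reg))
```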

See Also

Other benchmark: BenchmarkResult, benchmark, convertBMRToRankMatrix, friedmanPostHocTestBMR, friedmanTestBMR, generateCritDifferencesData, getBMRAggrPerformances, getBMRFeatSelResults, getBMRFilteredFeatures, getBMRLearnerIds, getBMRLearnerShortNames, getBMRLearners, getBMRMeasureIds, getBMRMeasures, getBMRModels, getBMRPerformances, getBMRPredictions, getBMRTaskDescs, getBMRTaskIds, getBMRTuneResults, plotBMRBoxplots, plotBMRRanksAsBarChart, plotBMRSummary, plotCritDifferences, reduceBatchmarkResults