mlr3pipelines

Package website: release | dev

Dataflow Programming for Machine Learning in R.

What is mlr3pipelines?

Watch our “WhyR 2020” webinar presentation on YouTube for an introduction! Find the slides here.

mlr3pipelines is a dataflow programming toolkit for machine learning in R utilising the mlr3 package. Machine learning workflows can be written as directed “Graphs” that represent data flows between preprocessing, model fitting, and ensemble learning units in an expressive and intuitive language. Using methods from the mlr3tuning package, it is even possible to simultaneously optimize parameters of multiple processing units.

In principle, mlr3pipelines is about defining single data and model manipulation steps as “PipeOps”:

library("mlr3")
library("mlr3pipelines")

pca        = po("pca")
filter     = po("filter", filter = mlr3filters::flt("variance"), filter.frac = 0.5)
learner_po = po("learner", learner = lrn("classif.rpart"))
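As a minimal sketch of what a single PipeOp does on its own, the example below trains and applies the "pca" PipeOp directly. It assumes that mlr3 and mlr3pipelines are installed, and the built-in iris task is used purely for illustration. Inputs and outputs are passed as lists because a PipeOp may have several input and output channels.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pca = po("pca")

# Training takes a list of inputs and returns a list of outputs;
# for "pca" the single output is a Task with rotated features.
task_train = pca$train(list(task))[[1]]
print(task_train$feature_names)

# After training, the fitted transformation is kept in pca$state and is
# re-applied (not re-estimated) when predicting.
task_new = pca$predict(list(task))[[1]]
print(head(task_new$data(), 3))
```

Calling the PipeOp by hand like this is mostly useful for debugging a single step; in normal use the PipeOp is placed in a Graph.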

These PipeOps can then be combined to define machine learning pipelines. A pipeline can be wrapped in a GraphLearner that behaves like any other Learner in mlr3.

graph = pca %>>% filter %>>% learner_po
glrn = GraphLearner$new(graph)

This learner can be used for resampling, benchmarking, and even tuning.

resample(tsk("iris"), glrn, rsmp("cv"))
#> <ResampleResult> with 10 resampling iterations
#>  task_id                 learner_id resampling_id iteration warnings errors
#>     iris pca.variance.classif.rpart            cv         1        0      0
#>     iris pca.variance.classif.rpart            cv         2        0      0
#>     iris pca.variance.classif.rpart            cv         3        0      0
#>     iris pca.variance.classif.rpart            cv         4        0      0
#>     iris pca.variance.classif.rpart            cv         5        0      0
#>     iris pca.variance.classif.rpart            cv         6        0      0
#>     iris pca.variance.classif.rpart            cv         7        0      0
#>     iris pca.variance.classif.rpart            cv         8        0      0
#>     iris pca.variance.classif.rpart            cv         9        0      0
#>     iris pca.variance.classif.rpart            cv        10        0      0
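The claim that such a learner can be tuned, with parameters of several processing units optimized at once, can be sketched as follows. This is an illustration under assumptions rather than a recommended setup: it assumes the mlr3tuning, paradox, and mlr3filters packages are installed and that the installed mlr3tuning version provides `ti()`. The search ranges, the holdout resampling, and the budget of five evaluations are arbitrary choices made for brevity.

```r
library("mlr3")
library("mlr3pipelines")
library("mlr3tuning")

graph = po("pca") %>>%
  po("filter", filter = mlr3filters::flt("variance"), filter.frac = 0.5) %>>%
  po("learner", learner = lrn("classif.rpart"))
glrn = GraphLearner$new(graph)

# Hyperparameters of a graph are prefixed with the id of their PipeOp.
# Mark one parameter of the filter step and one of the learner for tuning.
glrn$param_set$values$variance.filter.frac = paradox::to_tune(0.25, 1)
glrn$param_set$values$classif.rpart.cp = paradox::to_tune(0.001, 0.1)

# A tuning instance bundles task, learner, resampling, measure and budget.
instance = ti(
  task = tsk("iris"),
  learner = glrn,
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  terminator = trm("evals", n_evals = 5)
)

# Random search proposes configurations until the budget is used up.
tuner = tnr("random_search")
tuner$optimize(instance)

print(instance$result)
```

Because both parameters live in the same parameter set, the tuner samples the filter fraction and the tree complexity jointly instead of optimizing each step in isolation.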

Feature Overview

Single computational steps can be represented as so-called PipeOps, which can then be connected with directed edges in a Graph. The scope of mlr3pipelines is still growing; currently supported features are:

  • Simple data manipulation and preprocessing operations, e.g. PCA, feature filtering
  • Task subsampling for speed and outcome class imbalance handling
  • mlr3 Learner operations for prediction and stacking
  • Simultaneous path branching (data going both ways)
  • Alternative path branching (data going one specific way, controlled by hyperparameters)
  • Ensemble methods and aggregation of predictions
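The two kinds of branching in the list above can be illustrated with a short sketch. It assumes mlr3 and mlr3pipelines are installed; the choice of "pca", "nop", and "scale" as path contents is only for demonstration.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# Simultaneous paths: the same task flows through both "pca" and "nop",
# and "featureunion" column-binds the resulting features.
g_union = po("copy", outnum = 2) %>>%
  gunion(list(po("pca"), po("nop"))) %>>%
  po("featureunion")
out = g_union$train(task)[[1]]
print(out$feature_names)  # principal components plus the original features

# Alternative paths: "branch" sends the data down exactly one path,
# chosen by the "branch.selection" hyperparameter.
paths = c("pca", "scale")
g_branch = po("branch", options = paths) %>>%
  gunion(list(po("pca"), po("scale"))) %>>%
  po("unbranch", options = paths) %>>%
  po("learner", learner = lrn("classif.rpart"))
g_branch$param_set$values$branch.selection = "scale"

glrn = GraphLearner$new(g_branch)
glrn$train(task)
print(glrn$predict(task)$score(msr("classif.ce")))
```

Since the active path is an ordinary hyperparameter, the choice between preprocessing alternatives can be treated like any other parameter of the learner.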

Documentation

A good way to get into mlr3pipelines is to start with its two introductory vignettes.

Bugs, Questions, Feedback

mlr3pipelines is a free and open source software project that encourages participation and feedback. If you have any issues, questions, suggestions or feedback, please do not hesitate to open an “issue” about it on the GitHub page!

In case of problems / bugs, it is often helpful if you provide a “minimum working example” that showcases the behaviour (but don’t worry about this if the bug is obvious).

Please understand that the resources of the project are limited: response may sometimes be delayed by a few days, and some feature suggestions may be rejected if they are deemed too tangential to the vision behind the project.

Citing mlr3pipelines

If you use mlr3pipelines, please cite our JMLR article:

@Article{mlr3pipelines,
  title = {{mlr3pipelines} - Flexible Machine Learning Pipelines in R},
  author = {Martin Binder and Florian Pfisterer and Michel Lang and Lennart Schneider and Lars Kotthoff and Bernd Bischl},
  journal = {Journal of Machine Learning Research},
  year = {2021},
  volume = {22},
  number = {184},
  pages = {1-7},
  url = {https://jmlr.org/papers/v22/21-0281.html},
}

Similar Projects

A predecessor to this package is the mlrCPO package, which works with mlr 2.x. Other packages that provide, to varying degrees, preprocessing functionality or a domain-specific language for machine learning are the caret package, the related recipes project, and the dplyr package.

Install

install.packages('mlr3pipelines')

Monthly Downloads

6,259

Version

0.7.0

License

LGPL-3

Maintainer

Martin Binder

Last Published

September 24th, 2024

Functions in mlr3pipelines (0.7.0)

  • CnfClause: Clauses in CNF Formulas
  • CnfSymbol: Symbols for CNF Formulas
  • NO_OP: No-Op Sentinel Used for Alternative Branching
  • PipeOpEnsemble: Ensembling Base Class
  • assert_graph: Assertion for mlr3pipelines Graph
  • as.Multiplicity: Convert an object to a Multiplicity
  • add_class_hierarchy_cache: Add a Class Hierarchy to the Cache
  • as_pipeop: Conversion to mlr3pipelines PipeOp
  • PipeOpImpute: Imputation Base Class
  • Selector: Selector Functions
  • as_graph: Conversion to mlr3pipelines Graph
  • PipeOpTaskPreprocSimple: Simple Task Preprocessing Base Class
  • PipeOpTaskPreproc: Task Preprocessing Base Class
  • PipeOpTargetTrafo: Target Transformation Base Class
  • filter_noop: Remove NO_OPs from a List
  • greplicate: Create Disjoint Graph Union of Copies of a Graph
  • %>>%: PipeOp Composition Operator
  • mlr_graphs: Dictionary of (sub-)graphs
  • is.Multiplicity: Check if an object is a Multiplicity
  • chain_graphs: Chain a Series of Graphs
  • gunion: Disjoint Union of Graphs
  • is_noop: Test for NO_OP
  • assert_pipeop: Assertion for mlr3pipelines PipeOp
  • mlr3pipelines-package: mlr3pipelines: Preprocessing Operators and Pipelines for 'mlr3'
  • mlr_graphs_bagging: Create a bagging learner
  • mlr_graphs_ovr: Create A Graph to Perform "One vs. Rest" classification.
  • mlr_graphs_targettrafo: Transform and Re-Transform the Target Variable
  • mlr_graphs_stacking: Create A Graph to Perform Stacking.
  • mlr_graphs_branch: Branch Between Alternative Paths
  • mlr_graphs_greplicate: Create Disjoint Graph Union of Copies of a Graph
  • mlr_graphs_convert_types: Convert Column Types
  • mlr_learners_avg: Optimized Weighted Average of Features for Classification and Regression
  • mlr_learners_graph: Encapsulate a Graph as a Learner
  • mlr_graphs_robustify: Robustify a learner
  • mlr_pipeops_adas: ADAS Balancing
  • mlr_pipeops_branch: Path Branching
  • mlr_pipeops: Dictionary of PipeOps
  • mlr_pipeops_classweights: Class Weights for Sample Weighting
  • mlr_pipeops_classbalancing: Class Balancing
  • mlr_pipeops_classifavg: Majority Vote Prediction
  • mlr_pipeops_colapply: Apply a Function to each Column of a Task
  • mlr_pipeops_chunk: Chunk Input into Multiple Outputs
  • mlr_pipeops_boxcox: Box-Cox Transformation of Numeric Features
  • mlr_pipeops_blsmote: BLSMOTE Balancing
  • mlr_pipeops_copy: Copy Input Multiple Times
  • mlr_pipeops_collapsefactors: Collapse Factors
  • mlr_pipeops_encode: Factor Encoding
  • mlr_pipeops_featureunion: Aggregate Features from Multiple Inputs
  • mlr_pipeops_filter: Feature Filtering
  • mlr_pipeops_encodeimpact: Conditional Target Value Impact Encoding
  • mlr_pipeops_colroles: Change Column Roles of a Task
  • mlr_pipeops_encodelmer: Impact Encoding with Random Intercept Models
  • mlr_pipeops_fixfactors: Fix Factor Levels
  • mlr_pipeops_datefeatures: Preprocess Date Features
  • mlr_pipeops_imputemedian: Impute Numerical Features by their Median
  • mlr_pipeops_imputehist: Impute Numerical Features by Histogram
  • mlr_pipeops_imputelearner: Impute Features by Fitting a Learner
  • mlr_pipeops_ica: Independent Component Analysis
  • mlr_pipeops_imputeconstant: Impute Features by a Constant
  • mlr_pipeops_imputeoor: Out of Range Imputation
  • mlr_pipeops_imputesample: Impute Features by Sampling
  • mlr_pipeops_histbin: Split Numeric Features into Equally Spaced Bins
  • mlr_pipeops_imputemean: Impute Numerical Features by their Mean
  • mlr_pipeops_imputemode: Impute Features by their Mode
  • mlr_pipeops_learner: Wrap a Learner into a PipeOp
  • mlr_pipeops_multiplicityexply: Explicate a Multiplicity
  • mlr_pipeops_learner_cv: Wrap a Learner into a PipeOp with Cross-validated Predictions as Features
  • mlr_pipeops_modelmatrix: Transform Columns by Constructing a Model Matrix
  • mlr_pipeops_nmf: Non-negative Matrix Factorization
  • mlr_pipeops_kernelpca: Kernelized Principal Component Analysis
  • mlr_pipeops_nop: Simply Push Input Forward
  • mlr_pipeops_missind: Add Missing Indicator Columns
  • mlr_pipeops_mutate: Add Features According to Expressions
  • mlr_pipeops_multiplicityimply: Implicate a Multiplicity
  • mlr_pipeops_removeconstants: Remove Constant Features
  • mlr_pipeops_quantilebin: Split Numeric Features into Quantile Bins
  • mlr_pipeops_proxy: Wrap another PipeOp or Graph as a Hyperparameter
  • mlr_pipeops_randomprojection: Project Numeric Features onto a Randomly Sampled Subspace
  • mlr_pipeops_regravg: Weighted Prediction Averaging
  • mlr_pipeops_ovrunite: Unite Binary Classification Tasks
  • mlr_pipeops_randomresponse: Generate a Randomized Response Prediction
  • mlr_pipeops_pca: Principal Component Analysis
  • mlr_pipeops_renamecolumns: Rename Columns
  • mlr_pipeops_ovrsplit: Split a Classification Task into Binary Classification Tasks
  • mlr_pipeops_smotenc: SMOTENC Balancing
  • mlr_pipeops_smote: SMOTE Balancing
  • mlr_pipeops_scale: Center and Scale Numeric Features
  • mlr_pipeops_scalemaxabs: Scale Numeric Features with Respect to their Maximum Absolute Value
  • mlr_pipeops_subsample: Subsampling
  • mlr_pipeops_spatialsign: Normalize Data Row-wise
  • mlr_pipeops_scalerange: Linearly Transform Numeric Features to Match Given Boundaries
  • mlr_pipeops_select: Remove Features Depending on a Selector
  • mlr_pipeops_replicate: Replicate the Input as a Multiplicity
  • mlr_pipeops_rowapply: Apply a Function to each Row of a Task
  • mlr_pipeops_targetmutate: Transform a Target by a Function
  • mlr_pipeops_targettrafoscalerange: Linearly Transform a Numeric Target to Match Given Boundaries
  • mlr_pipeops_vtreat: Interface to the vtreat Package
  • mlr_pipeops_yeojohnson: Yeo-Johnson Transformation of Numeric Features
  • mlr_pipeops_unbranch: Unbranch Different Paths
  • mlr_pipeops_targetinvert: Invert Target Transformations
  • mlr_pipeops_updatetarget: Transform a Target without an Explicit Inversion
  • mlr_pipeops_threshold: Change the Threshold of a Classification Prediction
  • mlr_pipeops_tunethreshold: Tune the Threshold of a Classification Prediction
  • mlr_pipeops_textvectorizer: Bag-of-word Representation of Character Features
  • reset_autoconvert_register: Reset Autoconvert Register
  • set_validate.GraphLearner: Configure Validation for a GraphLearner
  • reset_class_hierarchy_cache: Reset the Class Hierarchy Cache
  • po: Shorthand PipeOp Constructor
  • ppl: Shorthand Graph Constructor
  • reexports: Objects exported from other packages
  • register_autoconvert_function: Add Autoconvert Function to Conversion Register
  • CnfUniverse: Symbol Table for CNF Formulas
  • Graph: Graph Base Class
  • Multiplicity: Multiplicity
  • PipeOp: PipeOp Base Class
  • CnfFormula: CNF Formulas
  • CnfAtom: Atoms for CNF Formulas
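
Several of the indexed helpers deal with looking up PipeOps and assembling Graphs. The following sketch, assuming mlr3 and mlr3pipelines are installed, shows how a few of them relate to each other. The PipeOps chosen ("scale", "pca") are arbitrary examples.

```r
library("mlr3")
library("mlr3pipelines")

# mlr_pipeops is a dictionary; its keys are the names accepted by po().
keys = mlr_pipeops$keys()
print(head(keys))

# chain_graphs() connects a list of graphs sequentially, like %>>%.
g_chain = chain_graphs(list(po("scale"), po("pca")))
print(g_chain$ids())

# gunion() places graphs side by side without connecting them,
# so the resulting graph has no edges.
g_side = gunion(list(po("scale"), po("pca")))
print(nrow(g_side$edges))

# greplicate() creates n disjoint copies of a graph, giving each copy
# its own ids so that the copies can coexist in one graph.
g_rep = greplicate(po("pca"), n = 2)
print(g_rep$ids())
```

Each function and PipeOp in the index has its own help page, for example `?mlr_pipeops_pca` or `?gunion`.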