mlr3pipelines

Package website: release | dev

Dataflow Programming for Machine Learning in R.

What is mlr3pipelines?

Watch our "WhyR 2020" Webinar Presentation on Youtube for an introduction! Find the slides here.

mlr3pipelines is a dataflow programming toolkit for machine learning in R utilising the mlr3 package. Machine learning workflows can be written as directed “Graphs” that represent data flows between preprocessing, model fitting, and ensemble learning units in an expressive and intuitive language. Using methods from the mlr3tuning package, it is even possible to simultaneously optimize parameters of multiple processing units.

In principle, mlr3pipelines is about defining single data and model manipulation steps as “PipeOps”:

library(mlr3)
library(mlr3pipelines)

pca        = po("pca")
filter     = po("filter", filter = mlr3filters::flt("variance"), filter.frac = 0.5)
learner_po = po("learner", learner = lrn("classif.rpart"))
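As a minimal sketch of what a single PipeOp does on its own, the block below trains the PCA step directly on the iris task. It assumes the mlr3 and mlr3pipelines packages are installed; the variable names `task`, `trained` and `predicted` are chosen for illustration only.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")
pca = po("pca")

# PipeOps take and return lists, with one element per input/output channel
trained = pca$train(list(task))[[1]]

# The returned task carries the principal components as its features
trained$feature_names

# After training, the PipeOp keeps its fitted state and can transform new data
pca$is_trained
predicted = pca$predict(list(task))[[1]]
```

Calling `$train()` first and `$predict()` afterwards mirrors how a Learner is used, which is what lets preprocessing steps and models share one pipeline.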

These PipeOps can then be combined to define machine learning pipelines, which can be wrapped in a GraphLearner that behaves like any other Learner in mlr3.

graph = pca %>>% filter %>>% learner_po
glrn = GraphLearner$new(graph)

This learner can be used for resampling, benchmarking, and even tuning.

resample(tsk("iris"), glrn, rsmp("cv"))
#> <ResampleResult> of 10 iterations
#> * Task: iris
#> * Learner: pca.variance.classif.rpart
#> * Warnings: 0 in 0 iterations
#> * Errors: 0 in 0 iterations
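Because a GraphLearner exposes the hyperparameters of all its PipeOps in one parameter set, the steps of a pipeline can be configured (and therefore tuned) together. The following is a hedged sketch that rebuilds the pipeline from above and sets two hyperparameters by hand; it assumes that mlr3, mlr3pipelines, mlr3filters and rpart are installed, and the values 0.75 and 0.05 are arbitrary illustrations, not recommendations.

```r
library(mlr3)
library(mlr3pipelines)
library(mlr3filters)

graph = po("pca") %>>%
  po("filter", filter = flt("variance"), filter.frac = 0.5) %>>%
  po("learner", learner = lrn("classif.rpart"))
glrn = GraphLearner$new(graph)

# Parameter names are prefixed with the id of the PipeOp they belong to
glrn$param_set$ids()

# Set parameters of two different pipeline steps through the same parameter set
glrn$param_set$values$variance.filter.frac = 0.75
glrn$param_set$values$classif.rpart.cp = 0.05

# Train and predict like any other mlr3 Learner
glrn$train(tsk("iris"))
prediction = glrn$predict(tsk("iris"))
```

A tuner from mlr3tuning would search over these same prefixed parameter names instead of fixed values.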

Feature Overview

Single computational steps can be represented as so-called PipeOps, which can then be connected with directed edges in a Graph. The scope of mlr3pipelines is still growing; currently supported features are:

  • Simple data manipulation and preprocessing operations, e.g. PCA, feature filtering
  • Task subsampling for speed and outcome class imbalance handling
  • mlr3 Learner operations for prediction and stacking
  • Simultaneous path branching (data going both ways)
  • Alternative path branching (data going one specific way, controlled by hyperparameters)
  • Ensemble methods and aggregation of predictions
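The two kinds of branching listed above can be sketched as follows. This is an illustrative example rather than an excerpt from the package documentation: it assumes mlr3 and mlr3pipelines are installed, and the graph names `both` and `alt` are hypothetical.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")

# Simultaneous branching: the task flows through both paths,
# and the resulting feature sets are joined column-wise.
both = gunion(list(po("pca"), po("nop"))) %>>% po("featureunion")
out_both = both$train(task)[[1]]
out_both$feature_names  # principal components plus the original features

# Alternative branching: only one path is active,
# selected through the "branch.selection" hyperparameter.
alt = po("branch", options = c("pca", "nop")) %>>%
  gunion(list(po("pca"), po("nop"))) %>>%
  po("unbranch", options = c("pca", "nop"))

alt$param_set$values$branch.selection = "nop"
out_nop = alt$train(task)[[1]]  # data passes through unchanged

alt$param_set$values$branch.selection = "pca"
out_pca = alt$train(task)[[1]]  # data is rotated by PCA
```

Since the active path is an ordinary hyperparameter, the choice between preprocessing alternatives can be left to a tuner in the same way as any numeric setting.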

Documentation

The easiest way to get started is to read the vignettes that are shipped with the package, which can also be viewed online.

Bugs, Questions, Feedback

mlr3pipelines is a free and open source software project that encourages participation and feedback. If you have any issues, questions, suggestions or feedback, please do not hesitate to open an “issue” about it on the GitHub page!

In case of problems / bugs, it is often helpful if you provide a “minimum working example” that showcases the behaviour (but don’t worry about this if the bug is obvious).

Please understand that the resources of the project are limited: response may sometimes be delayed by a few days, and some feature suggestions may be rejected if they are deemed too tangential to the vision behind the project.

Similar Projects

A predecessor to this package is the mlrCPO package, which works with mlr 2.x. Other packages that provide, to varying degrees, some preprocessing functionality or a machine learning domain-specific language are the caret package and the related recipes project, and the dplyr package.

Install

install.packages('mlr3pipelines')

Monthly Downloads: 5,444

Version: 0.3.0

License: LGPL-3

Maintainer: Martin Binder

Last Published: September 13th, 2020

Functions in mlr3pipelines (0.3.0)

  • gunion: Disjoint Union of Graphs
  • assert_graph: Assertion for mlr3pipeline Graph
  • as_pipeop: Conversion to mlr3pipeline PipeOp
  • Graph: Graph
  • mlr_pipeops_boxcox: PipeOpBoxCox
  • is.Multiplicity: Check if an object is a Multiplicity
  • mlr_pipeops_branch: PipeOpBranch
  • PipeOpTaskPreprocSimple: PipeOpTaskPreprocSimple
  • Selector: Selector Functions
  • Multiplicity: Multiplicity
  • as.data.table
  • mlr_pipeops_colroles: PipeOpColRoles
  • assert_pipeop: Assertion for mlr3pipeline PipeOp
  • %>>%: PipeOp Composition Operator
  • mlr_pipeops_copy: PipeOpCopy
  • as_graph: Conversion to mlr3pipeline Graph
  • mlr_learners_avg: Optimized Weighted Average of Features for Classification and Regression
  • mlr_pipeops_featureunion: PipeOpFeatureUnion
  • mlr_graphs_targettrafo: Transform and Re-Transform the Target Variable
  • mlr_pipeops: Dictionary of PipeOps
  • mlr_learners_graph: GraphLearner
  • mlr_pipeops_imputemode: PipeOpImputeMode
  • mlr_pipeops_filter: PipeOpFilter
  • mlr_pipeops_imputeoor: PipeOpImputeOOR
  • filter_noop: Remove NO_OPs from a List
  • mlr_pipeops_imputehist: PipeOpImputeHist
  • greplicate: Create Disjoint Graph Union of Copies of a Graph
  • mlr_pipeops_histbin: PipeOpHistBin
  • mlr_pipeops_classbalancing: PipeOpClassBalancing
  • mlr3pipelines-package: mlr3pipelines: Preprocessing Operators and Pipelines for 'mlr3'
  • is_noop: Test for NO_OP
  • mlr_pipeops_fixfactors: PipeOpFixFactors
  • mlr_pipeops_chunk: PipeOpChunk
  • mlr_pipeops_imputesample: PipeOpImputeSample
  • mlr_graphs_branching: Branch Between Alternative Paths
  • mlr_graphs_robustify: Robustify a learner
  • mlr_pipeops_classifavg: PipeOpClassifAvg
  • mlr_pipeops_kernelpca: PipeOpKernelPCA
  • mlr_pipeops_imputelearner: PipeOpImputeLearner
  • mlr_pipeops_learner: PipeOpLearner
  • mlr_pipeops_learner_cv: PipeOpLearnerCV
  • mlr_pipeops_nop: PipeOpNOP
  • mlr_pipeops_randomresponse: PipeOpRandomResponse
  • mlr_pipeops_randomprojection: PipeOpRandomProjection
  • mlr_pipeops_ovrsplit: PipeOpOVRSplit
  • mlr_pipeops_missind: PipeOpMissInd
  • mlr_pipeops_classweights: PipeOpClassWeights
  • mlr_pipeops_encode: PipeOpEncode
  • NO_OP: No-Op Sentinel Used for Alternative Branching
  • mlr_pipeops_datefeatures: PipeOpDateFeatures
  • PipeOp: PipeOp
  • mlr_pipeops_renamecolumns: PipeOpRenameColumns
  • mlr_pipeops_smote: PipeOpSmote
  • mlr_pipeops_modelmatrix: PipeOpModelMatrix
  • mlr_pipeops_spatialsign: PipeOpSpatialSign
  • mlr_pipeops_replicate: PipeOpReplicate
  • mlr_pipeops_pca: PipeOpPCA
  • mlr_pipeops_ovrunite: PipeOpOVRUnite
  • mlr_pipeops_imputemedian: PipeOpImputeMedian
  • mlr_pipeops_imputemean: PipeOpImputeMean
  • mlr_pipeops_subsample: PipeOpSubsample
  • add_class_hierarchy_cache: Add a Class Hierarchy to the Cache
  • mlr_pipeops_textvectorizer: PipeOpTextVectorizer
  • mlr_pipeops_targetinvert: PipeOpTargetInvert
  • mlr_pipeops_scale: PipeOpScale
  • mlr_pipeops_threshold: PipeOpThreshold
  • mlr_pipeops_yeojohnson: PipeOpYeoJohnson
  • as.Multiplicity: Convert an object to a Multiplicity
  • ppl: Shorthand Graph Constructor
  • pipeline_greplicate: Create Disjoint Graph Union of Copies of a Graph
  • mlr_pipeops_mutate: PipeOpMutate
  • mlr_pipeops_nmf: PipeOpNMF
  • mlr_pipeops_removeconstants: PipeOpRemoveConstants
  • mlr_pipeops_regravg: PipeOpRegrAvg
  • mlr_pipeops_scalemaxabs: PipeOpScaleMaxAbs
  • register_autoconvert_function: Add Autoconvert Function to Conversion Register
  • mlr_graphs_bagging: Create a bagging learner
  • mlr_pipeops_collapsefactors: PipeOpCollapseFactors
  • mlr_pipeops_encodelmer: Impact Encoding with Random Intercept Models
  • mlr_pipeops_encodeimpact: Conditional Target Value Impact Encoding
  • mlr_graphs: Dictionary of (sub-)graphs
  • mlr_pipeops_colapply: PipeOpColApply
  • mlr_pipeops_scalerange: PipeOpScaleRange
  • mlr_pipeops_updatetarget: PipeOpUpdateTarget
  • mlr_pipeops_ica: PipeOpICA
  • mlr_pipeops_select: PipeOpSelect
  • mlr_pipeops_tunethreshold: PipeOpTuneThreshold
  • mlr_pipeops_unbranch: PipeOpUnbranch
  • mlr_pipeops_multiplicityexply: PipeOpMultiplicityExply
  • mlr_pipeops_proxy: PipeOpProxy
  • mlr_pipeops_targetmutate: PipeOpTargetMutate
  • mlr_pipeops_imputeconstant: PipeOpImputeConstant
  • mlr_pipeops_multiplicityimply: PipeOpMultiplicityImply
  • mlr_pipeops_quantilebin: PipeOpQuantileBin
  • mlr_pipeops_vtreat: PipeOpVtreat
  • pipeline_ovr: Create A Graph to Perform "One vs. Rest" classification.
  • mlr_pipeops_targettrafoscalerange: PipeOpTargetTrafoScaleRange
  • reset_autoconvert_register: Reset Autoconvert Register
  • po: Shorthand PipeOp Constructor
  • reset_class_hierarchy_cache: Reset the Class Hierarchy Cache
  • PipeOpEnsemble: PipeOpEnsemble
  • PipeOpImpute: PipeOpImpute
  • PipeOpTargetTrafo: PipeOpTargetTrafo
  • PipeOpTaskPreproc: PipeOpTaskPreproc