mlr3pipelines v0.3.0

Preprocessing Operators and Pipelines for 'mlr3'

Dataflow programming toolkit that enriches 'mlr3' with a diverse set of pipelining operators ('PipeOps') that can be composed into graphs. Operations exist for data preprocessing, model fitting, and ensemble learning. Graphs can themselves be treated as 'mlr3' 'Learners' and can therefore be resampled, benchmarked, and tuned.

Readme

mlr3pipelines

Package website: https://mlr3pipelines.mlr-org.com

Dataflow Programming for Machine Learning in R.

What is mlr3pipelines?

Watch our "WhyR 2020" webinar presentation on YouTube for an introduction; the slides are also available.

mlr3pipelines is a dataflow programming toolkit for machine learning in R utilising the mlr3 package. Machine learning workflows can be written as directed “Graphs” that represent data flows between preprocessing, model fitting, and ensemble learning units in an expressive and intuitive language. Using methods from the mlr3tuning package, it is even possible to simultaneously optimize parameters of multiple processing units.

In principle, mlr3pipelines is about defining individual data and model manipulation steps as “PipeOps”:

library("mlr3")
library("mlr3pipelines")

pca        = po("pca")
filter     = po("filter", filter = mlr3filters::flt("variance"), filter.frac = 0.5)
learner_po = po("learner", learner = lrn("classif.rpart"))
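
To make the idea of a single step concrete, the following is a minimal sketch of training one PipeOp on its own, outside of any pipeline. It assumes that the mlr3 and mlr3pipelines packages are installed; the variable names `task`, `out`, `pca_task` and `new_task` are only illustrative.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pca = po("pca")

# PipeOps take and return lists, because a PipeOp can have several
# input and output channels; "pca" has exactly one of each.
out = pca$train(list(task))
pca_task = out[[1]]

# The numeric iris features are replaced by principal components.
print(pca_task$feature_names)

# Prediction reuses what was stored in pca$state during training.
new_task = pca$predict(list(task))[[1]]
print(new_task$nrow)
```

Training stores the fitted transformation in the PipeOp's state, so the same rotation is applied to new data at prediction time.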

These PipeOps can then be combined to define machine learning pipelines, which can be wrapped in a GraphLearner that behaves like any other Learner in mlr3.

graph = pca %>>% filter %>>% learner_po
glrn = GraphLearner$new(graph)
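
Since a GraphLearner behaves like an ordinary Learner, it can be trained and used for prediction directly. The sketch below assumes that mlr3, mlr3filters and mlr3pipelines are installed; the hold-out split and the names `train_rows` and `test_rows` are arbitrary choices made for illustration.

```r
library("mlr3")
library("mlr3pipelines")

# Rebuild the pipeline so that this example is self-contained.
graph = po("pca") %>>%
  po("filter", filter = mlr3filters::flt("variance"), filter.frac = 0.5) %>>%
  po("learner", learner = lrn("classif.rpart"))
glrn = GraphLearner$new(graph)

task = tsk("iris")

# Arbitrary hold-out split: 100 rows for training, the rest for testing.
set.seed(1)
train_rows = sample(task$nrow, 100)
test_rows = setdiff(seq_len(task$nrow), train_rows)

# All steps (PCA, filter, rpart) are fitted on the training rows only.
glrn$train(task, row_ids = train_rows)

# At prediction time the fitted preprocessing is applied before rpart predicts.
prediction = glrn$predict(task, row_ids = test_rows)
print(prediction$score(msr("classif.acc")))
```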

This learner can be used for resampling, benchmarking, and even tuning.

resample(tsk("iris"), glrn, rsmp("cv"))
#> <ResampleResult> of 10 iterations
#> * Task: iris
#> * Learner: pca.variance.classif.rpart
#> * Warnings: 0 in 0 iterations
#> * Errors: 0 in 0 iterations
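
Tuning is possible because the GraphLearner exposes the hyperparameters of all its PipeOps in one joint parameter set, each prefixed with the ID of its PipeOp. The following sketch sets two of them by hand and aggregates a resampling result; the chosen values and the use of 3-fold cross-validation are arbitrary assumptions for illustration, and an actual tuning run with mlr3tuning is not shown.

```r
library("mlr3")
library("mlr3pipelines")

graph = po("pca") %>>%
  po("filter", filter = mlr3filters::flt("variance"), filter.frac = 0.5) %>>%
  po("learner", learner = lrn("classif.rpart"))
glrn = GraphLearner$new(graph)

# Hyperparameter IDs are prefixed with the PipeOp ID,
# e.g. "variance.filter.frac" or "classif.rpart.cp".
ids = glrn$param_set$ids()
print(head(ids))

# Change parameters of two different steps at once (illustrative values).
glrn$param_set$values$variance.filter.frac = 0.75
glrn$param_set$values$classif.rpart.cp = 0.05

# Resample the whole pipeline and aggregate the classification error.
rr = resample(tsk("iris"), glrn, rsmp("cv", folds = 3))
score = rr$aggregate(msr("classif.ce"))
print(score)
```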

Feature Overview

Single computational steps can be represented as so-called PipeOps, which can then be connected with directed edges in a Graph. The scope of mlr3pipelines is still growing; currently supported features are:

  • Simple data manipulation and preprocessing operations, e.g. PCA, feature filtering
  • Task subsampling for speed and outcome class imbalance handling
  • mlr3 Learner operations for prediction and stacking
  • Simultaneous path branching (data going both ways)
  • Alternative path branching (data going one specific way, controlled by hyperparameters)
  • Ensemble methods and aggregation of predictions
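
The two kinds of path branching in the list above can be sketched as follows. This is an illustrative example rather than a recommended pipeline: it assumes mlr3 and mlr3pipelines are installed, and the option names in `choices` as well as the variable names are made up for the example.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# Simultaneous branching: the task is copied, one copy goes through PCA,
# the other is passed on unchanged, and the features are joined again.
graph_both = po("copy", outnum = 2) %>>%
  gunion(list(po("pca"), po("nop"))) %>>%
  po("featureunion")
both_task = graph_both$train(task)[[1]]
print(both_task$feature_names)

# Alternative branching: only one path is active, selected by the
# "branch.selection" hyperparameter; the other path receives a NO_OP.
choices = c("pca", "nothing")
graph_alt = po("branch", options = choices) %>>%
  gunion(list(po("pca"), po("nop"))) %>>%
  po("unbranch", options = choices)
graph_alt$param_set$values$branch.selection = "nothing"
alt_task = graph_alt$train(task)[[1]]
print(alt_task$feature_names)
```

Because the active path is an ordinary hyperparameter, the choice between preprocessing alternatives can be treated like any other parameter of the pipeline.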

Documentation

The easiest way to get started is to read some of the vignettes that are shipped with the package, which can also be viewed online on the package website.

Bugs, Questions, Feedback

mlr3pipelines is a free and open source software project that encourages participation and feedback. If you have any issues, questions, suggestions or feedback, please do not hesitate to open an “issue” about it on the GitHub page!

In case of problems / bugs, it is often helpful if you provide a “minimum working example” that showcases the behaviour (but don’t worry about this if the bug is obvious).

Please understand that the resources of the project are limited: response may sometimes be delayed by a few days, and some feature suggestions may be rejected if they are deemed too tangential to the vision behind the project.

Similar Projects

A predecessor to this package is the mlrCPO package, which works with mlr 2.x. Other packages that provide, to varying degrees, some preprocessing functionality or a machine learning domain-specific language are the caret package and the related recipes project, and the dplyr package.

Functions in mlr3pipelines

| Name | Description |
| --- | --- |
| gunion | Disjoint Union of Graphs |
| assert_graph | Assertion for mlr3pipeline Graph |
| as_pipeop | Conversion to mlr3pipeline PipeOp |
| Graph | Graph |
| mlr_pipeops_boxcox | PipeOpBoxCox |
| is.Multiplicity | Check if an object is a Multiplicity |
| mlr_pipeops_branch | PipeOpBranch |
| PipeOpTaskPreprocSimple | PipeOpTaskPreprocSimple |
| Selector | Selector Functions |
| Multiplicity | Multiplicity |
| as.data.table | Re-export of as.data.table. See data.table::as.data.table. |
| mlr_pipeops_colroles | PipeOpColRoles |
| assert_pipeop | Assertion for mlr3pipeline PipeOp |
| %>>% | PipeOp Composition Operator |
| mlr_pipeops_copy | PipeOpCopy |
| as_graph | Conversion to mlr3pipeline Graph |
| mlr_learners_avg | Optimized Weighted Average of Features for Classification and Regression |
| mlr_pipeops_featureunion | PipeOpFeatureUnion |
| mlr_graphs_targettrafo | Transform and Re-Transform the Target Variable |
| mlr_pipeops | Dictionary of PipeOps |
| mlr_learners_graph | GraphLearner |
| mlr_pipeops_imputemode | PipeOpImputeMode |
| mlr_pipeops_filter | PipeOpFilter |
| mlr_pipeops_imputeoor | PipeOpImputeOOR |
| filter_noop | Remove NO_OPs from a List |
| mlr_pipeops_imputehist | PipeOpImputeHist |
| greplicate | Create Disjoint Graph Union of Copies of a Graph |
| mlr_pipeops_histbin | PipeOpHistBin |
| mlr_pipeops_classbalancing | PipeOpClassBalancing |
| mlr3pipelines-package | mlr3pipelines: Preprocessing Operators and Pipelines for 'mlr3' |
| is_noop | Test for NO_OP |
| mlr_pipeops_fixfactors | PipeOpFixFactors |
| mlr_pipeops_chunk | PipeOpChunk |
| mlr_pipeops_imputesample | PipeOpImputeSample |
| mlr_graphs_branching | Branch Between Alternative Paths |
| mlr_graphs_robustify | Robustify a learner |
| mlr_pipeops_classifavg | PipeOpClassifAvg |
| mlr_pipeops_kernelpca | PipeOpKernelPCA |
| mlr_pipeops_imputelearner | PipeOpImputeLearner |
| mlr_pipeops_learner | PipeOpLearner |
| mlr_pipeops_learner_cv | PipeOpLearnerCV |
| mlr_pipeops_nop | PipeOpNOP |
| mlr_pipeops_randomresponse | PipeOpRandomResponse |
| mlr_pipeops_randomprojection | PipeOpRandomProjection |
| mlr_pipeops_ovrsplit | PipeOpOVRSplit |
| mlr_pipeops_missind | PipeOpMissInd |
| mlr_pipeops_classweights | PipeOpClassWeights |
| mlr_pipeops_encode | PipeOpEncode |
| NO_OP | No-Op Sentinel Used for Alternative Branching |
| mlr_pipeops_datefeatures | PipeOpDateFeatures |
| PipeOp | PipeOp |
| mlr_pipeops_renamecolumns | PipeOpRenameColumns |
| mlr_pipeops_smote | PipeOpSmote |
| mlr_pipeops_modelmatrix | PipeOpModelMatrix |
| mlr_pipeops_spatialsign | PipeOpSpatialSign |
| mlr_pipeops_replicate | PipeOpReplicate |
| mlr_pipeops_pca | PipeOpPCA |
| mlr_pipeops_ovrunite | PipeOpOVRUnite |
| mlr_pipeops_imputemedian | PipeOpImputeMedian |
| mlr_pipeops_imputemean | PipeOpImputeMean |
| mlr_pipeops_subsample | PipeOpSubsample |
| add_class_hierarchy_cache | Add a Class Hierarchy to the Cache |
| mlr_pipeops_textvectorizer | PipeOpTextVectorizer |
| mlr_pipeops_targetinvert | PipeOpTargetInvert |
| mlr_pipeops_scale | PipeOpScale |
| mlr_pipeops_threshold | PipeOpThreshold |
| mlr_pipeops_yeojohnson | PipeOpYeoJohnson |
| as.Multiplicity | Convert an object to a Multiplicity |
| ppl | Shorthand Graph Constructor |
| pipeline_greplicate | Create Disjoint Graph Union of Copies of a Graph |
| mlr_pipeops_mutate | PipeOpMutate |
| mlr_pipeops_nmf | PipeOpNMF |
| mlr_pipeops_removeconstants | PipeOpRemoveConstants |
| mlr_pipeops_regravg | PipeOpRegrAvg |
| mlr_pipeops_scalemaxabs | PipeOpScaleMaxAbs |
| register_autoconvert_function | Add Autoconvert Function to Conversion Register |
| mlr_graphs_bagging | Create a bagging learner |
| mlr_pipeops_collapsefactors | PipeOpCollapseFactors |
| mlr_pipeops_encodelmer | Impact Encoding with Random Intercept Models |
| mlr_pipeops_encodeimpact | Conditional Target Value Impact Encoding |
| mlr_graphs | Dictionary of (sub-)graphs |
| mlr_pipeops_colapply | PipeOpColApply |
| mlr_pipeops_scalerange | PipeOpScaleRange |
| mlr_pipeops_updatetarget | PipeOpUpdateTarget |
| mlr_pipeops_ica | PipeOpICA |
| mlr_pipeops_select | PipeOpSelect |
| mlr_pipeops_tunethreshold | PipeOpTuneThreshold |
| mlr_pipeops_unbranch | PipeOpUnbranch |
| mlr_pipeops_multiplicityexply | PipeOpMultiplicityExply |
| mlr_pipeops_proxy | PipeOpProxy |
| mlr_pipeops_targetmutate | PipeOpTargetMutate |
| mlr_pipeops_imputeconstant | PipeOpImputeConstant |
| mlr_pipeops_multiplicityimply | PipeOpMultiplicityImply |
| mlr_pipeops_quantilebin | PipeOpQuantileBin |
| mlr_pipeops_vtreat | PipeOpVtreat |
| pipeline_ovr | Create A Graph to Perform "One vs. Rest" classification. |
| mlr_pipeops_targettrafoscalerange | PipeOpTargetTrafoScaleRange |
| reset_autoconvert_register | Reset Autoconvert Register |
| po | Shorthand PipeOp Constructor |
| reset_class_hierarchy_cache | Reset the Class Hierarchy Cache |
| PipeOpEnsemble | PipeOpEnsemble |
| PipeOpImpute | PipeOpImpute |
| PipeOpTargetTrafo | PipeOpTargetTrafo |
| PipeOpTaskPreproc | PipeOpTaskPreproc |
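
A few of the entries listed above (`mlr_pipeops`, `po`, `gunion`, `as_graph`) can be tried out in a short sketch like the one below. It assumes mlr3 and mlr3pipelines are installed; the variable names are illustrative only.

```r
library("mlr3")
library("mlr3pipelines")

# mlr_pipeops is a dictionary; its keys are the short names accepted by po().
keys = mlr_pipeops$keys()
print(c("pca", "branch", "featureunion") %in% keys)

# Retrieving from the dictionary and using the po() shorthand
# both construct a PipeOp of the same class.
scale_a = mlr_pipeops$get("scale")
scale_b = po("scale")
print(class(scale_a)[1])

# gunion() places PipeOps (or Graphs) side by side without connecting them.
g = gunion(list(po("pca"), po("scale")))
print(g$ids())

# as_graph() converts a single PipeOp into a Graph with one node.
single = as_graph(po("pca"))
print(single$ids())
```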

Details

License LGPL-3
URL https://mlr3pipelines.mlr-org.com, https://github.com/mlr-org/mlr3pipelines
BugReports https://github.com/mlr-org/mlr3pipelines/issues
RdMacros mlr3misc
ByteCompile true
Encoding UTF-8
LazyData true
NeedsCompilation no
RoxygenNote 7.1.0
Collate 'Graph.R' 'GraphLearner.R' 'mlr_pipeops.R' 'multiplicity.R' 'utils.R' 'PipeOp.R' 'PipeOpEnsemble.R' 'LearnerAvg.R' 'NO_OP.R' 'PipeOpTaskPreproc.R' 'PipeOpBoxCox.R' 'PipeOpBranch.R' 'PipeOpChunk.R' 'PipeOpClassBalancing.R' 'PipeOpClassWeights.R' 'PipeOpClassifAvg.R' 'PipeOpColApply.R' 'PipeOpColRoles.R' 'PipeOpCollapseFactors.R' 'PipeOpCopy.R' 'PipeOpDateFeatures.R' 'PipeOpEncode.R' 'PipeOpEncodeImpact.R' 'PipeOpEncodeLmer.R' 'PipeOpFeatureUnion.R' 'PipeOpFilter.R' 'PipeOpFixFactors.R' 'PipeOpHistBin.R' 'PipeOpICA.R' 'PipeOpImpute.R' 'PipeOpImputeConstant.R' 'PipeOpImputeHist.R' 'PipeOpImputeLearner.R' 'PipeOpImputeMean.R' 'PipeOpImputeMedian.R' 'PipeOpImputeMode.R' 'PipeOpImputeOOR.R' 'PipeOpImputeSample.R' 'PipeOpKernelPCA.R' 'PipeOpLearner.R' 'PipeOpLearnerCV.R' 'PipeOpMissingIndicators.R' 'PipeOpModelMatrix.R' 'PipeOpMultiplicity.R' 'PipeOpMutate.R' 'PipeOpNMF.R' 'PipeOpNOP.R' 'PipeOpOVR.R' 'PipeOpPCA.R' 'PipeOpProxy.R' 'PipeOpQuantileBin.R' 'PipeOpRandomProjection.R' 'PipeOpRandomResponse.R' 'PipeOpRegrAvg.R' 'PipeOpRemoveConstants.R' 'PipeOpRenameColumns.R' 'PipeOpScale.R' 'PipeOpScaleMaxAbs.R' 'PipeOpScaleRange.R' 'PipeOpSelect.R' 'PipeOpSmote.R' 'PipeOpSpatialSign.R' 'PipeOpSubsample.R' 'PipeOpTextVectorizer.R' 'PipeOpThreshold.R' 'PipeOpTrafo.R' 'PipeOpTuneThreshold.R' 'PipeOpUnbranch.R' 'PipeOpVtreat.R' 'PipeOpYeoJohnson.R' 'Selector.R' 'assert_graph.R' 'greplicate.R' 'gunion.R' 'mlr_graphs.R' 'mlr_graphs_elements.R' 'operators.R' 'pipe.R' 'po.R' 'reexports.R' 'typecheck.R' 'zzz.R'
Packaged 2020-09-13 20:06:18 UTC; user
Repository CRAN
Date/Publication 2020-09-13 21:50:03 UTC
