mlr3pipelines v0.3.3
Preprocessing Operators and Pipelines for 'mlr3'
Dataflow programming toolkit that enriches 'mlr3' with a diverse
set of pipelining operators ('PipeOps') that can be composed into graphs.
Operations exist for data preprocessing, model fitting, and ensemble
learning. Graphs can themselves be treated as 'mlr3' 'Learners' and can
therefore be resampled, benchmarked, and tuned.
Readme
mlr3pipelines 
Package website: release | dev
Dataflow Programming for Machine Learning in R.
What is mlr3pipelines?
Watch our “WhyR 2020” webinar presentation on YouTube for an introduction! Find the slides here.
mlr3pipelines is a dataflow programming toolkit for machine learning in R utilising the mlr3 package. Machine learning workflows can be written as directed “Graphs” that represent data flows between preprocessing, model fitting, and ensemble learning units in an expressive and intuitive language. Using methods from the mlr3tuning package, it is even possible to simultaneously optimize parameters of multiple processing units.
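Roughly speaking, joint optimization is possible because a pipeline exposes the hyperparameters of all of its processing units through a single parameter set, with each name prefixed by the id of the unit it belongs to. The following is a minimal sketch, assuming mlr3 and mlr3pipelines are installed. It uses the `po()` shorthand introduced in the next paragraph. The values chosen for `rank.` and `cp` are arbitrary, and the tuning step itself (with mlr3tuning) is deliberately left out.

```r
library(mlr3)
library(mlr3pipelines)

# Two processing units: a PCA followed by a decision tree
graph = po("pca") %>>% po("learner", lrn("classif.rpart"))
glrn = GraphLearner$new(graph)

# All hyperparameters are collected in one ParamSet; ids are prefixed
# with the id of the PipeOp they belong to, e.g. "pca.rank." or "classif.rpart.cp"
print(glrn$param_set$ids())

# Settings for two different units are made on the same object,
# which is what a tuner would vary jointly
glrn$param_set$values$pca.rank. = 2
glrn$param_set$values$classif.rpart.cp = 0.05

glrn$train(tsk("iris"))
prediction = glrn$predict(tsk("iris"))
```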
In principle, mlr3pipelines is about defining individual data and model manipulation steps as “PipeOps”:
library(mlr3)
library(mlr3pipelines)
pca = po("pca")
filter = po("filter", filter = mlr3filters::flt("variance"), filter.frac = 0.5)
learner_po = po("learner", learner = lrn("classif.rpart"))
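Each of these objects is a self-contained step that can also be trained and applied on its own, outside of a pipeline. A minimal sketch, assuming mlr3 and mlr3pipelines are installed. It uses `po("scale")` on the built-in `iris` task purely as an illustration:

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")
scaler = po("scale")

# PipeOps consume a list of inputs and return a list of outputs.
# Training fits the step (here: per-column centering and scaling)
# and stores the fitted parameters in scaler$state.
scaled_task = scaler$train(list(task))[[1]]

# Prediction reuses the stored state instead of refitting it
predicted_task = scaler$predict(list(task))[[1]]
head(scaled_task$data())
```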
These PipeOps can then be combined to define machine learning pipelines, which can be wrapped in a GraphLearner that behaves like any other Learner in mlr3.
graph = pca %>>% filter %>>% learner_po
glrn = GraphLearner$new(graph)
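`%>>%` returns a Graph, and GraphLearner wraps that Graph so that it can be used as a Learner. The Graph itself can also be inspected and run directly. A minimal sketch, assuming mlr3 and mlr3pipelines are installed. The filter step is left out here so that mlr3filters is not needed:

```r
library(mlr3)
library(mlr3pipelines)

graph = po("pca") %>>% po("learner", lrn("classif.rpart"))

# A Graph is a set of PipeOps plus directed edges between their channels
print(graph$ids())
print(graph$edges)

# Training passes the task through every PipeOp in order; the learner
# PipeOp stores its fitted model and emits NULL during training
graph$train(tsk("iris"))

# Prediction sends data through the fitted PCA and then the fitted tree
pred = graph$predict(tsk("iris"))[[1]]
print(pred)
```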
This learner can be used for resampling, benchmarking, and even tuning.
resample(tsk("iris"), glrn, rsmp("cv"))
#> <ResampleResult> of 10 iterations
#> * Task: iris
#> * Learner: pca.variance.classif.rpart
#> * Warnings: 0 in 0 iterations
#> * Errors: 0 in 0 iterations
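Because a GraphLearner is an ordinary Learner, it can also be benchmarked against other learners on the same resampling splits. A minimal sketch, assuming mlr3 and mlr3pipelines are installed. Comparing a PCA-plus-tree pipeline with a plain tree, using three folds and the classification error measure are all illustrative choices, and no particular result is implied:

```r
library(mlr3)
library(mlr3pipelines)

set.seed(1)
task = tsk("iris")
glrn = GraphLearner$new(po("pca") %>>% po("learner", lrn("classif.rpart")))
plain = lrn("classif.rpart")

# benchmark_grid() instantiates the resampling once per task, so both
# learners are evaluated on identical cross-validation folds
design = benchmark_grid(task, list(glrn, plain), rsmp("cv", folds = 3))
bmr = benchmark(design)

# One row per learner with the mean classification error across folds
print(bmr$aggregate(msr("classif.ce")))
```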
Feature Overview
Single computational steps can be represented as so-called PipeOps, which can then be connected with directed edges in a Graph. The scope of mlr3pipelines is still growing; currently supported features are:
- Simple data manipulation and preprocessing operations, e.g. PCA, feature filtering
- Task subsampling for speed and outcome class imbalance handling
- mlr3 Learner operations for prediction and stacking
- Simultaneous path branching (data going both ways)
- Alternative path branching (data going one specific way, controlled by hyperparameters)
- Ensemble methods and aggregation of predictions
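Three of the features listed above can be sketched in a few lines: simultaneous branching, alternative branching, and a bagging-style ensemble. This is a minimal sketch, assuming mlr3 and mlr3pipelines are installed. The iris task, the PCA/no-op paths, the subsample fraction of 0.7 and the three replicates are illustrative choices.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")

# Simultaneous branching: copy the data into two paths (PCA and no-op),
# then bind the resulting feature columns back together
both_paths = po("copy", outnum = 2) %>>%
  gunion(list(po("pca"), po("nop"))) %>>%
  po("featureunion")
union_task = both_paths$train(task)[[1]]
print(union_task$feature_names)  # PCA components plus original features

# Alternative branching: only one path is active, chosen by the
# "branch.selection" hyperparameter
one_path = po("branch", options = c("pca", "nop")) %>>%
  gunion(list(po("pca"), po("nop"))) %>>%
  po("unbranch", options = c("pca", "nop"))
one_path$param_set$values$branch.selection = "nop"
alt_task = one_path$train(task)[[1]]

# Ensemble: three trees, each fit on a different 70% subsample,
# whose predictions are combined by PipeOpClassifAvg
single = po("subsample", frac = 0.7) %>>% po("learner", lrn("classif.rpart"))
bagging = ppl("greplicate", single, 3) %>>% po("classifavg", innum = 3)
bag_lrn = GraphLearner$new(bagging)
bag_lrn$train(task)
bag_pred = bag_lrn$predict(task)
```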
Documentation
The easiest way to get started is reading some of the vignettes that are shipped with the package, which can also be viewed online:
- Quick Introduction, with short examples to get started
Bugs, Questions, Feedback
mlr3pipelines is a free and open source software project that encourages participation and feedback. If you have any issues, questions, suggestions or feedback, please do not hesitate to open an “issue” about it on the GitHub page!
In case of problems / bugs, it is often helpful if you provide a “minimum working example” that showcases the behaviour (but don’t worry about this if the bug is obvious).
Please understand that the resources of the project are limited: response may sometimes be delayed by a few days, and some feature suggestions may be rejected if they are deemed too tangential to the vision behind the project.
Similar Projects
A predecessor to this package is the mlrCPO package, which works with mlr 2.x. Other packages that provide, to varying degrees, some preprocessing functionality or a domain-specific language for machine learning are the caret package with the related recipes project, and the dplyr package.
Functions in mlr3pipelines
Name | Description |
Multiplicity | Multiplicity |
PipeOpImpute | PipeOpImpute |
Selector | Selector Functions |
PipeOpEnsemble | PipeOpEnsemble |
Graph | Graph |
PipeOp | PipeOp |
PipeOpTargetTrafo | PipeOpTargetTrafo |
PipeOpTaskPreprocSimple | PipeOpTaskPreprocSimple |
NO_OP | No-Op Sentinel Used for Alternative Branching |
PipeOpTaskPreproc | PipeOpTaskPreproc |
mlr_graphs_bagging | Create a bagging learner |
gunion | Disjoint Union of Graphs |
mlr_graphs | Dictionary of (sub-)graphs |
mlr3pipelines-package | mlr3pipelines: Preprocessing Operators and Pipelines for 'mlr3' |
greplicate | Create Disjoint Graph Union of Copies of a Graph |
mlr_graphs_branching | Branch Between Alternative Paths |
is.Multiplicity | Check if an object is a Multiplicity |
assert_graph | Assertion for mlr3pipeline Graph |
assert_pipeop | Assertion for mlr3pipeline PipeOp |
is_noop | Test for NO_OP |
as_graph | Conversion to mlr3pipeline Graph |
as_pipeop | Conversion to mlr3pipeline PipeOp |
mlr_pipeops_chunk | PipeOpChunk |
mlr_pipeops_branch | PipeOpBranch |
mlr_pipeops_collapsefactors | PipeOpCollapseFactors |
mlr_pipeops_colroles | PipeOpColRoles |
mlr_learners_graph | GraphLearner |
mlr_pipeops_colapply | PipeOpColApply |
mlr_learners_avg | Optimized Weighted Average of Features for Classification and Regression |
mlr_pipeops_classweights | PipeOpClassWeights |
filter_noop | Remove NO_OPs from a List |
%>>% | PipeOp Composition Operator |
mlr_graphs_targettrafo | Transform and Re-Transform the Target Variable |
mlr_graphs_robustify | Robustify a learner |
mlr_pipeops_classbalancing | PipeOpClassBalancing |
mlr_pipeops_classifavg | PipeOpClassifAvg |
mlr_pipeops_featureunion | PipeOpFeatureUnion |
mlr_pipeops_encodelmer | Impact Encoding with Random Intercept Models |
mlr_pipeops_encode | PipeOpEncode |
mlr_pipeops_copy | PipeOpCopy |
mlr_pipeops_filter | PipeOpFilter |
mlr_pipeops_encodeimpact | Conditional Target Value Impact Encoding |
mlr_pipeops_datefeatures | PipeOpDateFeatures |
mlr_pipeops_boxcox | PipeOpBoxCox |
as.Multiplicity | Convert an object to a Multiplicity |
mlr_pipeops_ica | PipeOpICA |
mlr_pipeops_imputeoor | PipeOpImputeOOR |
add_class_hierarchy_cache | Add a Class Hierarchy to the Cache |
mlr_pipeops | Dictionary of PipeOps |
mlr_pipeops_imputemedian | PipeOpImputeMedian |
mlr_pipeops_histbin | PipeOpHistBin |
mlr_pipeops_imputesample | PipeOpImputeSample |
mlr_pipeops_kernelpca | PipeOpKernelPCA |
mlr_pipeops_imputelearner | PipeOpImputeLearner |
mlr_pipeops_imputemode | PipeOpImputeMode |
mlr_pipeops_imputemean | PipeOpImputeMean |
mlr_pipeops_imputeconstant | PipeOpImputeConstant |
mlr_pipeops_imputehist | PipeOpImputeHist |
mlr_pipeops_fixfactors | PipeOpFixFactors |
mlr_pipeops_ovrunite | PipeOpOVRUnite |
mlr_pipeops_ovrsplit | PipeOpOVRSplit |
mlr_pipeops_multiplicityimply | PipeOpMultiplicityImply |
mlr_pipeops_mutate | PipeOpMutate |
mlr_pipeops_missind | PipeOpMissInd |
mlr_pipeops_learner_cv | PipeOpLearnerCV |
mlr_pipeops_modelmatrix | PipeOpModelMatrix |
mlr_pipeops_multiplicityexply | PipeOpMultiplicityExply |
mlr_pipeops_nmf | PipeOpNMF |
mlr_pipeops_learner | PipeOpLearner |
mlr_pipeops_nop | PipeOpNOP |
mlr_pipeops_quantilebin | PipeOpQuantileBin |
mlr_pipeops_replicate | PipeOpReplicate |
mlr_pipeops_pca | PipeOpPCA |
mlr_pipeops_scale | PipeOpScale |
mlr_pipeops_removeconstants | PipeOpRemoveConstants |
mlr_pipeops_renamecolumns | PipeOpRenameColumns |
mlr_pipeops_proxy | PipeOpProxy |
mlr_pipeops_randomprojection | PipeOpRandomProjection |
mlr_pipeops_randomresponse | PipeOpRandomResponse |
mlr_pipeops_regravg | PipeOpRegrAvg |
mlr_pipeops_scalemaxabs | PipeOpScaleMaxAbs |
mlr_pipeops_spatialsign | PipeOpSpatialSign |
mlr_pipeops_targetinvert | PipeOpTargetInvert |
mlr_pipeops_targetmutate | PipeOpTargetMutate |
mlr_pipeops_subsample | PipeOpSubsample |
mlr_pipeops_scalerange | PipeOpScaleRange |
mlr_pipeops_select | PipeOpSelect |
mlr_pipeops_smote | PipeOpSmote |
mlr_pipeops_targettrafoscalerange | PipeOpTargetTrafoScaleRange |
mlr_pipeops_textvectorizer | PipeOpTextVectorizer |
pipeline_greplicate | Create Disjoint Graph Union of Copies of a Graph |
pipeline_ovr | Create A Graph to Perform "One vs. Rest" classification. |
reexports | Objects exported from other packages |
mlr_pipeops_vtreat | PipeOpVtreat |
register_autoconvert_function | Add Autoconvert Function to Conversion Register |
mlr_pipeops_yeojohnson | PipeOpYeoJohnson |
mlr_pipeops_updatetarget | PipeOpUpdateTarget |
mlr_pipeops_unbranch | PipeOpUnbranch |
ppl | Shorthand Graph Constructor |
po | Shorthand PipeOp Constructor |
mlr_pipeops_threshold | PipeOpThreshold |
mlr_pipeops_tunethreshold | PipeOpTuneThreshold |
reset_autoconvert_register | Reset Autoconvert Register |
reset_class_hierarchy_cache | Reset the Class Hierarchy Cache |
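Several entries in the table above (the `mlr_pipeops` and `mlr_graphs` dictionaries, the `po()` shorthand and the Selector functions) can be used together as follows. A minimal sketch, assuming mlr3 and mlr3pipelines are installed. The regular expression passed to `selector_grep()` and the use of the iris task are illustrative:

```r
library(mlr3)
library(mlr3pipelines)

# PipeOps and prebuilt Graphs are registered in dictionaries
head(mlr_pipeops$keys())
head(mlr_graphs$keys())

# po() looks up a key in mlr_pipeops; this matches constructing the class directly
op_short = po("pca")
op_long = PipeOpPCA$new()
print(c(op_short$id, op_long$id))

# Selector functions pick columns by name, type, etc.; here, all
# features whose names start with "Petal"
select_petal = po("select", selector = selector_grep("^Petal"))
petal_task = select_petal$train(list(tsk("iris")))[[1]]
print(petal_task$feature_names)
```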
Details
License | LGPL-3 |
URL | https://mlr3pipelines.mlr-org.com, https://github.com/mlr-org/mlr3pipelines |
BugReports | https://github.com/mlr-org/mlr3pipelines/issues |
ByteCompile | true |
Encoding | UTF-8 |
Config/testthat/edition | 3 |
Config/testthat/parallel | true |
LazyData | true |
NeedsCompilation | no |
RoxygenNote | 7.1.1 |
Collate | 'Graph.R' 'GraphLearner.R' 'mlr_pipeops.R' 'multiplicity.R' 'utils.R' 'PipeOp.R' 'PipeOpEnsemble.R' 'LearnerAvg.R' 'NO_OP.R' 'PipeOpTaskPreproc.R' 'PipeOpBoxCox.R' 'PipeOpBranch.R' 'PipeOpChunk.R' 'PipeOpClassBalancing.R' 'PipeOpClassWeights.R' 'PipeOpClassifAvg.R' 'PipeOpColApply.R' 'PipeOpColRoles.R' 'PipeOpCollapseFactors.R' 'PipeOpCopy.R' 'PipeOpDateFeatures.R' 'PipeOpEncode.R' 'PipeOpEncodeImpact.R' 'PipeOpEncodeLmer.R' 'PipeOpFeatureUnion.R' 'PipeOpFilter.R' 'PipeOpFixFactors.R' 'PipeOpHistBin.R' 'PipeOpICA.R' 'PipeOpImpute.R' 'PipeOpImputeConstant.R' 'PipeOpImputeHist.R' 'PipeOpImputeLearner.R' 'PipeOpImputeMean.R' 'PipeOpImputeMedian.R' 'PipeOpImputeMode.R' 'PipeOpImputeOOR.R' 'PipeOpImputeSample.R' 'PipeOpKernelPCA.R' 'PipeOpLearner.R' 'PipeOpLearnerCV.R' 'PipeOpMissingIndicators.R' 'PipeOpModelMatrix.R' 'PipeOpMultiplicity.R' 'PipeOpMutate.R' 'PipeOpNMF.R' 'PipeOpNOP.R' 'PipeOpOVR.R' 'PipeOpPCA.R' 'PipeOpProxy.R' 'PipeOpQuantileBin.R' 'PipeOpRandomProjection.R' 'PipeOpRandomResponse.R' 'PipeOpRegrAvg.R' 'PipeOpRemoveConstants.R' 'PipeOpRenameColumns.R' 'PipeOpScale.R' 'PipeOpScaleMaxAbs.R' 'PipeOpScaleRange.R' 'PipeOpSelect.R' 'PipeOpSmote.R' 'PipeOpSpatialSign.R' 'PipeOpSubsample.R' 'PipeOpTextVectorizer.R' 'PipeOpThreshold.R' 'PipeOpTrafo.R' 'PipeOpTuneThreshold.R' 'PipeOpUnbranch.R' 'PipeOpVtreat.R' 'PipeOpYeoJohnson.R' 'Selector.R' 'assert_graph.R' 'bibentries.R' 'greplicate.R' 'gunion.R' 'mlr_graphs.R' 'mlr_graphs_elements.R' 'operators.R' 'pipe.R' 'po.R' 'reexports.R' 'typecheck.R' 'zzz.R' |
Packaged | 2021-02-08 21:46:46 UTC; user |
Repository | CRAN |
Date/Publication | 2021-02-09 06:10:07 UTC |
imports | backports , checkmate , data.table , digest , lgr , mlr3 (>= 0.6.0) , mlr3misc (>= 0.7.0) , paradox , R6 , withr |
suggests | bbotk (>= 0.3.0) , bestNormalize , evaluate , fastICA , GenSA , ggplot2 , glmnet , igraph , kernlab , kknn , knitr , lme4 , MASS , methods , mlbench , mlr3filters (>= 0.1.1) , mlr3learners , mlr3measures , nloptr , NMF , quanteda , rmarkdown , rpart , smotefamily , stopwords , testthat , visNetwork , vtreat |
depends | R (>= 3.1.0) |
Contributors | Michel Lang, Bernd Bischl, Florian Pfisterer, Susanne Dandl, Lennart Schneider |