mlr3filters
mlr3filters adds feature selection filters and the embedded feature selection methods of learning algorithms to mlr3.
Installation
remotes::install_github("mlr-org/mlr3filters")
Filters
Filter Example
library("mlr3")
library("mlr3filters")
task = tsk("pima")
filter = flt("auc")
as.data.table(filter$calculate(task))
## feature score
## 1: glucose 0.28961567
## 2: age 0.18694030
## 3: mass 0.17702985
## 4: pregnant 0.11951493
## 5: pressure 0.10810075
## 6: pedigree 0.10620149
## 7: triceps 0.10125373
## 8: insulin 0.07975746
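The scores are typically used to reduce the task to its highest-ranked features. The following is a minimal sketch, assuming that `as.data.table(filter)` returns the features sorted by decreasing score (as in the output above) and that `Task$select()` from mlr3 is used for subsetting. The cutoff of three features is an arbitrary choice for illustration.

```r
library("mlr3")
library("mlr3filters")

task = tsk("pima")
filter = flt("auc")
filter$calculate(task)

# table of features and scores, best feature first
scores = as.data.table(filter)

# names of the three highest-ranked features (cutoff chosen for illustration)
top_features = head(scores$feature, 3)

# keep only these features; select() modifies the task in place
task$select(top_features)
print(task$feature_names)
```

Because `select()` changes the task by reference, clone the task first (`task$clone()`) if the full feature set is still needed afterwards.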
Implemented Filters
Name | Task Type | Feature Types | Package |
---|---|---|---|
anova | Classif | Integer, Numeric | stats |
auc | Classif | Integer, Numeric | Metrics |
carscore | Regr | Numeric | care |
cmim | Classif & Regr | Integer, Numeric, Factor, Ordered | praznik |
correlation | Regr | Integer, Numeric | stats |
disr | Classif | Integer, Numeric, Factor, Ordered | praznik |
importance | Universal | Logical, Integer, Numeric, Character, Factor, Ordered | rpart |
information_gain | Classif & Regr | Integer, Numeric, Factor, Ordered | FSelectorRcpp |
jmi | Classif | Integer, Numeric, Factor, Ordered | praznik |
jmim | Classif | Integer, Numeric, Factor, Ordered | praznik |
kruskal_test | Classif | Integer, Numeric | stats |
mim | Classif | Integer, Numeric, Factor, Ordered | praznik |
mrmr | Classif & Regr | Numeric, Factor, Integer, Character, Logical | praznik |
njmim | Classif | Integer, Numeric, Factor, Ordered | praznik |
performance | Universal | Logical, Integer, Numeric, Character, Factor, Ordered | rpart |
variance | Classif & Regr | Integer, Numeric | stats |
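The names in the first column are the keys used with `flt()`. As a rough sketch, the keys available in the installed version can be listed from the `mlr_filters` dictionary, and a single filter can be inspected through its fields. The fields `feature_types` and `packages` are assumed to be present in the installed version, and the result may differ from the table above.

```r
library("mlr3")
library("mlr3filters")

# keys of all filters registered in the installed version
filter_keys = mlr_filters$keys()
print(filter_keys)

# construct one filter by key and inspect its metadata
filter = flt("variance")
print(filter$feature_types)  # feature types the filter supports
print(filter$packages)       # packages the filter depends on
```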
Variable Importance Filters
The following learners allow the extraction of variable importance and are therefore supported by FilterImportance:
## [1] "classif.featureless" "classif.ranger" "classif.rpart"
## [4] "classif.xgboost" "regr.featureless" "regr.ranger"
## [7] "regr.rpart" "regr.xgboost"
If your learner is not listed here but capable of extracting variable importance from the fitted model, the reason is most likely that it is not yet integrated in mlr3learners or mlr3extralearners. Please open an issue so we can add your package.
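To check which learners of the current installation qualify, the learner dictionary can be filtered by the "importance" property. This is a sketch and not necessarily how the list above was produced. It assumes that `as.data.table(mlr_learners)` returns a `key` column and a list column `properties`, and the result depends on which learner packages are installed and loaded.

```r
library("mlr3")

# one row per registered learner; load mlr3learners first to include its learners
tab = as.data.table(mlr_learners)

# TRUE for learners that declare the "importance" property
has_importance = vapply(
  tab$properties,
  function(p) "importance" %in% p,
  logical(1)
)

importance_learners = tab$key[has_importance]
print(importance_learners)
```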
Some learners need to have their variable importance measure “activated” during learner creation. For example, to use the “impurity” measure of Random Forest via the ranger package:
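library("mlr3learners")  # provides classif.ranger
task = tsk("iris")
# named `learner` so that the variable does not shadow the lrn() function
learner = lrn("classif.ranger")
# ranger only computes variable importance when a measure is requested
learner$param_set$values = list(importance = "impurity")
filter = flt("importance", learner = learner)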
filter$calculate(task)
head(as.data.table(filter), 3)
## feature score
## 1: Petal.Width 45.865850
## 2: Petal.Length 41.033283
## 3: Sepal.Length 9.929504
Performance Filter
FilterPerformance is a univariate filter method which calls resample() once for every predictor variable in the dataset and ranks the features by the resulting performance, as scored by the supplied measure. Any learner can be passed to this filter, with classif.rpart being the default. Regression learners can be passed as well if the task is of type “regr”.
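A minimal sketch of the performance filter on a classification task follows. The argument names `learner`, `resampling` and `measure` are assumed to match the FilterPerformance constructor of the installed version; 3-fold cross-validation and accuracy are arbitrary choices for illustration.

```r
library("mlr3")
library("mlr3filters")

set.seed(1)  # the resampling splits are random
task = tsk("iris")

filter = flt("performance",
  learner = lrn("classif.rpart"),
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.acc")
)

# fits and evaluates the learner once per feature
filter$calculate(task)

scores = as.data.table(filter)
print(scores)
```

Since one resampling is run per feature, the runtime grows linearly with the number of features, so a cheap learner and a small number of folds are reasonable starting points.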