ubRacing: Racing

Description

The function implements the Racing algorithm [2] to select the best technique for re-balancing an unbalanced dataset or removing its noisy instances [1].

Usage

ubRacing(formula, data, algo, positive=1, ncore=1, nFold=10, maxFold=10, maxExp=100, stat.test="friedman", metric="f1", ubConf, verbose=FALSE, ...)

Arguments

formula

formula describing the model to be fitted.

data

the unbalanced dataset

algo

the classification algorithm to use with the mlr package.

positive

label of the positive (minority) class.

ncore

the number of cores to use in the Race. The Race runs in parallel when ncore > 1.

nFold

number of folds in the cross-validation that provides the subset of data to the Race

maxFold

maximum number of folds to use in the Race

maxExp

maximum number of experiments to use in the Race

stat.test

statistical test to use to remove candidates which perform significantly worse than the best.

metric

metric used to assess the classification.

ubConf

configuration of the balancing techniques used in the Race.

verbose

print extra information (TRUE/FALSE)

...

additional arguments passed to the train function in the mlr package.

Value

Race: matrix containing accuracy results for each technique in the Race.
best: best technique selected in the Race.
avg: average of the metric used in the Race for the technique selected.
sd: standard deviation of the metric used in the Race for the technique selected.
N.test: number of experiments used in the Race.
Gain: % of computational gain with respect to the maximum number of experiments given by the cross-validation.
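
One plausible reading of Gain is the share of experiments that the Race avoided by stopping early. The sketch below illustrates that reading in base R. The helper name race_gain, its arguments, and the numbers are hypothetical, and the formula is an assumption that may differ from what the package actually computes.

```r
# Hypothetical helper (not part of the 'unbalanced' package):
# percentage of experiments saved relative to an exhaustive comparison.
race_gain <- function(n_test, max_experiments) {
  stopifnot(max_experiments > 0, n_test >= 0, n_test <= max_experiments)
  100 * (1 - n_test / max_experiments)
}

# Illustrative numbers only: 7 candidate techniques evaluated on 10 folds
# would need 70 experiments; assume the Race stopped after 42.
gain <- race_gain(n_test = 42, max_experiments = 7 * 10)
print(gain)
```

Under this reading, a Gain close to 0 means the Race needed almost every experiment, while a large Gain means weak candidates were discarded early.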

Details

The argument metric can take the following values: "gmean" (G-mean), "f1" (F-score or F-measure), "auc" (Area Under the ROC Curve).

The argument stat.test defines the statistical test used to remove candidates during the Race. It can take the following values: "friedman" (Friedman test), "t.bonferroni" (t-test with Bonferroni correction), "t.holm" (t-test with Holm correction), "t.none" (t-test without correction), "no" (no test; the Race continues as long as the cross-validation provides new subsets of data).

The argument ubConf is a list of configuration settings that is passed to the function ubBalance.
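
To make the "f1" and "gmean" metrics concrete, here is a minimal base R sketch that computes both from true and predicted labels using their standard textbook definitions. The helper name binary_metrics and the toy vectors are hypothetical, and this is not necessarily how the package computes the metrics internally. "auc" is left out because it needs predicted scores rather than hard labels.

```r
# Hypothetical helper (not part of the 'unbalanced' package).
# `positive` is the label of the minority class, as in ubRacing().
binary_metrics <- function(truth, pred, positive = 1) {
  tp <- sum(truth == positive & pred == positive)  # true positives
  fp <- sum(truth != positive & pred == positive)  # false positives
  fn <- sum(truth == positive & pred != positive)  # false negatives
  tn <- sum(truth != positive & pred != positive)  # true negatives

  # Guard against empty denominators by returning 0.
  precision   <- if (tp + fp > 0) tp / (tp + fp) else 0
  recall      <- if (tp + fn > 0) tp / (tp + fn) else 0
  specificity <- if (tn + fp > 0) tn / (tn + fp) else 0

  # F1: harmonic mean of precision and recall.
  f1 <- if (precision + recall > 0) {
    2 * precision * recall / (precision + recall)
  } else {
    0
  }
  # G-mean: geometric mean of recall (sensitivity) and specificity.
  gmean <- sqrt(recall * specificity)

  c(f1 = f1, gmean = gmean)
}

# Toy data: 3 positive and 7 negative instances.
truth <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
pred  <- c(1, 1, 0, 1, 0, 0, 0, 0, 0, 0)
m <- binary_metrics(truth, pred, positive = 1)
print(m)
```

Both metrics depend on how well the minority class is recovered, which is why they are usually preferred over plain accuracy on unbalanced data.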

References

1. Dal Pozzolo, Andrea, et al. "Racing for unbalanced methods selection." Intelligent Data Engineering and Automated Learning - IDEAL 2013. Springer Berlin Heidelberg, 2013. 24-31.
2. Birattari, Mauro, et al. "A Racing Algorithm for Configuring Metaheuristics." GECCO. Vol. 2. 2002.

Examples

#use Racing to select the best technique for an unbalanced dataset
library(unbalanced)
data(ubIonosphere)

#configure sampling parameters
ubConf <- list(type="ubUnder", percOver=200, percUnder=200, k=2, perc=50, method="percPos", w=NULL)

#load the classification algorithm that you intend to use inside the Race
#see 'mlr' package for supported algorithms
library(randomForest)
#use only 5 trees
results <- ubRacing(Class ~., ubIonosphere, "randomForest", positive=1, ubConf=ubConf, ntree=5)

# try with 500 trees
# results <- ubRacing(Class ~., ubIonosphere, "randomForest", positive=1, ubConf=ubConf, ntree=500)
# let's try with a different algorithm
# library(e1071)
# results <- ubRacing(Class ~., ubIonosphere, "svm", positive=1, ubConf=ubConf)
# library(rpart)
# results <- ubRacing(Class ~., ubIonosphere, "rpart", positive=1, ubConf=ubConf)
