unbalanced (version 2.0)

ubRacing: Racing

Description

The function implements the Racing algorithm [2] to select the best technique for re-balancing an unbalanced dataset or removing noisy instances from it [1].
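As a rough illustration of the racing idea (not the package's implementation), the base-R sketch below evaluates candidates fold by fold and drops the worst-ranked one whenever a Friedman test finds a significant difference. The score matrix, the candidate names techA/techB/techC, the 0.05 threshold and the four-fold warm-up are all assumptions made for this example; the actual Race may use a different elimination rule.

```r
# Hypothetical per-fold scores: rows = folds, columns = candidate techniques
scores <- cbind(
  techA = c(0.80, 0.82, 0.79, 0.81, 0.83, 0.80),
  techB = c(0.78, 0.80, 0.77, 0.80, 0.81, 0.79),
  techC = c(0.60, 0.62, 0.58, 0.61, 0.63, 0.59)
)

alpha <- 0.05            # assumed significance level
alive <- colnames(scores)  # candidates still in the race

# Start testing once four folds have been observed (assumed warm-up)
for (fold in 4:nrow(scores)) {
  if (length(alive) < 2) break  # a single survivor ends the race early

  seen <- scores[1:fold, alive, drop = FALSE]

  # Friedman test: folds act as blocks, candidates as treatments
  p <- friedman.test(seen)$p.value

  if (p < alpha) {
    # Rank within each fold (rank 1 = best score), then average over folds
    mean_rank <- colMeans(t(apply(-seen, 1, rank)))
    # Simplified rule: drop only the candidate with the worst mean rank
    alive <- setdiff(alive, names(which.max(mean_rank)))
  }
}

print(alive)
```

Because weak candidates are dropped before all folds are used, fewer experiments are run than in a full cross-validation of every candidate.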

Usage

ubRacing(formula, data, algo, positive=1, ncore=1, nFold=10, maxFold=10, maxExp=100, stat.test="friedman", metric="f1", ubConf, verbose=FALSE, ...)

Arguments

formula
formula describing the model to be fitted.
data
the unbalanced dataset
algo
the classification algorithm to use with the mlr package.
positive
label of the positive (minority) class.
ncore
the number of cores to use in the Race. The Race runs in parallel when ncore > 1.
nFold
number of folds in the cross-validation that provides the subset of data to the Race
maxFold
maximum number of folds to use in the Race
maxExp
maximum number of experiments to use in the Race
stat.test
statistical test to use to remove candidates which perform significantly worse than the best.
metric
metric used to assess the classification.
ubConf
configuration of the balancing techniques used in the Race.
verbose
print extra information (TRUE/FALSE)
...
additional arguments passed to the train function in the mlr package.

Value

The function returns a list:
Race
matrix containing accuracy results for each technique in the Race.
best
best technique selected in the Race.
avg
average of the metric used in the Race for the technique selected.
sd
standard deviation of the metric used in the Race for the technique selected.
N.test
number of experiments used in the Race.
Gain
percentage of computational gain with respect to the maximum number of experiments given by the cross-validation.

Details

The argument metric can take the following values: "gmean", "f1" (F-score or F-measure), "auc" (Area Under the ROC Curve). The argument stat.test defines the statistical test used to remove candidates during the Race. It can take the following values: "friedman" (Friedman test), "t.bonferroni" (t-test with Bonferroni correction), "t.holm" (t-test with Holm correction), "t.none" (t-test without correction), "no" (no test; the Race continues as long as the cross-validation provides new subsets of data). The argument ubConf is a list of configuration parameters that is passed to the function ubBalance.
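For orientation, the following base-R sketch computes two of these metrics from a vector of predictions, using their common textbook definitions (F1 as the harmonic mean of precision and recall, G-mean as the geometric mean of sensitivity and specificity). The labels are made up for illustration, and the package's internal computation is assumed, not confirmed, to match these definitions.

```r
# Illustrative labels only; 1 marks the positive (minority) class
truth <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
pred  <- c(1, 1, 0, 0, 0, 0, 0, 0, 1, 0)

# Confusion-matrix counts
tp <- sum(pred == 1 & truth == 1)  # true positives
fp <- sum(pred == 1 & truth == 0)  # false positives
fn <- sum(pred == 0 & truth == 1)  # false negatives
tn <- sum(pred == 0 & truth == 0)  # true negatives

precision   <- tp / (tp + fp)
recall      <- tp / (tp + fn)  # also called sensitivity
specificity <- tn / (tn + fp)

# F1: harmonic mean of precision and recall
f1 <- 2 * precision * recall / (precision + recall)

# G-mean: geometric mean of sensitivity and specificity
gmean <- sqrt(recall * specificity)

print(c(f1 = f1, gmean = gmean))
```

Both metrics focus on how well the minority class is recovered, which plain accuracy can hide when the classes are heavily unbalanced.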

References

1. Dal Pozzolo, Andrea, et al. "Racing for unbalanced methods selection." Intelligent Data Engineering and Automated Learning - IDEAL 2013. Springer Berlin Heidelberg, 2013. 24-31.
2. Birattari, Mauro, et al. "A Racing Algorithm for Configuring Metaheuristics." GECCO. Vol. 2. 2002.

See Also

ubBalance, ubOver, ubUnder, ubSMOTE, ubOSS, ubCNN, ubENN, ubNCL, ubTomek

Examples

#use Racing to select the best technique for an unbalanced dataset
library(unbalanced)
data(ubIonosphere)

#configure sampling parameters
ubConf <- list(type="ubUnder", percOver=200, percUnder=200, k=2, perc=50, method="percPos", w=NULL)

#load the classification algorithm that you intend to use inside the Race
#see 'mlr' package for supported algorithms
library(randomForest)
#use only 5 trees
results <- ubRacing(Class ~., ubIonosphere, "randomForest", positive=1, ubConf=ubConf, ntree=5)

# try with 500 trees
# results <- ubRacing(Class ~., ubIonosphere, "randomForest", positive=1, ubConf=ubConf, ntree=500)
# let's try with a different algorithm
# library(e1071)
# results <- ubRacing(Class ~., ubIonosphere, "svm", positive=1, ubConf=ubConf)
# library(rpart)
# results <- ubRacing(Class ~., ubIonosphere, "rpart", positive=1, ubConf=ubConf)
