ML1: Performance of 6 different supervised classification algorithms on eight noisy datasets (see references)

Description

Test accuracy of six supervised classification algorithms on eight noisy datasets. Noise is introduced artificially into originally clean datasets and is controlled by parameters such as the noise type (attribute noise versus class noise) and the noise ratio.

Usage

data(ML1)

Format

A data frame with 52800 observations on the following 6 variables.

Algorithm: A factor with 6 levels: 1-NN, 3-NN, 5-NN, C4.5, RIPPER, SVM that correspond to 6 different supervised classification algorithms.
Dataset: A factor with 8 levels: autos, balanced, cleveland, ecoli, ionosphere, pima, vehicle corresponding to the names of eight datasets in which noise has been introduced artificially.
Noise type: A factor with 4 levels: ATT_GAUS, ATT_RAND, CLA_PAIR, CLA_RAND, giving the type of noise introduced. ATT_* denotes noise added to (a percentage of) the attributes of an instance, either Gaussian (ATT_GAUS) or uniformly random (ATT_RAND). CLA_* denotes noise that changes the class of (a percentage of) the instances of the dataset, either by replacing the label with any other class at random (CLA_RAND) or by relabeling only a percentage of the majority-class examples with the second-majority class (CLA_PAIR).
Noise ratio: A real number with the ratio of attributes affected by noise (for ATT_GAUS and ATT_RAND), or the ratio of examples within the global dataset affected by a class error (for CLA_PAIR and CLA_RAND).
Fold: An integer between 1 and 25 identifying the repetition of the experiment. Test results were obtained by repeating a complete 5-fold cross-validation process five independent times.
Performance: A real number between 0 and 1 giving the accuracy of the classifier on the test examples.
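
The row count can be cross-checked against the factor levels listed above. The sketch below is only an arithmetic check; it assumes the design is fully crossed (every algorithm, dataset, noise type, noise ratio and fold combination appears exactly once), which the description suggests but does not state. The variable names are illustrative and are not part of the dataset.

```r
# Counts taken from the Format section above.
n_algorithms  <- 6
n_datasets    <- 8
n_noise_types <- 4
n_folds       <- 5 * 5   # five repetitions of 5-fold cross-validation
n_rows        <- 52800

# Assumption: fully crossed design, so the remaining factor (the noise ratio)
# accounts for whatever multiplicity is left over.
n_noise_ratios <- n_rows / (n_algorithms * n_datasets * n_noise_types * n_folds)
n_noise_ratios
```

Under that assumption the division is exact, which implies 11 distinct noise ratio values per combination of the other factors.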

Source

J.A. Saez, M. Galar, J. Luengo, F. Herrera. Tackling the Problem of Classification with Noisy Data using Multiple Classifier Systems: Analysis of the Performance and Robustness. Information Sciences 247 (2013) 1-20.

References

Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer (2006).

Examples

data(ML1)
str(ML1)
head(ML1)
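
A typical next step is to average the test accuracy over the 25 folds. The following is a minimal sketch on a small made-up data frame, not on ML1 itself: the accuracy values are invented for illustration, and the column names (in particular `NoiseRatio`) are assumptions that should be checked against `names(ML1)`.

```r
# Toy stand-in for ML1 with made-up accuracies; column names are assumed.
toy <- data.frame(
  Algorithm   = rep(c("1-NN", "SVM"), each = 4),
  NoiseRatio  = rep(c(0, 0, 0.2, 0.2), times = 2),
  Fold        = rep(1:2, times = 4),
  Performance = c(0.80, 0.82, 0.70, 0.72, 0.85, 0.87, 0.81, 0.83)
)

# Mean test accuracy per algorithm and noise ratio, averaged over folds.
mean_acc <- aggregate(Performance ~ Algorithm + NoiseRatio, data = toy, FUN = mean)
mean_acc
```

With the real data loaded, the same call should work after replacing `toy` with `ML1` and adjusting the formula to the actual column names.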