ML1:
Performance of 6 different supervised classification algorithms on eight noisy datasets (see references)
Description
Dataset with the test accuracy of 6 supervised classification algorithms on eight noisy datasets.
The way noise is introduced into the originally clean datasets is controlled by parameters such as the noise type
(attribute noise versus class noise) and the noise ratio.
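To make the idea of a noise type and a noise ratio concrete, the following is a minimal sketch of one way class noise could be injected into a clean label vector. It is not the procedure used to build this dataset; the helper name `add_class_noise` and its arguments are hypothetical and chosen only for illustration.

```r
# Hypothetical helper (not part of any package): relabel a fraction `ratio`
# of the examples with a different class chosen uniformly at random.
add_class_noise <- function(labels, ratio) {
  labels <- as.factor(labels)
  n_noisy <- round(ratio * length(labels))
  # Pick distinct positions whose labels will be corrupted.
  idx <- sample.int(length(labels), n_noisy)
  for (i in idx) {
    # Candidate replacements: every class except the current one.
    others <- setdiff(levels(labels), as.character(labels[i]))
    labels[i] <- others[sample.int(length(others), 1)]
  }
  labels
}

set.seed(42)
clean <- factor(rep(c("a", "b", "c"), times = c(50, 30, 20)))
noisy <- add_class_noise(clean, ratio = 0.2)
sum(noisy != clean)  # 20 of the 100 labels were changed
```

Because each selected label is always replaced by a different class, the number of changed labels equals the rounded noise ratio times the number of examples.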
Format
A data frame with 52800 observations on the following 6 variables.
Algorithm
- A factor with 6 levels:
1-NN, 3-NN, 5-NN, C4.5, RIPPER, SVM
that correspond to 6 different supervised classification algorithms.
Dataset
- A factor with 8 levels:
autos, balanced, cleveland, ecoli, ionosphere, pima, vehicle
corresponding to the names of eight datasets in which noise has been introduced artificially.
Noise type
- A factor with 4 levels:
ATT_GAUS, ATT_RAND, CLA_PAIR, CLA_RAND
that correspond to the type of noise introduced. ATT_* denotes noise added to (a percentage of) the attributes of the instances,
either in a Gaussian or in a uniformly random way. CLA_* denotes noise that modifies the class of (a percentage of) the instances
of the dataset, either by assigning any other class at random (CLA_RAND) or by replacing the label of only a percentage of the
majority-class examples with the label of the second-majority class (CLA_PAIR).
Noise ratio
- A real number with the ratio of attributes affected by noise (for ATT_GAUS and ATT_RAND), or
the ratio of examples within the global dataset affected by a class error (for CLA_PAIR and CLA_RAND).
Fold
- An integer (between 1 and 25) identifying the repetition of the experiment. Test results were obtained by
repeating a complete 5-fold cross-validation process five independent times.
Performance
- A real number between 0 and 1 with the accuracy of the classifier on the test examples.
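As an illustration of how the variables above fit together, the sketch below first checks the row count and then averages the 25 test accuracies of each configuration. It runs on a small synthetic stand-in rather than the real data: the Performance values are random placeholders, the noise ratio values 0 and 0.1 are arbitrary, and the column names `Noise_type` and `Noise_ratio` are assumed spellings that may differ in the actual data frame.

```r
# If the design is fully crossed (an assumption), the 52800 rows imply
# this many distinct noise ratios per algorithm, dataset and noise type.
52800 / (6 * 8 * 4 * 25)  # 11

# Synthetic stand-in with the same six variables; Performance values are
# random placeholders, not results from the study.
set.seed(1)
toy <- expand.grid(
  Algorithm   = c("1-NN", "SVM"),
  Dataset     = c("autos", "pima"),
  Noise_type  = c("ATT_GAUS", "CLA_RAND"),
  Noise_ratio = c(0, 0.1),
  Fold        = 1:25
)
toy$Performance <- runif(nrow(toy))

# Average the 25 fold results of every configuration.
avg <- aggregate(
  Performance ~ Algorithm + Dataset + Noise_type + Noise_ratio,
  data = toy, FUN = mean
)
nrow(avg)  # 16 configurations: 2 * 2 * 2 * 2
```

Averaging over Fold collapses the five repetitions of 5-fold cross-validation into a single accuracy estimate per configuration.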
Source
J.A. Saez, M. Galar, J. Luengo, F. Herrera, Tackling the Problem of Classification
with Noisy Data using Multiple Classifier Systems: Analysis of the Performance and Robustness.
Information Sciences, 247 (2013) 1-20.
References
Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer (2006).