sco_con_ln: Score-based confidence label noise

Description

Introduction of score-based confidence label noise into a classification dataset.

Usage

# S3 method for default
sco_con_ln(x, y, level, sortid = TRUE, ...)
# S3 method for formula
sco_con_ln(formula, data, ...)

Value

An object of class ndmodel with elements:

xnoise: a data frame with the noisy input attributes.
ynoise: a factor vector with the noisy output class.
numnoise: an integer vector with the number of noisy samples per class.
idnoise: a list of integer vectors with the indices of noisy samples.
numclean: an integer vector with the number of clean samples per class.
idclean: a list of integer vectors with the indices of clean samples.
distr: an integer vector with the samples per class in the original data.
model: the full name of the noise introduction model used.
param: a list of the argument values.
call: the function call.

Arguments

x: a data frame of input attributes.
y: a factor vector with the output class of each sample.
level: a double in [0,1] with the noise level to be introduced.
sortid: a logical indicating if the indices must be sorted at the output (default: TRUE).
...: other options to pass to the function.
formula: a formula with the output class and, at least, one input attribute.
data: a data frame in which to interpret the variables in the formula.
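The default and formula methods receive the same information in two shapes. The sketch below shows one way a formula and a data frame can be split into the x and y expected by the default method, using base R's model.frame(). The helper formula_to_xy is hypothetical and is not claimed to be how the package's formula method is written; the built-in iris data set stands in for iris2D.

```r
# Hypothetical helper: reduce (formula, data) to the (x, y) pair of the default method.
formula_to_xy <- function(formula, data) {
  mf <- model.frame(formula, data = data)  # response first, then the input attributes
  y <- mf[[1]]                             # output class (a factor)
  x <- mf[, -1, drop = FALSE]              # input attributes as a data frame
  list(x = x, y = y)
}

xy <- formula_to_xy(Species ~ ., data = iris)
str(xy$x)      # the four numeric input attributes of iris
levels(xy$y)   # the three species
```

With `Species ~ .`, the dot expands to every column other than the response, so x and y match what `iris[, -5]` and `iris[, 5]` would give.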

Details

Score-based confidence label noise follows the intuition that hard samples are more likely to be mislabeled. Given the per-class confidence of each sample, if a sample is predicted as a different class with high probability, it is hard to clearly distinguish it from that class. This confidence information is used to compute a mislabeling score for each sample, together with its potential noisy label. Finally, the (level·100)% of samples with the highest mislabeling scores are chosen as noisy.
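The selection step can be illustrated with a small base-R sketch. It assumes a precomputed matrix of per-class confidences (how the package obtains them is not described here) and uses the highest confidence among the other classes as a stand-in mislabeling score. The function name select_noisy, the scoring rule, and the rounding of level·n are all assumptions, so this is a simplified illustration rather than the package's implementation or the exact score of Chen et al.

```r
# Hypothetical, simplified illustration of score-based selection of noisy samples.
# NOT the noisemodel implementation; the scoring rule below is an assumption.
#   conf:  n x k matrix of per-class confidences, columns ordered as levels(y)
#   y:     factor with the original class of each sample
#   level: noise level in [0, 1]
select_noisy <- function(conf, y, level) {
  n <- nrow(conf)
  rows <- seq_len(n)
  # Hide each sample's own class so that only the other classes compete
  conf_other <- conf
  conf_other[cbind(rows, as.integer(y))] <- -Inf
  # Potential noisy label: the other class with the highest confidence
  cand <- apply(conf_other, 1, which.max)
  # Assumed mislabeling score: the confidence of that other class
  score <- conf_other[cbind(rows, cand)]
  # Number of noisy samples; the rounding rule is an assumption
  num <- round(level * n)
  chosen <- order(score, decreasing = TRUE)[seq_len(num)]
  ynoise <- y
  ynoise[chosen] <- levels(y)[cand[chosen]]
  list(idnoise = sort(chosen), ynoise = ynoise)
}

# Toy example: 4 samples, 3 classes
conf <- matrix(c(0.9, 0.1, 0.0,
                 0.4, 0.5, 0.1,
                 0.2, 0.2, 0.6,
                 0.1, 0.3, 0.6), ncol = 3, byrow = TRUE)
y <- factor(c("a", "b", "c", "c"), levels = c("a", "b", "c"))
res <- select_noisy(conf, y, level = 0.5)
res$idnoise  # 2 4
```

Samples 2 and 4 have the most confident alternative classes (0.4 and 0.3), so with level = 0.5 they are the two selected and relabeled as a and b respectively.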

References

P. Chen, J. Ye, G. Chen, J. Zhao, and P. Heng. Beyond class-conditional assumption: A primary attempt to combat instance-dependent label noise. In Proc. 35th AAAI Conference on Artificial Intelligence, pages 11442-11450, 2021. URL: https://ojs.aaai.org/index.php/AAAI/article/view/17363.

Examples


# load the package and the dataset
library(noisemodel)
data(iris2D)

# usage of the default method
set.seed(9)
outdef <- sco_con_ln(x = iris2D[,-ncol(iris2D)], y = iris2D[,ncol(iris2D)], level = 0.1)

# show results
summary(outdef, showid = TRUE)
plot(outdef)

# usage of the method for class formula
set.seed(9)
outfrm <- sco_con_ln(formula = Species ~ ., data = iris2D, level = 0.1)

# check the match of noisy indices
identical(outdef$idnoise, outfrm$idnoise)
