"hybridRepairFilter"(formula, data, ...)
"hybridRepairFilter"(x, consensus = FALSE, noiseAction = "remove", classColumn = ncol(x), ...)TRUE, consensus voting scheme is applied to identify noisy instances. Otherwise (default),
majority approach is used.filter, which is a list with seven components:
cleanData is a data frame containing the filtered dataset.
remIdx is a vector of integers indicating the indexes for
removed instances (i.e. their row number with respect to the original data frame).
repIdx is a vector of integers indicating the indexes for
repaired/relabelled instances (i.e. their row number with respect to the original data frame).
repLab is a factor containing the new labels for repaired instances.
parameters is a list containing the argument values.
call contains the original call to the filter.
extraInf is a character that includes additional interesting
information not covered by previous items.
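As a rough illustration of how the index components relate to the original data frame, the sketch below builds a hypothetical stand-in for a filter result by hand (the index and label values are made up, not produced by the filter) and uses remIdx, repIdx and repLab to look up the affected rows.

```r
# Hypothetical stand-in for a filter result; the values are made up for illustration.
data(iris)
out <- list(
  remIdx = c(71L, 84L),
  repIdx = 107L,
  repLab = factor("versicolor", levels = levels(iris$Species))
)

# Rows of the original data frame that were flagged for removal.
removed_rows <- iris[out$remIdx, ]

# Original versus new labels of the repaired rows.
relabelled <- data.frame(
  row      = out$repIdx,
  original = iris$Species[out$repIdx],
  repaired = out$repLab
)
```

Because remIdx and repIdx are row numbers with respect to the original data frame, they should be used to index the original data rather than cleanData.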
hybridRepairFilter builds an ensemble of four classifiers on the dataset:
SVM, Neural Network, CART, and KNN (combining k=1,3,5). Based on their predictions and
the majority or consensus voting scheme, a subset of instances is labeled as noise.
If noiseAction equals "remove", these instances are removed; if it equals "repair", their class
is changed to the one most voted by the ensemble; and if it equals "hybrid", the vote of KNN
decides whether each instance is removed or repaired.
The whole procedure is repeated while the accuracy (over the original dataset) of the ensemble trained on the processed dataset increases.
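The voting step can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the predictions are hard-coded rather than obtained from trained classifiers, flag_noise is a hypothetical helper, and "majority" is assumed to mean that strictly more than half of the classifiers disagree with the given label. Only the "remove" and "repair" actions are shown.

```r
# Hypothetical predictions of four classifiers (columns) for five instances (rows).
# All names and values are illustrative, not taken from the package.
labels <- c("a", "a", "b", "b", "a")
preds <- cbind(
  svm  = c("a", "b", "b", "a", "b"),
  nnet = c("a", "b", "b", "a", "b"),
  cart = c("a", "b", "b", "b", "b"),
  knn  = c("a", "a", "b", "a", "b")
)

# Number of classifiers that disagree with the given label of each instance.
wrong_votes <- rowSums(preds != labels)

# Assumption: majority = more than half of the classifiers disagree;
# consensus = all of them disagree.
flag_noise <- function(wrong_votes, n_classifiers, consensus = FALSE) {
  if (consensus) wrong_votes == n_classifiers else wrong_votes > n_classifiers / 2
}

noisy_majority  <- which(flag_noise(wrong_votes, ncol(preds)))
noisy_consensus <- which(flag_noise(wrong_votes, ncol(preds), consensus = TRUE))

# Most voted label among the classifiers for each instance.
most_voted <- apply(preds, 1, function(p) names(which.max(table(p))))

# "remove": drop the flagged instances.
removed <- labels[setdiff(seq_along(labels), noisy_majority)]

# "repair": relabel the flagged instances with the most voted class.
repaired <- labels
repaired[noisy_majority] <- most_voted[noisy_majority]
```

In this toy data the consensus scheme flags fewer instances than the majority scheme, since it requires every classifier to disagree with the given label.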
# Next example is not run in order to save time
## Not run:
# data(iris)
# out <- hybridRepairFilter(iris, noiseAction = "hybrid")
# summary(out, explicit = TRUE)
# ## End(Not run)