# S3 method for formula
C45robustFilter(formula, data, ...)
# S3 method for default
C45robustFilter(x, classColumn = ncol(x), ...)
# S3 method for formula
C45votingFilter(formula, data, ...)
# S3 method for default
C45votingFilter(x, nfolds = 10, consensus = FALSE, classColumn = ncol(x), ...)
# S3 method for formula
C45iteratedVotingFilter(formula, data, ...)
# S3 method for default
C45iteratedVotingFilter(x, nfolds = 10, consensus = FALSE, classColumn = ncol(x), ...)
An object of class filter, which is a list with seven components:
cleanData is a data frame containing the filtered dataset.
remIdx is a vector of integers indicating the indexes of removed instances (i.e. their row number with respect to the original data frame).
repIdx is a vector of integers indicating the indexes of repaired/relabelled instances (i.e. their row number with respect to the original data frame).
repLab is a factor containing the new labels for repaired instances.
parameters is a list containing the argument values.
call contains the original call to the filter.
extraInf is a character string that includes additional information not covered by the previous items.
C45robustFilter builds a C4.5 decision tree from the training data and then removes the instances misclassified by this tree. The process is repeated until no instances are removed.
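The loop described above can be sketched in a few lines. This is only an illustration, not the package's implementation: it assumes rpart (CART trees) as a stand-in for C4.5, and the helper name robust_filter_sketch and its argument class_col are hypothetical.

```r
library(rpart)  # CART trees, used here as a stand-in for C4.5 (assumption)

# Hypothetical helper: repeatedly drop the instances misclassified by a tree
# trained on the current data, until a pass removes nothing.
robust_filter_sketch <- function(data, class_col = ncol(data)) {
  fml <- as.formula(paste(names(data)[class_col], "~ ."))
  keep <- seq_len(nrow(data))          # row indexes still in the dataset
  repeat {
    current <- data[keep, , drop = FALSE]
    tree <- rpart(fml, data = current, method = "class")
    pred <- predict(tree, current, type = "class")
    wrong <- as.character(pred) != as.character(current[[class_col]])
    if (!any(wrong)) break             # no removals in this pass: stop
    keep <- keep[!wrong]
  }
  list(cleanData = data[keep, , drop = FALSE],
       remIdx = setdiff(seq_len(nrow(data)), keep))
}

res <- robust_filter_sketch(iris)
length(res$remIdx)   # number of instances flagged as noisy
```

The returned list mimics only the cleanData and remIdx components; the real filter object carries the additional fields listed above.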
C45votingFilter splits the dataset into nfolds folds, building and testing a C4.5 tree on every combination of nfolds-1 folds. Thus nfolds-1 votes are gathered for each instance. Removal is carried out by majority or consensus voting schemes.
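One possible reading of this scheme is sketched below, under several assumptions: rpart again stands in for C4.5, each tree is tested on the same nfolds-1 folds it was trained on (which is what yields nfolds-1 votes per instance), and "majority" is taken to mean more than half of those votes. The helper voting_filter_sketch is hypothetical and may differ from the package's code.

```r
library(rpart)  # CART trees, used here as a stand-in for C4.5 (assumption)

# Hypothetical helper: each instance is judged by the nfolds - 1 trees whose
# training set contains it; a vote is cast whenever a tree misclassifies it.
voting_filter_sketch <- function(data, nfolds = 10, consensus = FALSE,
                                 class_col = ncol(data)) {
  n <- nrow(data)
  fml <- as.formula(paste(names(data)[class_col], "~ ."))
  fold <- sample(rep_len(seq_len(nfolds), n))  # random fold label per row
  votes <- integer(n)                          # "noisy" votes per instance
  for (k in seq_len(nfolds)) {
    train_idx <- which(fold != k)              # every fold except fold k
    train <- data[train_idx, , drop = FALSE]
    tree <- rpart(fml, data = train, method = "class")
    pred <- predict(tree, train, type = "class")
    wrong <- as.character(pred) != as.character(train[[class_col]])
    votes[train_idx] <- votes[train_idx] + wrong
  }
  n_votes <- nfolds - 1
  # Consensus: all trees must agree; majority: more than half (assumed rule)
  noisy <- if (consensus) votes == n_votes else votes > n_votes / 2
  list(cleanData = data[!noisy, , drop = FALSE], remIdx = which(noisy))
}

set.seed(1)  # the fold assignment is random
res <- voting_filter_sketch(iris, nfolds = 5)
```

Consensus voting is the more conservative choice: it removes an instance only when every tree that saw it misclassifies it, so it tends to discard fewer instances than majority voting.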
C45iteratedVotingFilter combines the two previous filters: it iterates C45votingFilter until no more noisy instances are removed.
# Next example is not run in order to save time
## Not run:
# data(iris)
# out1 <- C45robustFilter(Species~.-Sepal.Length, iris)
# # We fix a seed since the next two functions create random partitions of the data for the ensemble
# set.seed(1)
# out2 <- C45votingFilter(iris, consensus = TRUE)
# out3 <- C45iteratedVotingFilter(Species~., iris, nfolds = 5)
# print(out1)
# print(out2)
# print(out3)
# identical(out1$cleanData,iris[setdiff(1:nrow(iris),out1$remIdx),])
# identical(out2$cleanData,iris[setdiff(1:nrow(iris),out2$remIdx),])
# identical(out3$cleanData,iris[setdiff(1:nrow(iris),out3$remIdx),])
# ## End(Not run)