
UBL (version 0.0.3)

OSSClassif: One-sided selection strategy for handling multiclass imbalanced problems.

Description

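This function applies an adaptation of the one-sided selection (OSS) strategy, which combines the condensed nearest neighbour (CNN) rule with the removal of Tomek links, to multiclass imbalanced problems.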
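In one-sided selection as introduced by Kubat & Matwin (1997), the data are first condensed with a CNN step and then cleaned with Tomek links. Below is a minimal base-R sketch of the CNN step. It assumes Euclidean distance, numeric predictors and the one-pass variant described in that paper. `cnn_step()` is a hypothetical helper and is not part of UBL, so UBL's multiclass adaptation may differ in its details.

```r
# Hypothetical helper (not part of UBL): one-pass CNN step of OSS as described
# by Kubat & Matwin (1997), using Euclidean distance. Returns kept row indices.
cnn_step <- function(X, y, important) {
  X <- as.matrix(X)
  y <- as.character(y)
  # one randomly chosen example from each non-important class
  others <- setdiff(unique(y), important)
  seeds <- vapply(others, function(cl) {
    idx <- which(y == cl)
    idx[sample.int(length(idx), 1)]
  }, integer(1))
  # initial subset: every important-class example plus the seeds
  kept0 <- unname(c(which(y %in% important), seeds))
  rest <- setdiff(seq_along(y), kept0)
  # classify the remaining examples with 1-NN against the initial subset
  misclassified <- vapply(rest, function(i) {
    d2 <- colSums((t(X[kept0, , drop = FALSE]) - X[i, ])^2)  # squared distances
    y[kept0[which.min(d2)]] != y[i]
  }, logical(1))
  # misclassified examples are moved into the kept subset
  sort(c(kept0, rest[misclassified]))
}

set.seed(1)
X <- matrix(c(0, 0.5, 10, 11, 0.6), ncol = 1)
y <- c("a", "a", "b", "b", "b")
cnn_step(X, y, important = "a")
```

Which "b" examples are kept depends on the random seed. The borderline example at 0.6 is always retained, because it is either the seed itself or is misclassified by its nearest neighbour at 0.5.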

Usage

OSSClassif(form, dat, dist = "Euclidean", p = 2, Cl = "smaller", start = "CNN")

Arguments

form
A formula describing the prediction problem.
dat
A data frame containing the original imbalanced data set.
dist
A character string indicating which distance metric to use when determining the k nearest neighbors. Defaults to "Euclidean".
p
A number indicating the value of p when the "p-norm" distance is chosen. Defaults to 2.
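To show what `p` controls, here is a small sketch. It assumes that UBL's "p-norm" option is the standard Minkowski distance with exponent p, so that p = 1 gives the Manhattan distance and p = 2 (the default) the Euclidean one. `p_norm_dist()` is a hypothetical helper written only for illustration.

```r
# Hypothetical helper: Minkowski ("p-norm") distance
#   d(x, z) = (sum_i |x_i - z_i|^p)^(1/p)
p_norm_dist <- function(x, z, p = 2) {
  stopifnot(length(x) == length(z))
  sum(abs(x - z)^p)^(1 / p)
}

a <- c(0, 0)
b <- c(3, 4)
p_norm_dist(a, b, p = 1)  # 7, the Manhattan distance
p_norm_dist(a, b, p = 2)  # 5, the Euclidean distance

# stats::dist() implements the same metric as method = "minkowski"
as.numeric(dist(rbind(a, b), method = "minkowski", p = 3))
```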
Cl
A character vector indicating which classes are the most important. Defaults to "smaller", in which case the important classes are determined automatically as the smaller classes, i.e. those whose frequency is below (nr.examples)/(nr.classes).
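The "smaller" rule can be reproduced on any class vector. The helper `smaller_classes()` below is hypothetical and only mirrors the threshold stated above. It counts only classes that actually occur, which is an assumption about how unused factor levels are treated.

```r
# Hypothetical helper mirroring the rule for Cl = "smaller":
# a class is "smaller" when its frequency is below nr.examples / nr.classes
smaller_classes <- function(y) {
  y <- droplevels(as.factor(y))  # ignore levels with no examples
  freq <- table(y)
  names(freq)[freq < length(y) / nlevels(y)]
}

y <- factor(c(rep("a", 6), rep("b", 3), "c"))
smaller_classes(y)  # "b" "c"  (threshold 10 / 3, about 3.33)
```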
start
A character string that determines which strategy (CNN or Tomek links) is performed first. The available options are "CNN" and "Tomek". The default, "CNN", means that the CNN strategy is performed first and Tomek links are applied afterwards; "Tomek" reverses this order, applying Tomek links first and CNN afterwards.
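A Tomek link is a pair of examples from different classes that are each other's nearest neighbour. The sketch below detects such links with Euclidean distance. Following Kubat & Matwin (1997), it removes only the member of each link that belongs to a non-important class. `tomek_links()` and `remove_tomek()` are hypothetical helpers, and UBL's own handling of links may differ.

```r
# Hypothetical helper: indices of examples that take part in a Tomek link,
# i.e. i and j have different classes and are each other's nearest neighbour.
tomek_links <- function(X, y) {
  D <- as.matrix(dist(X))       # pairwise Euclidean distances
  diag(D) <- Inf                # an example is not its own neighbour
  nn <- apply(D, 1, which.min)  # nearest neighbour of each example
  i <- seq_along(y)
  unname(which(nn[nn] == i & y != y[nn]))
}

# Hypothetical helper: drop only the non-important members of Tomek links,
# returning the indices of the examples that are kept.
remove_tomek <- function(X, y, important) {
  linked <- tomek_links(X, y)
  drop <- linked[!(y[linked] %in% important)]
  if (length(drop) == 0) seq_along(y) else seq_along(y)[-drop]
}

X <- matrix(c(0, 1, 1.1, 5, 6), ncol = 1)
y <- c("a", "a", "b", "b", "b")
tomek_links(X, y)         # 2 3
remove_tomek(X, y, "a")   # 1 2 4 5
```

Under these assumptions, `start = "CNN"` corresponds to condensing first and then removing Tomek links from the retained examples, while `start = "Tomek"` cleans the full data set before condensing it.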

Value

  • The function returns a data frame with the new data set resulting from the application of the selected OSS strategy.

References

Kubat, M. & Matwin, S. (1997). Addressing the Curse of Imbalanced Training Sets: One-Sided Selection. Proc. of the 14th Int. Conf. on Machine Learning, Morgan Kaufmann, 179-186.

Batista, G. E., Prati, R. C. & Monard, M. C. (2004). A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter, ACM, 6, 20-29.

See Also

TomekClassif, CNNClassif

Examples

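library(UBL)
library(DMwR)  # provides the algae data set
data(algae)
# keep only the rows without missing values
clean.algae <- algae[complete.cases(algae), ]

# important classes given explicitly; CNN is applied first (default start)
alg1 <- OSSClassif(season ~ ., clean.algae, dist = "HVDM",
                   Cl = c("spring", "summer"))
# same important classes, but Tomek links are applied first
alg2 <- OSSClassif(season ~ ., clean.algae, dist = "HEOM",
                   Cl = c("spring", "summer"), start = "Tomek")
# important classes determined automatically (Cl = "smaller")
alg3 <- OSSClassif(season ~ ., clean.algae, dist = "HVDM", start = "CNN")
alg4 <- OSSClassif(season ~ ., clean.algae, dist = "HVDM", start = "Tomek")
# a single important class
alg5 <- OSSClassif(season ~ ., clean.algae, dist = "HEOM", Cl = "winter")

# class distribution of the original and of each resulting data set
summary(clean.algae$season)
summary(alg1$season)
summary(alg2$season)
summary(alg3$season)
summary(alg4$season)
summary(alg5$season)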
