UBL (version 0.0.3)

RandOverClassif: Random over-sampling for imbalanced classification problems

Description

This function performs random over-sampling for imbalanced multiclass problems. The class(es) selected by the user are over-sampled by a given percentage, by adding randomly chosen replicas of their existing examples. Alternatively, the strategy can be applied to balance all the existing classes (C.perc = "balance") or to "smoothly invert" the frequency of the examples in each class (C.perc = "extreme").

Usage

RandOverClassif(form, dat, C.perc = "balance", repl = TRUE)

Arguments

form
A formula describing the prediction problem.
dat
A data frame containing the original imbalanced data set.
C.perc
A named list containing each class name and the corresponding over-sampling percentage, greater than or equal to 1, where 1 means that no over-sampling is to be applied in the corresponding class. The user may indicate only the classes where he wants to a
repl
A boolean value controlling the possibility of having repetition of examples when choosing the examples to repeat in the over-sampled data set. Defaults to TRUE because this is a necessary condition if the selected percentage is greater than 2. This param
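
As a rough illustration of how the C.perc percentages relate to class sizes and to repl, the sketch below works through the arithmetic in base R. The class counts are hypothetical, the rounding rule is an assumption (the exact rule used by the package is not stated here), and none of the helper variables are part of UBL.

```r
# Hypothetical class counts, for illustration only
counts <- c(autumn = 40, summer = 44, spring = 53, winter = 62)
C.perc <- list(autumn = 2, summer = 1.5, spring = 1)

# Classes not named in C.perc are left unchanged, i.e. treated as 1
perc <- rep(1, length(counts))
names(perc) <- names(counts)
perc[names(C.perc)] <- unlist(C.perc)

# Assumed: the new class size is the old size times the percentage
new_counts <- round(counts * perc)
extra <- new_counts - counts

# Drawing more replicas than a class has examples requires repetition,
# which is why percentages above 2 need repl = TRUE
needs_repl <- extra > counts

print(rbind(counts, new_counts, extra))
print(needs_repl)
```

With these counts, a percentage of 2 asks for exactly as many replicas as there are examples, so it is the largest value that could still be served without repetition.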

Value

  • The function returns a data frame with the new data set resulting from the application of the random over-sampling strategy.

Details

This function performs random over-sampling to deal with imbalanced multiclass problems. The examples added to the new data set are exact replicas of examples already present in the original data set; no new synthetic cases are generated.
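
The idea of adding replicas can be sketched in a few lines of base R. This is a minimal illustration of random over-sampling for a single class, not the package's implementation: over_sample_class and its arguments are hypothetical names, and the rounding of the number of replicas is an assumption.

```r
# Hypothetical helper, not part of UBL: over-sample one class of a data frame
over_sample_class <- function(dat, target, cls, perc, repl = TRUE) {
  stopifnot(perc >= 1)
  idx <- which(dat[[target]] == cls)
  n_extra <- round(length(idx) * (perc - 1))  # assumed rounding rule
  if (!repl && n_extra > length(idx)) {
    stop("repl = FALSE cannot supply more replicas than there are examples")
  }
  # Index into idx instead of calling sample(idx, ...), because
  # sample() treats a single number n as the vector 1:n
  extra <- idx[sample.int(length(idx), n_extra, replace = repl)]
  # Original rows first, then the randomly chosen replicas
  dat[c(seq_len(nrow(dat)), extra), , drop = FALSE]
}

set.seed(1)
toy <- data.frame(x = 1:10,
                  cls = factor(rep(c("a", "b"), times = c(8, 2))))
over <- over_sample_class(toy, "cls", "b", perc = 2)
print(table(over$cls))
```

Every added row is a copy of an existing row of class "b", so the values of x in that class are unchanged and only their frequencies grow.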

See Also

RandUnderClassif

Examples

library(UBL)   # provides RandOverClassif
library(DMwR)  # provides the algae data set
data(algae)
# classes spring and winter remain unchanged
C.perc <- list(autumn = 2, summer = 1.5, spring = 1)
myover.algae <- RandOverClassif(season ~ ., algae, C.perc)
oveBalan.algae <- RandOverClassif(season ~ ., algae, "balance")
oveInvert.algae <- RandOverClassif(season ~ ., algae, "extreme")

library(MASS)  # provides the cats data set
data(cats)
myover.cats <- RandOverClassif(Sex ~ ., cats, list(M = 1.5))
oveBalan.cats <- RandOverClassif(Sex ~ ., cats, "balance")
oveInvert.cats <- RandOverClassif(Sex ~ ., cats, "extreme")

# learn a model and check results with original and over-sampled data
library(rpart)
set.seed(123)  # make the random train/test split reproducible
idx <- sample(seq_len(nrow(cats)), as.integer(0.7 * nrow(cats)))
tr <- cats[idx, ]
ts <- cats[-idx, ]

# model trained on the original training data
ctO <- rpart(Sex ~ ., tr)
predsO <- predict(ctO, ts, type = "class")
# model trained on the balanced (over-sampled) training data
new.cats <- RandOverClassif(Sex ~ ., tr, "balance")
ct1 <- rpart(Sex ~ ., new.cats)
preds1 <- predict(ct1, ts, type = "class")
# confusion tables on the same test set
table(predsO, ts$Sex)
table(preds1, ts$Sex)