imbalance (version 0.1.1)

wracog: Wrapper for rapidly converging Gibbs algorithm.

Description

Generates synthetic minority examples by approximating their probability distribution, and keeps generating them until the sensitivity of wrapper over validation can no longer be improved. Works only on discrete numeric datasets.

Usage

wracog(train, validation, wrapper, slideWin = 10, threshold = 0.02,
  classAttr = "Class", ...)

Arguments

train

data.frame. An initial dataset used to build the first model. All columns, except the classAttr one, must be numeric or coercible to numeric.

validation

data.frame. A dataset used to compare the results of consecutive classifiers. Must have the same structure as train.

wrapper

An S3 object. There must exist a method trainWrapper implemented for the class of the object, and a predict method implemented for the class of the model returned by trainWrapper.

slideWin

Number of most recent sensitivities taken into account by the stopping criterion. By default, 10.

threshold

Threshold that the mean of the last slideWin sensitivities should reach. By default, 0.02.

classAttr

character. Name of the class attribute in train and validation. Must exist in both.

...

Further arguments for wrapper.
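The wrapper contract described in the arguments above can be sketched in base R. This is only an illustration under stated assumptions: centroidWrapper, centroidModel and the nearest-centroid classifier are hypothetical and not part of the package, the stand-in trainWrapper generic is defined only so the sketch runs without imbalance attached, and predict is assumed to return class labels comparable with the class column of validation.

```r
# Stand-in generic so the sketch runs on its own; imbalance exports its own
# trainWrapper generic, so this definition is skipped when one already exists.
if (!exists("trainWrapper")) {
  trainWrapper <- function(wrapper, train, trainClass, ...) UseMethod("trainWrapper")
}

# Hypothetical wrapper object: an empty list tagged with an S3 class.
centroidWrapper <- structure(list(), class = "centroidWrapper")

# Training method: stores one centroid (vector of column means) per class.
trainWrapper.centroidWrapper <- function(wrapper, train, trainClass, ...) {
  trainClass <- factor(trainClass)  # drops unused levels
  x <- data.matrix(train)
  centroids <- rowsum(x, trainClass) / as.vector(table(trainClass))
  structure(list(centroids = centroids), class = "centroidModel")
}

# Prediction method for the model class returned by trainWrapper:
# assigns each row to the class with the closest centroid.
predict.centroidModel <- function(object, newdata, ...) {
  x <- data.matrix(newdata)
  # Squared Euclidean distance from every row to every centroid.
  d <- sapply(seq_len(nrow(object$centroids)), function(k) {
    rowSums(sweep(x, 2, object$centroids[k, ])^2)
  })
  d <- matrix(d, nrow = nrow(x))
  labels <- rownames(object$centroids)
  factor(labels[max.col(-d, ties.method = "first")], levels = labels)
}

# Toy data, made up for the illustration.
toy <- data.frame(
  a = c(1, 1, 2, 5, 6, 6),
  b = c(0, 1, 0, 4, 5, 4),
  Class = factor(c("neg", "neg", "neg", "pos", "pos", "pos"))
)
model <- trainWrapper(centroidWrapper, toy[, c("a", "b")], toy$Class)
pred <- predict(model, toy[, c("a", "b")])
```

An object built this way would be passed as the wrapper argument, in the same way as the C5.0 wrapper in the Examples section.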

Value

A data.frame with the same structure as train, containing the generated synthetic examples.
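Since only the synthetic examples are returned, they usually have to be appended to train before fitting a final classifier. A minimal sketch, using small made-up data frames in place of a real training set and a real wracog result (toyTrain and toySynthetic are hypothetical names and values):

```r
# Stand-in for the train argument: five majority rows, one minority row.
toyTrain <- data.frame(
  Age = c(30, 34, 35, 38, 40, 41),
  Nodes = c(1, 0, 3, 0, 2, 10),
  Class = factor(
    c("negative", "negative", "negative", "negative", "negative", "positive"),
    levels = c("negative", "positive")
  )
)

# Stand-in for the value returned by wracog: same columns as toyTrain,
# containing only synthetic minority examples.
toySynthetic <- data.frame(
  Age = c(41, 40),
  Nodes = c(10, 9),
  Class = factor(c("positive", "positive"), levels = c("negative", "positive"))
)

# Append the synthetic rows and inspect the new class distribution.
balanced <- rbind(toyTrain, toySynthetic)
classCounts <- table(balanced$Class)
print(classCounts)
```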

Details

The algorithm keeps generating samples with a Gibbs sampler until the last slideWin executions of wrapper over the validation dataset reach a mean sensitivity lower than threshold. In each iteration, the generated samples that are misclassified by the model built on the previous train dataset are added to train. The initial model is built on the initial train.
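The stopping rule can be sketched as follows. This is a literal reading of the description above, not the package's internal code: sensitivity and stopReached are hypothetical helper names, and the statistic that wracog actually compares against threshold may differ.

```r
# Sensitivity (true positive rate) of the minority class:
# correctly predicted positives divided by all actual positives.
sensitivity <- function(actual, predicted, positive) {
  isPositive <- actual == positive
  sum(predicted[isPositive] == positive) / sum(isPositive)
}

# Literal reading of the documented rule (an assumption): stop once at least
# slideWin sensitivities are available and the mean of the most recent
# slideWin of them is lower than threshold.
stopReached <- function(sensitivities, slideWin = 10, threshold = 0.02) {
  if (length(sensitivities) < slideWin) {
    return(FALSE)
  }
  mean(utils::tail(sensitivities, slideWin)) < threshold
}

# Made-up labels for the illustration.
actual    <- c("pos", "pos", "neg", "pos")
predicted <- c("pos", "neg", "neg", "pos")
sens <- sensitivity(actual, predicted, positive = "pos")
```

Keeping the window check separate from the sensitivity computation makes it easy to try other window sizes or thresholds on a recorded sequence of sensitivities.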

References

Das, Barnan; Krishnan, Narayanan C.; Cook, Diane J. Racog and Wracog: Two Probabilistic Oversampling Techniques. IEEE Transactions on Knowledge and Data Engineering 27(2015), Nr. 1, p. 222–234.

Examples

# NOT RUN {
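library(imbalance)
data(haberman)

# Wrapper object: an empty list tagged with the S3 class "C50Wrapper"
myWrapper <- structure(list(), class = "C50Wrapper")
# Training method for the wrapper: fits a C5.0 model; predict() for the
# resulting model is provided by the C50 package
trainWrapper.C50Wrapper <- function(wrapper, train, trainClass){
  C50::C5.0(train, trainClass)
}

# Use a random half of the rows for training and the rest for validation
trainFold <- sample(seq_len(nrow(haberman)), nrow(haberman) %/% 2)
newSamples <- wracog(haberman[trainFold, ], haberman[-trainFold, ],
                     myWrapper, classAttr = "Class")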

# }
