smote: Synthetic Minority Oversampling Technique to handle class imbalance in binary classification.
Description
In each iteration, one minority-class element x1 is sampled, then one of x1's nearest neighbors, x2. The two points are interpolated (convex-combined), yielding a new synthetic data point x3 for the minority class.
The method also handles factor features. The Gower distance is used for the nearest-neighbor calculation (see daisy). For interpolation, the factor level of x3 is sampled, per feature, from the two levels given by x1 and x2.
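The sampling loop described above can be sketched in base R as follows. This is a minimal illustration, not mlr's implementation: the helpers gower_dist() and smote_sketch() are hypothetical, the interpolation weight is assumed to be drawn uniformly from [0, 1] as in the original SMOTE paper, each factor level is assumed to come from x1 or x2 with equal probability, and a simplified Gower distance (range-scaled absolute difference for numeric features, 0/1 mismatch for factors) stands in for daisy. The standardize and alt.logic arguments are not modeled.

```r
# Minimal SMOTE sketch for mixed numeric/factor data, base R only.
# Hypothetical helpers; NOT the mlr implementation of smote().

# Simplified Gower distance from row i to every row of data frame df (assumption).
gower_dist <- function(df, i) {
  d <- numeric(nrow(df))
  for (col in names(df)) {
    x <- df[[col]]
    if (is.numeric(x)) {
      rng <- diff(range(x))
      if (rng > 0) d <- d + abs(x - x[i]) / rng  # range-scaled difference
    } else {
      d <- d + as.numeric(x != x[i])             # 0/1 mismatch for factors
    }
  }
  d / ncol(df)
}

# Generate n_new synthetic rows from the minority-class data frame.
smote_sketch <- function(minority, n_new, nn = 5L) {
  n <- nrow(minority)
  stopifnot(n >= 2)
  k <- min(nn, n - 1L)
  new_rows <- vector("list", n_new)
  for (j in seq_len(n_new)) {
    i <- sample.int(n, 1L)                 # x1: random minority point
    d <- gower_dist(minority, i)
    d[i] <- Inf                            # exclude x1 from its own neighbors
    neighbors <- order(d)[seq_len(k)]      # k nearest neighbors of x1
    nb <- neighbors[sample.int(k, 1L)]     # x2: one neighbor at random
    u <- runif(1)                          # interpolation weight (assumed U(0, 1))
    x3 <- minority[i, , drop = FALSE]
    for (col in names(minority)) {
      a <- minority[[col]][i]
      b <- minority[[col]][nb]
      if (is.numeric(a)) {
        x3[[col]] <- a + u * (b - a)       # convex combination of x1 and x2
      } else {
        # factor: pick the level of x1 or x2 (equal probability assumed)
        x3[[col]] <- if (runif(1) < 0.5) a else b
      }
    }
    new_rows[[j]] <- x3
  }
  out <- do.call(rbind, new_rows)
  rownames(out) <- NULL
  out
}

# Usage on a tiny illustrative minority class
set.seed(1)
minority <- data.frame(
  x = c(1.0, 1.2, 0.8, 1.1, 0.9),
  color = factor(c("red", "red", "blue", "blue", "red"))
)
synthetic <- smote_sketch(minority, n_new = 10, nn = 3L)
```

Because x3 is a convex combination of x1 and x2, every synthetic numeric value stays within the range spanned by its two parents, and every synthetic factor value is one of the levels already present in the minority class.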
Usage
smote(task, rate, nn = 5L, standardize = TRUE, alt.logic = FALSE)
References
Chawla, N., Bowyer, K., Hall, L., & Kegelmeyer, P. (2000). SMOTE: Synthetic Minority Over-sampling TEchnique. In International Conference of Knowledge Based Computer Systems, pp. 46-57. National Center for Software Technology, Mumbai, India, Allied Press.