Performs oversampling by creating new instances.
smote(
Variables,
Classes,
subset_use = NULL,
k = 5,
use_nearest = TRUE,
proportions = 0.9,
equalise_with_undersampling = FALSE,
safe = FALSE
)
Returns a list containing the new data.frame of independent variables and the new class labels.
Variables: the data.frame of independent variables that should be used to create new instances
Classes: the class labels in the prediction problem
subset_use: restricts the oversampling to a specific subset. If NULL, everything is used.
k: the number of neighbours used for generation
use_nearest: should only the nearest neighbours be used? (very slow)
proportions: the proportion (of the biggest class) to which the classes should be equalised
equalise_with_undersampling: should additional undersampling be performed?
safe: should a safe version of SMOTE be used?
Ilya Kozlovskiy, Konstantin Hopf <konstantin.hopf@uni-bamberg.de>
SMOTE (Synthetic Minority Over-sampling Technique) generates synthetic data points for a smaller class, for example to mitigate the problem of imbalanced classes in classification.
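
The interpolation idea behind SMOTE can be illustrated with a minimal base-R sketch. This is not the implementation behind smote(): the function name smote_sketch and its arguments X and n_new are hypothetical, and the sketch assumes purely numeric features, a single minority class, and that every synthetic point is placed on the line segment between a randomly chosen instance and one of its k nearest neighbours.

```r
# Hypothetical helper for illustration only (not part of the documented package).
# X:     numeric data.frame or matrix holding the minority-class instances
# n_new: number of synthetic instances to generate
# k:     number of nearest neighbours considered for each base instance
smote_sketch <- function(X, n_new, k = 5) {
  X <- as.matrix(X)
  n <- nrow(X)
  stopifnot(is.numeric(X), n >= 2, n_new >= 0)
  k <- min(k, n - 1)                       # at most n - 1 other instances exist
  D <- as.matrix(dist(X))                  # pairwise Euclidean distances
  synthetic <- matrix(NA_real_, nrow = n_new, ncol = ncol(X),
                      dimnames = list(NULL, colnames(X)))
  for (i in seq_len(n_new)) {
    base <- sample.int(n, 1)               # random minority instance
    others <- setdiff(seq_len(n), base)    # exclude the instance itself
    nn <- others[order(D[base, others])][seq_len(k)]  # its k nearest neighbours
    neighbour <- nn[sample.int(length(nn), 1)]
    gap <- runif(1)                        # random position along the segment
    synthetic[i, ] <- X[base, ] + gap * (X[neighbour, ] - X[base, ])
  }
  as.data.frame(synthetic)
}

# Usage: four minority instances lying on the line y = 2 * x
set.seed(1)
minority <- data.frame(x = c(1, 2, 3, 4), y = c(2, 4, 6, 8))
new_points <- smote_sketch(minority, n_new = 6, k = 2)
```

Because each synthetic point is a convex combination of two existing instances, the generated points in this example stay on the same line and within the range of the original data.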