A data.frame with synthetic rows appended, same columns and types as input.
Arguments
formula
A model formula of the form target ~ predictors, with the response (class) variable on the left and the predictors on the right.
data
A data.frame containing the variables in the model.
k
Integer, number of nearest neighbors used by SMOTE (default 5).
strategy
One of "balance" (oversample to the max class size) or "perc"
(oversample each class by a percentage). Default "balance".
perc
Numeric percentage used when strategy = "perc" (e.g., 100 means
generate as many synthetic examples as there are existing examples in the class).
Ignored when strategy = "balance".
metric
Distance metric for neighbor search: one of
"euclidean", "manhattan", "chebyshev", "canberra", "overlap", "heom", "hvdm", "pnorm".
Default "euclidean".
p
Numeric exponent of the p-norm when metric = "pnorm". The "euclidean" and
"manhattan" metrics correspond to p = 2 and p = 1, respectively. Default 2.
seed
Optional integer seed for reproducibility.
C.perc
Deprecated. Backward-compatibility alias for oversampling
control. If it is the character string "balance", it is mapped to strategy = "balance".
If it is a single numeric value, it is mapped to strategy = "perc" and perc = C.perc.
Any other form is ignored with a warning.
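The mapping of the deprecated C.perc argument described above can be sketched as follows. This is only an illustration: the helper name resolve_cperc, its return shape, and the fallback default perc = 100 are assumptions, not part of the documented interface.

```r
# Hypothetical helper (not part of the package): resolve the deprecated
# C.perc alias into a strategy/perc pair, following the rules listed above.
# The default perc = 100 is an assumption made for this sketch.
resolve_cperc <- function(C.perc = NULL, strategy = "balance", perc = 100) {
  # No alias supplied: keep the explicit arguments.
  if (is.null(C.perc)) {
    return(list(strategy = strategy, perc = perc))
  }
  # Character "balance" maps to strategy = "balance".
  if (is.character(C.perc) && length(C.perc) == 1 && C.perc == "balance") {
    return(list(strategy = "balance", perc = perc))
  }
  # A single numeric maps to strategy = "perc" with perc = C.perc.
  if (is.numeric(C.perc) && length(C.perc) == 1) {
    return(list(strategy = "perc", perc = C.perc))
  }
  # Anything else is ignored with a warning.
  warning("Unsupported form of C.perc; ignoring it.")
  list(strategy = strategy, perc = perc)
}

resolve_cperc("balance")  # strategy "balance"
resolve_cperc(200)        # strategy "perc", perc 200
```

A named list such as list(a = 2, b = 3) falls through to the warning branch and leaves strategy and perc unchanged.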
Details
The function supports multi-class data. With strategy = "balance" (default),
each class is oversampled up to the size of the largest class. With
strategy = "perc", each class c receives round(n_c * perc / 100) synthetic
examples, where n_c is the number of examples in class c.
Neighbors are computed within each class.
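As a rough illustration of the two strategies, the number of synthetic examples per class could be computed as below. The helper synthetic_counts and the toy class sizes are assumptions made for this sketch; the function's actual internals may differ.

```r
# Hypothetical helper (not part of the package): number of synthetic rows
# to generate for each class under the two strategies described above.
synthetic_counts <- function(y, strategy = c("balance", "perc"), perc = 100) {
  strategy <- match.arg(strategy)
  n <- table(y)                       # class sizes n_c
  if (strategy == "balance") {
    extra <- max(n) - n               # fill each class up to the largest one
  } else {
    extra <- round(n * perc / 100)    # round(n_c * perc / 100) per class
  }
  setNames(as.integer(extra), names(n))
}

# Toy response with class sizes 50, 20 and 5.
y <- factor(c(rep("a", 50), rep("b", 20), rep("c", 5)))
synthetic_counts(y, "balance")    # a = 0, b = 30, c = 45
synthetic_counts(y, "perc", 100)  # a = 50, b = 20, c = 5
```

Under "balance" the largest class receives no synthetic rows, while under "perc" every class, including the largest, grows by the same percentage.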