Down- and Up-Sampling Imbalanced Data
downSample will randomly sample a data set so that all classes have the same frequency as the minority class.
upSample samples with replacement so that all classes have the same frequency as the majority class.
downSample(x, y, list = FALSE, yname = "Class")
upSample(x, y, list = FALSE, yname = "Class")
- x: a matrix or data frame of predictor variables
- y: a factor variable with the class memberships
- list: should the function return list(x, y) or bind x and y together? If FALSE, the output will be coerced to a data frame.
- yname: if list = FALSE, a label for the class column
Simple random sampling is used to down-sample the majority class(es). Note that the minority class data are left intact and that the rows will be re-ordered in the down-sampled version.
For up-sampling, all of the original data are left intact and additional samples are added to the minority classes with replacement.
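The two sampling schemes described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's own implementation: resample_classes is a hypothetical helper, and the toy data and the fixed "Class" column name are made up for the example.

```r
# Hypothetical helper (not part of any package): bring every class to n rows.
resample_classes <- function(x, y, n, replace = FALSE) {
  stopifnot(is.factor(y), nrow(x) == length(y))
  x <- as.data.frame(x)
  # Row indices grouped by class; droplevels() skips unused factor levels.
  groups <- split(seq_along(y), droplevels(y))
  idx <- unlist(lapply(groups, function(rows) {
    if (replace) {
      # Up-sampling: keep every original row, then add draws with replacement.
      extra <- rows[sample.int(length(rows), n - length(rows), replace = TRUE)]
      c(rows, extra)
    } else {
      # Down-sampling: simple random sample without replacement.
      rows[sample.int(length(rows), n)]
    }
  }), use.names = FALSE)
  data.frame(x[idx, , drop = FALSE], Class = y[idx], row.names = NULL)
}

set.seed(1)
x <- data.frame(a = 1:10, b = rnorm(10))
y <- factor(c(rep("major", 7), rep("minor", 3)))
counts <- table(y)

down <- resample_classes(x, y, n = min(counts))                 # 3 rows per class
up   <- resample_classes(x, y, n = max(counts), replace = TRUE) # 7 rows per class

table(down$Class)
table(up$Class)
```

Indexing with sample.int rather than calling sample directly on the row numbers avoids the base R behaviour where sample(k) on a single number k draws from 1:k, which would matter for a class with one row.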
- Either a data frame or a list with elements x and y
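A short sketch of the two return shapes, assuming the caret package is installed. The imbalanced subset of the built-in iris data is constructed purely for illustration and is not part of the original examples.

```r
library(caret)

set.seed(42)
# Make iris imbalanced: 50 setosa, 30 versicolor, 10 virginica.
imbalanced <- iris[c(1:50, 51:80, 101:110), ]
x <- imbalanced[, 1:4]
y <- imbalanced$Species

# Default (list = FALSE): one data frame, class column named by yname.
as_df <- downSample(x, y, yname = "Species")

# list = TRUE: predictors and classes returned separately as x and y.
as_list <- upSample(x, y, list = TRUE)

table(as_df$Species)  # every class reduced to the minority count
table(as_list$y)      # every class raised to the majority count
```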
## A ridiculous example...
library(caret)
data(oil)
table(oilType)
downSample(fattyAcids, oilType)
upSample(fattyAcids, oilType)