nearmiss

A data.frame or tibble. Must contain one factor variable; all
remaining variables must be numeric.

Character. Name of the factor variable.

An integer. Number of nearest neighbors considered when
selecting which examples of the majority class to keep.

A numeric value for the ratio of the
majority-to-minority frequencies. The default value (1) means
that all other levels are sampled down to have the same
frequency as the least occurring level. A value of 2 would mean
that the majority levels will have at most approximately
twice as many rows as the minority level.

under_ratio
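
As a rough illustration of how under_ratio translates into row counts, the base R sketch below computes the per-level cap implied by a given ratio. The helper `target_counts()` and the toy data frame are hypothetical, and rounding down with `floor()` is an assumption; the package's own rounding may differ.

```r
# Toy data: one factor column, all other columns numeric.
toy <- data.frame(
  class = factor(rep(c("a", "b", "c"), times = c(10, 25, 40))),
  x = seq_len(75),
  y = rev(seq_len(75))
)

# Hypothetical helper: cap every level at (smallest level count * ratio).
target_counts <- function(df, var, under_ratio = 1) {
  tab <- table(df[[var]])
  counts <- setNames(as.integer(tab), names(tab))
  cap <- floor(min(counts) * under_ratio)  # rounding rule is an assumption
  pmin(counts, cap)
}

target_counts(toy, "class", under_ratio = 1)  # a = 10, b = 10, c = 10
target_counts(toy, "class", under_ratio = 2)  # a = 10, b = 20, c = 20
```

Levels that are already at or below the cap are left unchanged, which is why the ratio is an upper bound rather than an exact target.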

Removes majority class instances using the NearMiss algorithm.

A dataset with an uneven number of cases in each class is
said to be unbalanced. Many models perform poorly on
unbalanced datasets. A dataset can be balanced by increasing the
number of minority cases using SMOTE 2011 <arXiv:1106.1813>,
BorderlineSMOTE 2005 <doi:10.1007/11538059_91>, or ADASYN 2008
<https://ieeexplore.ieee.org/document/4633969>, or by decreasing the
number of majority cases using NearMiss 2003
<https://www.site.uottawa.ca/~nat/Workshop2003/jzhang.pdf> or Tomek
link removal 1976 <https://ieeexplore.ieee.org/document/4309452>.
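
To make the undersampling idea concrete, here is a minimal base R sketch of one common NearMiss variant (often called NearMiss-1): each majority row is scored by its mean Euclidean distance to its k nearest minority rows, and the rows with the smallest scores are kept. The function name `nearmiss_sketch()`, the two-class restriction, the toy data, and the choice of variant are assumptions made for illustration; this is not the package's implementation.

```r
# Hypothetical two-class illustration; not the package's implementation.
nearmiss_sketch <- function(df, var, k = 5, under_ratio = 1) {
  counts <- table(df[[var]])
  minority <- names(counts)[which.min(counts)]
  is_min <- df[[var]] == minority

  # Numeric predictors only (every column except the class column).
  x <- as.matrix(df[, setdiff(names(df), var), drop = FALSE])
  x_min <- x[is_min, , drop = FALSE]
  x_maj <- x[!is_min, , drop = FALSE]

  # Score each majority row: mean distance to its k nearest minority rows.
  k <- min(k, nrow(x_min))
  score <- apply(x_maj, 1, function(pt) {
    d <- sqrt(colSums((t(x_min) - pt)^2))
    mean(sort(d)[seq_len(k)])
  })

  # Keep the majority rows closest to the minority class, up to the cap.
  n_keep <- min(nrow(x_maj), floor(sum(is_min) * under_ratio))
  keep_maj <- which(!is_min)[order(score)[seq_len(n_keep)]]
  df[sort(c(which(is_min), keep_maj)), , drop = FALSE]
}

set.seed(1)
toy <- data.frame(
  class = factor(rep(c("rare", "common"), times = c(5, 30))),
  x = c(rnorm(5, mean = 0), rnorm(30, mean = 3)),
  y = c(rnorm(5, mean = 0), rnorm(30, mean = 3))
)
balanced <- nearmiss_sketch(toy, "class", k = 3, under_ratio = 1)
table(balanced$class)
```

All minority rows are retained; only majority rows are dropped, so with a ratio of 1 both levels end up with the same number of rows.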

nearmiss: Remove Points Near Other Classes

Description

Usage

Arguments

Value

Details

References

See Also

Examples