Learn R Programming

themis (version 0.2.0)

nearmiss: Remove Points Near Other Classes

Description

Generates synthetic positive instances using nearmiss algorithm.

Usage

nearmiss(df, var, k = 5, under_ratio = 1)

Arguments

df

data.frame or tibble. Must have 1 factor variable and remaining numeric variables.

var

Character, name of variable containing factor variable.

k

An integer. Number of nearest neighbor that are used to generate the new examples of the minority class.

under_ratio

A numeric value for the ratio of the minority-to-majority frequencies. The default value (1) means that all other levels are sampled down to have the same frequency as the least occurring level. A value of 2 would mean that the majority levels will have (at most) (approximately) twice as many rows than the minority level.

Value

A data.frame or tibble, depending on type of df.

Details

All columns used in this function must be numeric with no missing data.

References

Inderjeet Mani and I Zhang. knn approach to unbalanced data distributions: a case study involving information extraction. In Proceedings of workshop on learning from imbalanced datasets, 2003.

See Also

step_nearmiss() for step function of this method

Other Direct Implementations: adasyn(), bsmote(), smotenc(), smote()

Examples

Run this code
# NOT RUN {
circle_numeric <- circle_example[, c("x", "y", "class")]

res <- nearmiss(circle_numeric, var = "class")

res <- nearmiss(circle_numeric, var = "class", k = 10)

res <- nearmiss(circle_numeric, var = "class", under_ratio = 1.5)
# }

Run the code above in your browser using DataLab