Note: a newer version (1.0.2.1) of this package is available.

imbalance

imbalance provides a set of tools to work with imbalanced datasets: novel oversampling algorithms, filtering of instances and evaluation of synthetic instances.

Installation

You can install imbalance from GitHub with:

# install.packages("devtools")
devtools::install_github("ncordon/imbalance")

Examples

Run the pdfos algorithm on the imbalanced newthyroid1 dataset and plot a comparison between attributes.

library("imbalance")
data(newthyroid1)

newSamples <- pdfos(newthyroid1, numInstances = 80)
# Join new samples with old imbalanced dataset
newDataset <- rbind(newthyroid1, newSamples)
# Plot a visual comparison between both datasets
plotComparison(newthyroid1, newDataset, attrs = names(newthyroid1)[1:3], cols = 2, classAttr = "Class")

After filtering examples with neater:

filteredSamples <- neater(newthyroid1, newSamples, iterations = 500)
#> [1] "10 samples filtered by NEATER"
filteredNewDataset <- rbind(newthyroid1, filteredSamples)
plotComparison(newthyroid1, filteredNewDataset, attrs = names(newthyroid1)[1:3])
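
The numInstances argument passed to pdfos above sets how many synthetic minority instances are generated. When picking a value, it can help to count the classes first. The following is a minimal base-R sketch, not part of the package: the helper name instances_to_balance and the toy data frame are hypothetical, and it assumes a binary class column named Class, as in the examples above. It returns the number of extra minority instances that would make both classes the same size, which can serve as an upper bound.

```r
# Hypothetical helper (not part of imbalance): number of synthetic minority
# instances needed to make both classes the same size.
instances_to_balance <- function(dataset, classAttr = "Class") {
  counts <- table(dataset[[classAttr]])
  stopifnot(length(counts) == 2)  # binary datasets only
  unname(max(counts) - min(counts))
}

# Toy data standing in for a real dataset: 5 positive, 20 negative
toy <- data.frame(
  x = seq_len(25),
  Class = factor(c(rep("positive", 5), rep("negative", 20)))
)

table(toy$Class)           # class counts before oversampling
instances_to_balance(toy)  # 20 - 5 extra minority instances
```

Requesting fewer instances than this bound, as the pdfos call above appears to do, reduces the imbalance without fully equalizing the classes.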

Run the ADASYN method through the wrapper provided by the package, and compare the imbalance ratio of the dataset before and after oversampling:

imbalanceRatio(glass0)
#> [1] 0.4861111
newDataset <- oversample(glass0, method = "ADASYN")
imbalanceRatio(newDataset)
#> [1] 0.9722222
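
The ratios printed above are consistent with dividing the size of the minority class by the size of the majority class, so that 1 means perfectly balanced. A minimal base-R sketch of that computation follows, assuming this definition. The helper name ratio_sketch and the toy data are hypothetical; the class sizes 70 and 144 were chosen only because 70/144 reproduces the first value shown above. In practice the package's own imbalanceRatio should be preferred.

```r
# Hypothetical helper (not part of imbalance): minority size / majority size
# for a binary dataset, assuming that is how the ratio is defined.
ratio_sketch <- function(dataset, classAttr = "Class") {
  counts <- table(dataset[[classAttr]])
  stopifnot(length(counts) == 2)  # binary datasets only
  unname(min(counts) / max(counts))
}

# Toy data: 70 minority and 144 majority instances
toy <- data.frame(
  x = seq_len(214),
  Class = factor(c(rep("positive", 70), rep("negative", 144)))
)

ratio_sketch(toy)
#> [1] 0.4861111
```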

Install

install.packages('imbalance')

Monthly Downloads

8,752

Version

1.0.0

License

GPL (>= 2) | file LICENSE

Maintainer

Ignacio Cordón

Last Published

February 18th, 2018

Functions in imbalance (1.0.0)

pdfos
Probability density function estimation based oversampling

neater
Filtering of oversampled data based on non-cooperative game theory

imbalance
imbalance: A package to treat imbalanced datasets

newthyroid1
Imbalanced binary thyroid gland data

haberman
Haberman's survival data

iris0
Imbalanced binary iris dataset

ecoli1
Imbalanced binary ecoli protein localization sites

imbalanceRatio
Compute imbalance ratio of a binary dataset

mwmote
Majority weighted minority oversampling technique for imbalanced dataset learning

banana
Binary banana dataset

rwo
Random walk oversampling

glass0
Imbalanced binary glass identification

racog
Rapidly converging Gibbs algorithm.

trainWrapper
Generic methods to train classifiers

wracog
Wrapper for rapidly converging Gibbs algorithm.

yeast4
Imbalanced binary yeast protein localization sites

plotComparison
Plots comparison between the original and the new balanced dataset.

wisconsin
Imbalanced binary breast cancer Wisconsin dataset

oversample
Wrapper that encapsulates a collection of algorithms to perform a class balancing preprocessing task for binary class datasets