smote

task

(<a rd-options="" href="/link/Task?package=mlr&version=2.13" data-mini-rdoc="mlr::Task">Task</a>)
The task.

rate

(<code>numeric(1)</code>)
Factor to upsample the smaller class.
Must be between 1 and <code>Inf</code>,
where 1 means no oversampling and 2 means doubling the class size.

nn

(<code>integer(1)</code>)
Number of nearest neighbors to consider.
Default is 5.

standardize

(<code>logical(1)</code>)
Standardize input variables before calculating the nearest neighbors
for data sets with numeric input variables only. For mixed variables
(numeric and factor) the Gower distance is used and variables are
standardized anyway.
Default is <code>TRUE</code>.

alt.logic

(<code>logical(1)</code>)
Use an alternative logic for selecting minority class observations.
Instead of sampling a minority class element and one of its nearest
neighbors, each minority class element is used multiple times (depending
on rate) for the interpolation, and only the corresponding nearest neighbor
is sampled.
Default is <code>FALSE</code>.
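The arguments above can be combined in a call like the following. This is a minimal sketch, assuming the mlr package is installed; the imbalanced data set is an arbitrary subset of iris chosen for illustration, and the variable names are hypothetical.

```r
library(mlr)

set.seed(1)

# Illustrative imbalanced two-class data: 50 versicolor rows, 10 virginica rows.
df <- iris[iris$Species != "setosa", ]
df$Species <- droplevels(df$Species)
df <- df[1:60, ]
rownames(df) <- NULL

task <- makeClassifTask(data = df, target = "Species")

# rate = 3 requests that the minority class be upsampled to three times its size.
task.smote <- smote(task, rate = 3, nn = 5L, standardize = TRUE, alt.logic = FALSE)

# Compare class counts before and after oversampling.
counts.before <- table(getTaskTargets(task))
counts.after <- table(getTaskTargets(task.smote))
print(counts.before)
print(counts.after)
```

The majority class is left unchanged; only synthetic minority class observations are added to the returned task.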

In each iteration, the method samples one minority class element x1 and then one of x1's nearest neighbors, x2.
The two points are interpolated (convex-combined), yielding a new synthetic data point x3
for the minority class.
The method also handles factor features. The Gower distance is used for the nearest neighbor
calculation, see <a rd-options="cluster:daisy" href="/link/cluster%3A%3Adaisy?package=mlr&version=2.13&to=cluster%3Adaisy" data-mini-rdoc="cluster:daisy::cluster::daisy">cluster::daisy</a>.
For interpolation, the factor level of x3
is sampled per feature from the two levels of x1 and x2.
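The interpolation step can be sketched in base R as follows. This is an illustration only, not the package's implementation: `interpolate_pair` is a hypothetical helper, the use of a single mixing weight for all numeric features and of equal probabilities for the two factor levels are assumptions, and the nearest neighbor search is left out (x2 is simply given).

```r
# Hypothetical helper, not part of mlr: builds one synthetic point x3
# from a minority class point x1 and one of its neighbors x2.
interpolate_pair <- function(x1, x2) {
  stopifnot(is.data.frame(x1), is.data.frame(x2),
            nrow(x1) == 1, nrow(x2) == 1,
            identical(names(x1), names(x2)))
  lambda <- runif(1)  # assumed: one mixing weight in [0, 1] for all numeric features
  x3 <- x1
  for (col in names(x1)) {
    if (is.numeric(x1[[col]])) {
      # Convex combination: x3 lies on the segment between x1 and x2.
      x3[[col]] <- lambda * x1[[col]] + (1 - lambda) * x2[[col]]
    } else {
      # Factor feature: take the level of x1 or x2 (assumed equal probability).
      x3[[col]] <- if (runif(1) < 0.5) x1[[col]] else x2[[col]]
    }
  }
  x3
}

set.seed(42)
x1 <- data.frame(a = 1, b = 10, f = factor("u", levels = c("u", "v")))
x2 <- data.frame(a = 3, b = 20, f = factor("v", levels = c("u", "v")))
x3 <- interpolate_pair(x1, x2)
print(x3)
```

Each numeric coordinate of x3 falls between the corresponding coordinates of x1 and x2, and the factor value is always one of the two observed levels, so no unseen level can be generated.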

Machine Learning in R

Interface to a large number of classification and regression
techniques, including machine-readable parameter descriptions. There is
also an experimental extension for survival analysis, clustering and
general, example-specific cost-sensitive learning. Generic resampling,
including cross-validation, bootstrapping and subsampling. Hyperparameter
tuning with modern optimization techniques, for single- and multi-objective
problems. Filter and wrapper methods for feature selection. Extension of
basic learners with additional operations common in machine learning, also
allowing for easy nested resampling. Most operations can be parallelized.

Bernd Bischl, Michel Lang, Lars Kotthoff, Julia Schiffner, Jakob Richter, Zachary Jones, Giuseppe Casalicchio, Mason Gallo, Patrick Schratz, Jakob Bossek, Erich Studerus, Leonard Judt, Tobias Kuehn, Pascal Kerschke, Florian Fendt, Philipp Probst, Xudong Sun, Janek Thomas, Bruno Vieira, Laura Beggel, Quay Au, Martin Binder, Florian Pfisterer, Stefan Coors, Steve Bronder, Alexander Engelhardt, Christoph Molnar


smote: Synthetic Minority Oversampling Technique to handle class imbalance in binary classification.
