UBL (version 0.0.3)

RandOverRegress: Random over-sampling for imbalanced regression problems

Description

This function performs random over-sampling for imbalanced regression problems. A user-specified percentage of the cases in each selected "class" (a bump above a defined relevance threshold) is randomly over-sampled. Alternatively, the function can balance all the existing "classes" (the default) or "smoothly invert" the frequency of the examples in each class.

Usage

RandOverRegress(form, dat, rel = "auto", thr.rel = 0.5, 
                C.perc = "balance", repl = TRUE)

Arguments

form
A formula describing the prediction problem.
dat
A data frame containing the original imbalanced data set.
rel
The relevance function, which can be determined automatically ("auto", the default) or provided by the user as a matrix with the interpolating points.
thr.rel
A number indicating the relevance threshold above which a case is considered as belonging to the rare "class".
C.perc
A list containing the over-sampling percentage/s to apply to all/each "class" (bump) obtained with the relevance threshold. Replicas of the examples are added randomly in each "class". Moreover, different percentages may be provi
repl
A boolean value controlling whether the same example may be drawn more than once when choosing the examples to replicate in the over-sampled data set. Defaults to TRUE because this is a necessary condition if the selected percentage is greater than 2. This para
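
The role of repl can be illustrated with a small base-R sketch. It is not the package's code: the row indices and the helper draw_replicas are hypothetical, and it assumes that a percentage of perc adds (perc - 1) * n replicas to a "class" of n examples, which is consistent with the remark that replacement becomes necessary above 2.

```r
# Sketch only: why sampling with replacement matters when choosing replicas.
rare_rows <- c(3, 8, 15)   # hypothetical row indices of the rare cases
n <- length(rare_rows)

# Hypothetical helper, not part of UBL.
draw_replicas <- function(perc, repl) {
  # Assumed convention: a percentage of perc adds (perc - 1) * n replicas.
  n_extra <- as.integer((perc - 1) * n)
  rare_rows[sample.int(n, n_extra, replace = repl)]
}

set.seed(1)  # make the random draws reproducible
draw_replicas(1.5, repl = FALSE)  # 1 replica, no row can be picked twice
draw_replicas(2,   repl = FALSE)  # 3 replicas: each rare row exactly once
draw_replicas(4,   repl = TRUE)   # 9 replicas: repetition is unavoidable

# Requesting more replicas than there are rare rows fails without replacement.
res <- tryCatch(draw_replicas(4, repl = FALSE), error = function(e) "error")
```

Under this assumed convention, repl only changes the outcome for percentages between 1 and 2, where the replicas can be either distinct examples or repeated draws.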

Value

  • The function returns a data frame with the new data set resulting from the application of the random over-sampling strategy.

Details

This function performs a random over-sampling strategy for dealing with imbalanced regression problems. The new examples included in the new data set are randomly selected replicas of the examples already present in the original data set.
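
As a rough illustration of the idea, the following base-R sketch appends randomly selected replicas of rare examples to a data set. It is not the UBL implementation: the 90th-percentile cutoff is a hypothetical stand-in for the relevance function and threshold, and the meaning of the percentage (2 doubles the rare group) is an assumption.

```r
# Sketch only: base-R random over-sampling, not the UBL implementation.
data(morley)

set.seed(42)  # make the random draw reproducible

# Hypothetical stand-in for the relevance function: treat speeds above the
# 90th percentile as the rare "class".
cutoff   <- quantile(morley$Speed, 0.9)
rare_idx <- which(morley$Speed > cutoff)

# Assumed meaning of an over-sampling percentage of 2: the rare group doubles,
# so (perc - 1) * n replicas are added.
perc    <- 2
n_extra <- as.integer((perc - 1) * length(rare_idx))

# Draw positions within rare_idx, then map them back to row indices.
extra_idx <- rare_idx[sample.int(length(rare_idx), n_extra, replace = TRUE)]

# Append the replicas to the original rows.
oversampled <- rbind(morley, morley[extra_idx, ])

nrow(morley)       # size of the original data set
nrow(oversampled)  # original size plus the number of replicas
```

Every added row is an exact copy of an existing rare example, so no new target values are introduced; only the frequency of the rare cases changes.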

See Also

RandUnderRegress

Examples

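library(UBL)
data(morley)

# One over-sampling percentage per "class" (bump)
C.perc <- list(2, 4)
myover <- RandOverRegress(Speed ~ ., morley, C.perc = C.perc)
# Balance all the existing "classes" (the default)
Bal <- RandOverRegress(Speed ~ ., morley, C.perc = "balance")
# "Smoothly invert" the frequency of the examples in each class
Ext <- RandOverRegress(Speed ~ ., morley, C.perc = "extreme")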
