UBL (version 0.0.3)

ImpSampRegress: Importance Sampling algorithm for imbalanced regression problems

Description

This function handles imbalanced regression problems by re-sampling the data set according to a relevance function. The relevance function is used to introduce replicas of the most important examples and to remove the least important examples.

Usage

ImpSampRegress(form, dat, rel = "auto", thr.rel = NA, 
               C.perc = "balance", O = 0.5, U = 0.5)

Arguments

form
A formula describing the prediction problem
dat
A data frame containing the original (unbalanced) data set
rel
The relevance function, which can be determined automatically ("auto", the default) or provided by the user as a matrix of interpolating points.
thr.rel
The default is NA which means that no threshold is used when performing the over/under-sampling. In this case, the over-sampling is performed by assigning a higher probability for selecting an example to the examples with higher relevance. On the other ha
C.perc
A list containing the percentage(s) of under- and/or over-sampling to apply to each "class" obtained with the threshold. The over-sampling percentage means that the examples above the threshold are increased by this percentage. Th
O
A number expressing the importance given to over-sampling when the thr.rel parameter is NA. When O increases, the number of examples included during the over-sampling step also increases. Defaults to 0.5.
U
A number expressing the importance given to under-sampling when the thr.rel parameter is NA. When U increases, the number of examples selected during the under-sampling step also increases. Defaults to 0.5.
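
The threshold-free behaviour described for thr.rel = NA can be illustrated with base R alone. The following is a minimal sketch, not UBL's implementation: the relevance score is a hypothetical stand-in (distance from the median, rescaled to [0, 1]), and it assumes that O and U act as fractions of the data set size and that under-sampling weights examples by 1 - relevance.

```r
# Illustrative only: relevance-weighted resampling with base R.
# Nothing here calls UBL; `relevance`, `O` and `U` are assumptions
# chosen to mirror the argument descriptions above.
set.seed(42)

toy <- data.frame(x = 1:10,
                  y = c(1, 2, 2, 3, 3, 3, 4, 4, 9, 12))

# Hypothetical relevance: distance of y from its median, rescaled to [0, 1].
dist.med  <- abs(toy$y - median(toy$y))
relevance <- dist.med / max(dist.med)

O <- 0.5  # assumed: fraction of nrow(toy) drawn in the over-sampling step
U <- 0.5  # assumed: fraction of nrow(toy) drawn in the under-sampling step

# Over-sampling: draw with replacement, favouring high-relevance examples.
over.idx <- sample(nrow(toy), size = as.integer(O * nrow(toy)),
                   replace = TRUE, prob = relevance)

# Under-sampling: draw without replacement, favouring low-relevance examples.
under.idx <- sample(nrow(toy), size = as.integer(U * nrow(toy)),
                    replace = FALSE, prob = 1 - relevance)

resampled <- toy[c(under.idx, over.idx), ]
```

In this sketch, raising O or U simply increases the number of rows drawn in the corresponding step, which matches the way the two arguments are described above.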

Value

  • The function returns a data frame with the new data set resulting from the application of the importance sampling strategy.
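
Because the result is an ordinary data frame, the effect of the re-sampling can be checked by comparing the target variable before and after. The helper below, compare.target, is hypothetical and not part of UBL; it is shown on a toy data frame so that it runs without any package installed.

```r
# Hypothetical helper (not part of UBL): summarise a numeric target
# column before and after resampling.
compare.target <- function(before, after, target) {
  stats <- function(v) {
    c(n = length(v), min = min(v), median = median(v),
      mean = mean(v), max = max(v))
  }
  rbind(original  = stats(before[[target]]),
        resampled = stats(after[[target]]))
}

# Toy stand-in for a resampled data set: the rare high value is replicated
# and two common low values are dropped.
toy     <- data.frame(y = c(1, 2, 3, 4, 20))
toy.res <- toy[c(1, 3, 5, 5, 5), , drop = FALSE]

res <- compare.target(toy, toy.res, "y")
print(res)
```

With UBL loaded, the same helper could be applied to the original data and to the data frame returned by ImpSampRegress, passing the name of the target column.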

See Also

RandUnderRegress, RandOverRegress

Examples

library(UBL)   # provides ImpSampRegress
library(DMwR)  # provides the algae data set
data(algae)
# keep only the rows without missing values
clean.algae <- algae[complete.cases(algae), ]

# defining a threshold on the relevance
IS.ext <- ImpSampRegress(a7 ~ ., clean.algae, rel = "auto",
                         thr.rel = 0.7, C.perc = "extreme")
IS.bal <- ImpSampRegress(a7 ~ ., clean.algae, rel = "auto",
                         thr.rel = 0.7, C.perc = "balance")
myIS <- ImpSampRegress(a7 ~ ., clean.algae, rel = "auto",
                       thr.rel = 0.7, C.perc = list(0.2, 6))

# neither threshold nor C.perc defined
IS.auto <- ImpSampRegress(a7 ~ ., clean.algae, rel = "auto")