unbalanced (version 2.0)

ubUnder: Under-sampling

Description

The function randomly removes instances from the majority (negative) class and keeps all instances of the minority (positive) class in order to obtain a more balanced dataset. Undersampling can be configured in two ways: i) by setting the percentage of positives wanted after undersampling (percPos method), or ii) by setting the sampling rate on the negatives (percUnder method). For percPos, "perc" has to be at least (N.1/N * 100), where N.1 is the number of positive instances and N the total number of instances: because only negatives are removed, the share of positives cannot drop below its value in the original data.
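The percPos arithmetic can be sketched in base R. This is only an illustration of the relationship described above, not the package's internal code: the helper name `negatives_to_keep` and the class counts are made up for the example, and rounding down with `floor` is an assumption.

```r
# Hypothetical helper (not part of the package): number of negatives to keep
# so that positives make up `perc` percent of the undersampled data.
negatives_to_keep <- function(n_pos, perc) {
  floor(n_pos * (100 - perc) / perc)
}

n_pos <- 100   # made-up number of positive (minority) instances
n_neg <- 900   # made-up number of negative (majority) instances

# Share of positives before undersampling: the smallest usable value of perc
min_perc <- 100 * n_pos / (n_pos + n_neg)

# Negatives to keep for 40% positives, and the resulting share of positives
kept <- negatives_to_keep(n_pos, perc = 40)
share_after <- 100 * n_pos / (n_pos + kept)
```

With these counts, `min_perc` is 10, `kept` is 150 and `share_after` is 40, so 750 of the 900 negatives would be dropped.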

Usage

ubUnder(X, Y, perc = 50, method = "percPos", w = NULL)

Arguments

X
the input variables of the unbalanced dataset.
Y
the response variable of the unbalanced dataset. It must be a binary factor where the majority class is coded as 0 and the minority as 1.
perc
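sampling percentage. Its meaning depends on method: the percentage of positives wanted after undersampling ("percPos") or the sampling rate on the negatives ("percUnder").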
method
undersampling method, either "percPos" or "percUnder".
w
weights used for sampling the majority class. If NULL, all majority instances are sampled with equal weights.
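To make the role of weights concrete, here is a minimal base-R sketch of weighted sampling of the majority class. It does not call the package: the toy response `y`, the weight vector `w` (one weight per negative instance) and the use of `sample()` with `prob` are all assumptions chosen for illustration, and the package may expect the weights in a different form.

```r
set.seed(1)

# Toy response: 8 negatives (0) followed by 2 positives (1)
y <- c(rep(0, 8), rep(1, 2))
neg_idx <- which(y == 0)
pos_idx <- which(y == 1)

# Hypothetical weights, one per negative: the first two are favoured
w <- c(5, 5, 1, 1, 1, 1, 1, 1)

# Draw 3 negatives without replacement, with probability proportional to w
keep_neg <- sample(neg_idx, size = 3, prob = w)

# Keep every positive plus the sampled negatives; the rest are removed
keep  <- sort(c(keep_neg, pos_idx))
id_rm <- setdiff(seq_along(y), keep)
```

Passing `prob = NULL` to `sample()` gives every negative the same chance of being kept, which mirrors the equal-weight default described for w.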

Value

The function returns a list:
X
input variables
Y
response variable
id.rm
indices of the instances removed

See Also

ubBalance

Examples

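library(unbalanced)
data(ubIonosphere)

# The response (Class) is the last column; the other columns are inputs
n <- ncol(ubIonosphere)
output <- ubIonosphere$Class
input <- ubIonosphere[, -n]

# Undersample the negatives so that positives make up 40% of the result
data <- ubUnder(X = input, Y = output, perc = 40, method = "percPos")
# Recombine the undersampled inputs and response
newData <- cbind(data$X, data$Y)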
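A quick way to sanity-check the result is to look at the class counts it returns. The sketch below repeats the call from the example and uses only the list elements documented under Value; the variable name `res` is chosen here for illustration, and no particular counts are claimed since they depend on the dataset.

```r
library(unbalanced)
data(ubIonosphere)

n <- ncol(ubIonosphere)
res <- ubUnder(X = ubIonosphere[, -n], Y = ubIonosphere$Class,
               perc = 40, method = "percPos")

table(res$Y)               # class counts after undersampling
100 * mean(res$Y == 1)     # share of positives, expected to be close to 40
length(res$id.rm)          # number of instances removed
```

Since the function keeps every minority instance, the number of positives in `res$Y` should match the number in `ubIonosphere$Class`.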
