pdfos: Probability density function estimation based oversampling
Description
Generates synthetic minority examples for a numerical dataset by approximating
the multivariate Gaussian distribution that best fits the minority data.
Usage
pdfos(dataset, numInstances, classAttr = "Class")
Arguments
dataset
data.frame to treat. All columns, except the
classAttr one, must be numeric or coercible to numeric.
numInstances
Integer. Number of new minority examples to generate.
classAttr
Character. Name of the class attribute in
dataset. It must be one of its columns.
Value
A data.frame with the same structure as dataset,
containing the generated synthetic examples.
Details
Each synthetic example is drawn from a normal distribution whose mean is an
example belonging to the minority class and whose covariance is the minority
class covariance multiplied by a constant. That constant is chosen to minimize
the mean integrated squared error (MISE) of a multivariate Gaussian kernel
density estimate.
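The sampling step described above can be sketched in base R. This is an illustrative approximation, not the function's actual code: sketch_pdfos is a hypothetical helper, and it substitutes Silverman's rule-of-thumb bandwidth for the constant that the documented method obtains by minimizing the MISE.

```r
# Hypothetical helper (not part of any package): draws synthetic minority rows
# from Gaussians centred on randomly chosen minority examples.
sketch_pdfos <- function(minority, numInstances) {
  X <- as.matrix(minority)   # n x d numeric matrix of minority examples
  n <- nrow(X)
  d <- ncol(X)
  S <- cov(X)                # minority class covariance

  # ASSUMPTION: Silverman's rule-of-thumb bandwidth stands in for the
  # constant that the documented method computes by minimising the MISE.
  h <- (4 / (n * (d + 2)))^(1 / (d + 4))

  # Upper-triangular factor R such that t(R) %*% R equals S
  R <- chol(S)

  # Choose a minority example (with replacement) as the mean of each draw
  centres <- X[sample.int(n, numInstances, replace = TRUE), , drop = FALSE]

  # Standard normal noise, coloured so each row has covariance h^2 * S
  Z <- matrix(rnorm(numInstances * d), nrow = numInstances, ncol = d)
  synthetic <- centres + h * (Z %*% R)

  colnames(synthetic) <- colnames(X)
  rownames(synthetic) <- NULL
  as.data.frame(synthetic)
}

# Toy minority data, made up for illustration
set.seed(42)
minority <- data.frame(x = rnorm(20, mean = 2), y = rnorm(20, mean = -1))
newSamples <- sketch_pdfos(minority, numInstances = 50)
dim(newSamples)
```

Here the covariance of each draw is h^2 times the minority covariance, so h^2 plays the role of the multiplying constant. A smaller value keeps synthetic points closer to the original minority examples.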
References
Gao, Ming; Hong, Xia; Chen, Sheng; Harris, Chris J.; Khalaf, Emad. PDFOS: PDF
Estimation Based Oversampling for Imbalanced Two-Class Problems.
Neurocomputing 138 (2014), p. 248–259
Silverman, B. W. Density Estimation for Statistics and Data Analysis. Chapman
& Hall, 1986. ISBN 0412246201
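Examples
A minimal usage sketch, assuming pdfos is provided by an installed package named imbalance (an assumption, since this page does not name its package). The toy dataset is made up for illustration, and the call is skipped when the package is not available.

```r
# ASSUMPTION: pdfos() is exported by an installed package called "imbalance".
if (requireNamespace("imbalance", quietly = TRUE)) {
  set.seed(1)
  # Toy two-class dataset: 40 majority rows and 10 minority rows
  toy <- data.frame(
    x = c(rnorm(40, mean = 0), rnorm(10, mean = 3)),
    y = c(rnorm(40, mean = 0), rnorm(10, mean = 3)),
    Class = factor(c(rep("negative", 40), rep("positive", 10)))
  )
  # Request 20 synthetic minority examples
  newSamples <- imbalance::pdfos(toy, numInstances = 20, classAttr = "Class")
  str(newSamples)
}
```

The returned rows can then be appended to the original data with rbind to reduce the class imbalance.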