Learn R Programming

simFrame (version 0.5.0)

contaminate: Contaminate data

Description

Generic function for contaminating data.

Usage

contaminate(x, control, ...)

## S3 method for class 'data.frame,ContControl': contaminate(x, control, i)

Arguments

x
the data to be contaminated.
control
a control object of a class inheriting from the virtual class "VirtualContControl" or a character string specifying such a control class (the default being "DCARContControl").
i
an integer giving the element of the slot epsilon of control to be used as contamination level.
...
if control is a character string or missing, the slots of the control object may be supplied as additional arguments. See "DCARContControl" and "DARContControl

Value

  • A data.frame containing the contaminated data. In addition, the column ".contaminated", which consists of logicals indicating the contaminated observations, is added to the data.frame.

encoding

utf8

Details

With the control classes implemented in simFrame, contamination is modeled as a two-step process. The first step is to select observations to be contaminated, the second is to model the distribution of the outliers. In order to extend the framework by a user-defined control class "MyContControl" (which must extend "VirtualContControl"), a method contaminate(x, control, i) with signature 'data.frame, MyContControl' needs to be implemented. In case the contaminated observations need to be identified at a later stage of the simulation, e.g., if conflicts with inserting missing values should be avoided, a logical indicator variable ".contaminated" should be added to the returned data set.

References

Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The RPackage simFrame. Journal of Statistical Software, 37(3), 1--36. URL http://www.jstatsoft.org/v37/i03/.

Alfons, A., Templ, M. and Filzmoser, P. (2010) Contamination Models in the RPackage simFrame for Statistical Simulation. In Aivazian, S., Filzmoser, P. and Kharin, Y. (editors) Computer Data Analysis and Modeling: Complex Stochastic Data and Systems, volume 2, 178--181. Minsk. ISBN 978-985-476-848-9.

Béguin{Beguin}, C. and Hulliger, B. (2008) The BACON-EEM Algorithm for Multivariate Outlier Detection in Incomplete Survey Data. Survey Methodology, 34(1), 91--103.

Hulliger, B. and Schoch, T. (2009) Robust Multivariate Imputation with Survey Data. 57th Session of the International Statistical Institute, Durban.

See Also

"DCARContControl", "DARContControl", "ContControl", "VirtualContControl"

Examples

Run this code
## distributed completely at random
data(eusilcP)
sam <- draw(eusilcP[, c("id", "eqIncome")], size = 20)

# using a control object
dcarc <- ContControl(target = "eqIncome", epsilon = 0.05, 
    dots = list(mean = 5e+05, sd = 10000), type = "DCAR")
contaminate(sam, dcarc)

# supply slots of control object as arguments
contaminate(sam, target = "eqIncome", epsilon = 0.05, 
    dots = list(mean = 5e+05, sd = 10000))


## distributed at random
require(mvtnorm)
mean <- rep(0, 2)
sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
foo <- generate(size = 10, distribution = rmvnorm, 
    dots = list(mean = mean, sigma = sigma))

# using a control object
darc <- DARContControl(target = "V2", 
    epsilon = 0.2, fun = function(x) x * 100)
contaminate(foo, darc)

# supply slots of control object as arguments
contaminate(foo, "DARContControl", target = "V2",
    epsilon = 0.2, fun = function(x) x * 100)

Run the code above in your browser using DataLab