robreg3S (version 0.3-1)

simulation-tools: Data generator for simulation study on cell- and case-wise contamination

Description

Provides the data generators used in the simulation study on cell- and case-wise contamination in Leung et al. (2015).

Usage

generate.randbeta(p) 

generate.cellcontam.regress(n, p, A, sigma, b, k, cp)

generate.casecontam.regress(n, p, A, sigma, b, l, k, cp)

generate.cellcontam.regress.dummies(n, p, pd, probd, A, sigma, b, k, cp)

generate.casecontam.regress.dummies(n, p, pd, probd, A, sigma, b, l, k, cp)

Value

A list with components:

x

multivariate normal sample with cell- or case-wise contamination.

y

vector of responses.

dummies

vector of dummies.
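
As a rough picture of the returned list, the base-R sketch below builds an uncontaminated x and y from a correlation matrix, a coefficient vector, and a residual standard deviation. It is not the package's generator: simulate_clean_regress is a hypothetical helper, and the zero-mean covariates, the absence of an intercept, and the Gaussian errors are all assumptions made for illustration.

```r
# Illustrative sketch only; not the robreg3S implementation.
# simulate_clean_regress() is a hypothetical helper. Assumptions:
# zero-mean covariates, no intercept, Gaussian errors.
simulate_clean_regress <- function(n, p, A, sigma, b) {
  # Rows of x are N(0, A): multiply i.i.d. normals by the Cholesky factor.
  x <- matrix(rnorm(n * p), n, p) %*% chol(A)
  # Linear model with residual standard deviation sigma.
  y <- as.vector(x %*% b + sigma * rnorm(n))
  list(x = x, y = y)
}

set.seed(1)
p <- 3
A <- matrix(0.5, p, p)
diag(A) <- 1                     # equicorrelation matrix
b <- c(1, -2, 0.5)
dat <- simulate_clean_regress(n = 5000, p = p, A = A, sigma = 0.5, b = b)

str(dat)                         # list with components x and y
round(cor(dat$x), 2)             # should be close to A
```

The generators documented here additionally contaminate x and y, and the .dummies variants also return a dummies component.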

Arguments

n

integer indicating the number of observations to be generated.

p

integer indicating the number of continuous variables to be generated.

pd

integer indicating the number of dummy variables to be generated.

probd

vector of length pd giving the quantile levels. To generate the dummy variables, pd continuous variables are first generated; each is then dichotomized at the normal quantile given by the corresponding entry of probd.

A

a correlation matrix. See also generate.randcorr.

sigma

residual standard deviation.

b

vector of regression coefficients.

k

size of cellwise outliers and vertical outliers. See Leung et al. for details.

l

size of leverage outliers. See Leung et al. for details.

cp

proportion of cell- or case-wise contamination, given as a fraction (e.g. 0.05 for 5%). The maximum is 10% for cellwise and 50% for casewise contamination.
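
To make the distinction behind cp concrete, the base-R sketch below contrasts the two contamination patterns. It is an illustration under stated assumptions, not the package's generator: contaminate_cells and contaminate_cases are hypothetical helpers, and outlying cells are simply set to the constant k, whereas the package follows the scheme described in Leung et al.

```r
# Illustrative sketch only; not the robreg3S implementation.
# contaminate_cells() and contaminate_cases() are hypothetical helpers.

# Cell-wise: in each column, replace a fraction cp of the cells,
# chosen independently across columns, with the outlying value k.
contaminate_cells <- function(x, cp, k) {
  n <- nrow(x)
  m <- floor(cp * n)             # contaminated cells per column
  for (j in seq_len(ncol(x))) {
    idx <- sample.int(n, m)
    x[idx, j] <- k
  }
  x
}

# Case-wise: replace every cell in a fraction cp of the rows.
contaminate_cases <- function(x, cp, k) {
  n <- nrow(x)
  idx <- sample.int(n, floor(cp * n))
  x[idx, ] <- k
  x
}

set.seed(1)
n <- 200
p <- 10
x <- matrix(rnorm(n * p), n, p)

x_cell <- contaminate_cells(x, cp = 0.05, k = 10)
x_case <- contaminate_cases(x, cp = 0.05, k = 10)

# Share of rows containing at least one contaminated cell
row_share_cell <- mean(rowSums(x_cell == 10) > 0)
row_share_case <- mean(rowSums(x_case == 10) > 0)
```

In this sketch the case-wise share of affected rows equals cp, while the cell-wise share is larger: with columns contaminated independently, its expected value is 1 - (1 - cp)^p, roughly 40% for cp = 0.05 and p = 10.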

Author

Andy Leung jy.liang001@gmail.com, Hongyang Zhang, Ruben H. Zamar

References

Leung, A., Zamar, R.H., and Zhang, H. (2015). Robust regression estimation and inference in the presence of cellwise and casewise contamination. arXiv:1509.02564.

See Also

generate.randcorr

Examples

##################################################
## Cellwise contaminated data simulation 
## (continuous covariates only)
library(robreg3S)  # load the package before running the examples
set.seed(10)
b <- 10*generate.randbeta(p=15)
A <- generate.randcorr(cond=100, p=15)
dat <- generate.cellcontam.regress(n=300, p=15, A=A, sigma=0.5, b=b, k=10, cp=0.05)

## LS
fit.LS <- lm( y ~ x, dat)
mean((coef(fit.LS)[-1] - b)^2)

## MM regression
fit.MM <- robustbase::lmrob( y ~ x, dat)
mean((coef(fit.MM)[-1] - b)^2)

## 3S regression
fit.3S <- robreg3S( y=dat$y, x=dat$x, init="imputed")
mean((coef(fit.3S)[-1] - b)^2)


if (FALSE) {
##################################################
## Casewise contaminated data simulation
## (continuous covariates only)
set.seed(10)
b <- 10*generate.randbeta(p=10)
A <- generate.randcorr(cond=100, p=10)
dat <- generate.casecontam.regress(n=200, p=10, A=A, sigma=0.5, b=b, l=8, k=10, cp=0.10)

## LS
fit.LS <- lm( y ~ x, dat)
mean((coef(fit.LS)[-1] - b)^2)

## MM regression
fit.MM <- robustbase::lmrob( y ~ x, dat)
mean((coef(fit.MM)[-1] - b)^2)

## 3S regression
fit.3S <- robreg3S( y=dat$y, x=dat$x, init="imputed")
mean((coef(fit.3S)[-1] - b)^2)

##################################################
## Cellwise contaminated data simulation 
## (continuous and dummy covariates)
set.seed(10)
b <- 10*generate.randbeta(p=15)
A <- generate.randcorr(cond=100, p=15)
dat <- generate.cellcontam.regress.dummies(n=300, p=12, pd=3, 
   probd=c(1/2,1/3,1/4), A=A, sigma=0.5, b=b, k=10, cp=0.05)

## LS
fit.LS <- lm( dat$y ~ dat$x + dat$dummies)
mean((coef(fit.LS)[-1] - b)^2)

## MM regression
fit.MM <- robustbase::lmrob( dat$y ~ dat$x + dat$dummies)
mean((coef(fit.MM)[-1] - b)^2)

## 3S regression
fit.3S <- robreg3S( y=dat$y, x=dat$x, dummies=dat$dummies, init="imputed")
mean((coef(fit.3S)[-1] - b)^2)


##################################################
## Casewise contaminated data simulation 
## (continuous and dummy covariates)
set.seed(10)
b <- 10*generate.randbeta(p=15)
A <- generate.randcorr(cond=100, p=15)
dat <- generate.casecontam.regress.dummies(n=300, p=12, pd=3, 
   probd=c(1/2,1/3,1/4), A=A, sigma=0.5, b=b, l=7, k=10, cp=0.10)

## LS
fit.LS <- lm( dat$y ~ dat$x + dat$dummies)
mean((coef(fit.LS)[-1] - b)^2)

## MM regression
fit.MM <- robustbase::lmrob( dat$y ~ dat$x + dat$dummies)
mean((coef(fit.MM)[-1] - b)^2)

## 3S regression
fit.3S <- robreg3S( y=dat$y, x=dat$x, dummies=dat$dummies, init="imputed")
mean((coef(fit.3S)[-1] - b)^2)

}
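
The dichotomization described for the probd argument can also be sketched in base R. This is a minimal illustration, not the package's code. It assumes that a dummy equals 1 when the underlying standard normal variable falls below the corresponding normal quantile, and it uses independent continuous variables for simplicity.

```r
# Illustrative sketch only; not the robreg3S implementation.
# Assumption: a dummy equals 1 when the underlying standard normal
# variable falls below the probd-quantile, so P(dummy = 1) = probd.
set.seed(1)
n <- 10000
probd <- c(1/2, 1/3, 1/4)
pd <- length(probd)

# Step 1: generate pd continuous (standard normal) variables.
z <- matrix(rnorm(n * pd), n, pd)

# Step 2: dichotomize column j at the normal quantile qnorm(probd[j]).
cutoffs <- qnorm(probd)
dummies <- 1 * sweep(z, 2, cutoffs, FUN = "<")

# Empirical shares of ones should be close to probd.
shares <- colMeans(dummies)
round(shares, 3)
```

Under the opposite convention (1 when the variable exceeds the quantile), the shares of ones would instead be close to 1 - probd.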