
sprm (version 1.1)

sprmsCV: Cross validation method for sprm models.

Description

k-fold cross-validation for selecting the number of components and the sparsity parameter in sparse partial robust M regression.
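The cross-validation loop over the grid of tuning parameters can be pictured with a minimal base-R sketch. It is not the sprm implementation: the random fold assignment, the layout of the error array (observations x values of `as` x values of `etas`) and the helpers `make_folds`, `cv_grid`, `mean_fit` and `mean_pred` are hypothetical. A trivial mean-only model stands in for the sparse PRM fit so that the block runs without the package.

```r
# Hypothetical sketch of k-fold CV over a grid of (a, eta); not sprm's code.
make_folds <- function(n, nfold) {
  # randomly assign each observation to one of nfold (almost) equal folds
  sample(rep_len(seq_len(nfold), n))
}

cv_grid <- function(X, y, as, etas, nfold = 10, fit_fun, predict_fun) {
  n <- nrow(X)
  folds <- make_folds(n, nfold)
  # ASSUMED layout: squared prediction error per observation and per (a, eta)
  spe <- array(NA_real_, dim = c(n, length(as), length(etas)))
  for (i in seq_along(as)) {
    for (j in seq_along(etas)) {
      for (k in seq_len(nfold)) {
        test <- folds == k
        # fit on the other folds, predict the held-out fold
        fit <- fit_fun(X[!test, , drop = FALSE], y[!test], as[i], etas[j])
        pred <- predict_fun(fit, X[test, , drop = FALSE])
        spe[test, i, j] <- (y[test] - pred)^2
      }
    }
  }
  spe
}

# Toy stand-in model (hypothetical): ignores a and eta, predicts the mean of y.
mean_fit <- function(X, y, a, eta) mean(y)
mean_pred <- function(fit, Xnew) rep(fit, nrow(Xnew))

set.seed(1)
X <- matrix(rnorm(200), ncol = 4)
y <- X[, 1] + rnorm(50)
spe <- cv_grid(X, y, as = 1:3, etas = c(0, 0.5), nfold = 5,
               fit_fun = mean_fit, predict_fun = mean_pred)
dim(spe)
```

Every observation is held out exactly once, so each cell of the array receives one out-of-sample squared error per combination of tuning parameters.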

Usage

sprmsCV(formula, data, as, etas, nfold = 10, fun = "Hampel", probp1 = 0.95, 
hampelp2 = 0.975, hampelp3 = 0.999, center = "median", scale = "qn", 
plot = TRUE, numit = 100, prec = 0.01, alpha = 0.15)

Arguments

formula
an object of class formula.
data
a data frame or list which contains the variables given in formula.
as
a vector of positive integers giving the numbers of PRMS components to be estimated in the models.
etas
a vector of values for the sparsity tuning parameter. Values have to be between 0 and 1.
nfold
the number of folds used for cross-validation; the default is nfold = 10 for 10-fold CV.
fun
an internal weighting function for case weights. Choices are "Hampel" (preferred), "Huber" or "Fair".
probp1
the 1-alpha value at which to set the first outlier cutoff for the weighting function.
hampelp2
the 1-alpha value for the second cutoff. Only applies to fun = "Hampel".
hampelp3
the 1-alpha value for the third cutoff. Only applies to fun = "Hampel".
center
type of centering of the data in form of a string that matches an R function, e.g. "mean" or "median".
scale
type of scaling for the data in form of a string that matches an R function, e.g. "sd" or "qn" or alternatively "no" for no scaling.
plot
logical, default is TRUE. If TRUE, two contour plots over the number of components and the sparsity parameter are generated. The first contour plot shows the trimmed mean squared error of the prediction of the response (see Details), the se
numit
the maximum number of iterations for the convergence of the coefficient estimates.
prec
a value for the precision of the coefficient estimates.
alpha
the trimming proportion for the alpha-trimmed mean squared error, which is the cross-validation criterion (see Details).
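For orientation, the three choices of `fun` correspond to the usual Huber, Fair and Hampel weight functions applied to standardized residuals. The sketch below assumes that the cutoffs are the standard-normal quantiles `qnorm(probp1)`, `qnorm(hampelp2)` and `qnorm(hampelp3)`; the helper `case_weights` is hypothetical and may differ in detail from the package's internal code.

```r
# Hypothetical sketch of case weights for standardized residuals r.
# ASSUMPTION: cutoffs are standard-normal quantiles of probp1, hampelp2, hampelp3.
case_weights <- function(r, fun = "Hampel", probp1 = 0.95,
                         hampelp2 = 0.975, hampelp3 = 0.999) {
  a <- abs(r)
  c1 <- qnorm(probp1)  # first cutoff, shared by all three functions
  switch(fun,
    Fair  = 1 / (1 + a / c1)^2,
    Huber = ifelse(a <= c1, 1, c1 / a),
    Hampel = {
      c2 <- qnorm(hampelp2)
      c3 <- qnorm(hampelp3)
      # 1 up to c1, c1/|r| up to c2, linear descent to 0 at c3, 0 beyond
      ifelse(a <= c1, 1,
        ifelse(a <= c2, c1 / a,
          ifelse(a <= c3, (c1 / a) * (c3 - a) / (c3 - c2), 0)))
    },
    stop("fun must be \"Hampel\", \"Huber\" or \"Fair\""))
}

r <- c(0, 1, 1.8, 2.5, 4)
round(case_weights(r, "Hampel"), 3)
round(case_weights(r, "Huber"), 3)
```

With the default values, Hampel weights stay at 1 up to about 1.64, decrease between the cutoffs and reach 0 beyond about 3.09, whereas Huber and Fair weights never reach 0. This is why the Hampel choice can discard gross outliers completely.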

Value

  • opt.mod: object of class sprm (see sprms).
  • spe: array with the squared prediction error for each observation and each combination of tuning parameters.
  • nzcoef: array with the number of variables in the model for each cross-validation subset and each combination of tuning parameters.

Details

The alpha-trimmed mean squared error of the predicted response over all observations is used as the robust decision criterion for choosing the optimal model.
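A minimal sketch of the criterion, assuming that "alpha-trimmed" means the largest alpha fraction of squared errors is dropped before averaging (the exact trimming rule in sprm may differ). It also shows how an optimum could be read off a grid of criterion values; the array layout, the helper `trimmed_mse` and the toy data are hypothetical.

```r
# ASSUMPTION: one-sided trimming, the largest alpha fraction is discarded.
trimmed_mse <- function(spe, alpha = 0.15) {
  s <- sort(spe)
  keep <- floor((1 - alpha) * length(s))  # number of smallest errors kept
  mean(s[seq_len(keep)])
}

# Nine small squared errors and one gross outlier.
toy <- c(rep(1, 9), 400)
trimmed_mse(toy, alpha = 0.15)  # 1: the outlier is trimmed away
mean(toy)                       # 40.9: the plain MSE is dominated by it

# ASSUMPTION: squared errors stored as observations x a-values x eta-values.
a_grid <- 1:3
eta_grid <- c(0, 0.5, 0.9)
set.seed(2)
spe_arr <- array(rexp(20 * 3 * 3), dim = c(20, 3, 3))
spe_arr[, 2, 3] <- spe_arr[, 2, 3] / 100  # make a = 2, eta = 0.9 clearly best
# criterion for every (a, eta) combination, then the grid cell of the minimum
crit <- apply(spe_arr, c(2, 3), trimmed_mse, alpha = 0.15)
best <- which(crit == min(crit), arr.ind = TRUE)[1, ]
c(a = a_grid[best[1]], eta = eta_grid[best[2]])
```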

Some combinations of "a" and "eta" may yield models that cannot be estimated; the function then issues the warning "CV broke off at "a" and "eta"". Make sure that this does not happen close to your optimum.
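To make this check concrete, the sketch below shows one way a grid search can record failed fits as NA, issue a warning in the style of the one above, and then test whether any failure lies next to the selected optimum. `fit_or_fail` is a hypothetical stand-in with a toy criterion and artificial failures; it is not part of sprm, and the warning text only imitates the package's.

```r
# Hypothetical stand-in: fails for a = 3 with large eta, otherwise returns a
# toy CV criterion that is smallest at a = 2, eta = 0.5.
fit_or_fail <- function(a, eta) {
  if (a == 3 && eta >= 0.75) stop("model cannot be estimated")
  (a - 2)^2 + (eta - 0.5)^2
}

a_grid <- 1:3
eta_grid <- seq(0, 0.9, 0.1)
crit <- matrix(NA_real_, length(a_grid), length(eta_grid))
for (i in seq_along(a_grid)) {
  for (j in seq_along(eta_grid)) {
    # a failed fit leaves NA in the grid and raises a warning
    crit[i, j] <- tryCatch(fit_or_fail(a_grid[i], eta_grid[j]),
      error = function(e) {
        warning(sprintf("CV broke off at a = %d and eta = %.1f",
                        a_grid[i], eta_grid[j]), call. = FALSE)
        NA_real_
      })
  }
}

# optimum over the successful fits only
opt <- which(crit == min(crit, na.rm = TRUE), arr.ind = TRUE)[1, ]
# neighbouring grid cells (including diagonals) of the optimum
rows <- max(1, opt[1] - 1):min(nrow(crit), opt[1] + 1)
cols <- max(1, opt[2] - 1):min(ncol(crit), opt[2] + 1)
failed_near_opt <- anyNA(crit[rows, cols])
failed_near_opt
```

Here the failures sit at a = 3 with eta = 0.8 and 0.9, far from the optimum at a = 2, eta = 0.5, so the selected model is not affected.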

References

Sven Serneels et al. (2014) Sparse partial robust M regression

See Also

prms, plot.prm, predict.prm, sprmsCV

Examples

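library(sprm)
set.seed(50235)
# The first 5 predictors share a two-level latent variable (3 for 20 obs, 4 for 30);
# the other 20 predictors are noise around 3.5.
U1 <- c(rep(3, 20), rep(4, 30))
U2 <- rep(3.5, 50)
X1 <- replicate(5, U1 + rnorm(50))
X2 <- replicate(20, U2 + rnorm(50))
X <- cbind(X1, X2)
# Only the first 5 predictors have nonzero coefficients.
beta <- c(rep(1, 5), rep(0, 20))
# Errors: 45 regular observations and 5 vertical outliers centred at -20.
e <- c(rnorm(45, 0, 1.5), rnorm(5, -20, 1))
y <- X %*% beta + e
d <- as.data.frame(X)
d$y <- y
# Cross-validate over 1 to 3 components and eta = 0, 0.1, ..., 0.9.
res <- sprmsCV(y ~ ., data = d, as = 1:3, etas = seq(0, 0.9, 0.1), fun = "Hampel", prec = 0.1)
summary(res$opt.mod)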
