cv.ncvreg: Cross-validation for ncvreg

Description

Performs k-fold cross-validation for MCP- or SCAD-penalized regression models over a grid of values for the regularization parameter lambda.

Usage

cv.ncvreg(X, y, ..., cluster, nfolds=10, seed, cv.ind, trace=FALSE)

Arguments

X

The design matrix, without an intercept, as in ncvreg.

y

The response vector, as in ncvreg.

...

Additional arguments to ncvreg.

cluster

cv.ncvreg can be run in parallel across a cluster using the parallel package. The cluster must be set up in advance using the makeCluster function from that package. The cluster must then be passed to cv.ncvreg through this argument.

nfolds

The number of cross-validation folds. Default is 10.

cv.ind

Which fold each observation belongs to. By default the observations are randomly assigned by cv.ncvreg.

seed

You may set the seed of the random number generator in order to obtain reproducible results.

trace

If set to TRUE, cv.ncvreg will inform the user of its progress by announcing the beginning of each CV fold. Default is FALSE.
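
The cv.ind and seed arguments both concern how observations are assigned to folds. As a minimal sketch using base R only, a reproducible fold-assignment vector could be built by hand as shown here. The sample size n is a made-up value, and this is not necessarily how cv.ncvreg assigns folds internally.

```r
# Sketch: build a fold-assignment vector by hand (base R only).
n      <- 97   # hypothetical number of observations
nfolds <- 10

set.seed(1)    # fix the RNG so the assignment is reproducible

# Repeat the fold labels 1..nfolds out to length n, then shuffle them
cv.ind <- sample(rep(seq_len(nfolds), length.out = n))

table(cv.ind)  # fold sizes differ by at most one

# The vector could then be supplied through the cv.ind argument, e.g.:
# cvfit <- cv.ncvreg(X, y, cv.ind = cv.ind)
```

Supplying cv.ind explicitly is useful when several models should be compared on exactly the same folds.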

Value

An object with S3 class "cv.ncvreg" containing:

cve

The error for each value of lambda, averaged across the cross-validation folds.

cvse

The estimated standard error associated with each value of cve.

lambda

The sequence of regularization parameter values along which the cross-validation error was calculated.

fit

The fitted ncvreg object for the whole data.

min

The index of lambda corresponding to lambda.min.

lambda.min

The value of lambda with the minimum cross-validation error.

null.dev

The deviance for the intercept-only model.

pe

If family="binomial", the cross-validation prediction error for each value of lambda.
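
The relationship between cve, lambda, min, and lambda.min can be illustrated with a small base R sketch. The numbers are invented stand-ins for the lambda and cve components of a fitted object, not output from a real fit.

```r
# Toy values standing in for cvfit$lambda and cvfit$cve (made up for illustration)
lambda <- c(1.00, 0.50, 0.25, 0.10, 0.05)
cve    <- c(1.40, 0.90, 0.62, 0.58, 0.61)

min.index  <- which.min(cve)      # plays the role of cvfit$min
lambda.min <- lambda[min.index]   # plays the role of cvfit$lambda.min

min.index
lambda.min
```

The same index can be used to pull the matching column of coefficients from the full-data fit, as the Examples section does with fit$beta[,cvfit$min].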

Details

The function calls ncvreg nfolds times, each time leaving out 1/nfolds of the data. The cross-validation error is based on the residual sum of squares when family="gaussian" and on the deviance of the corresponding model when family="binomial" or family="poisson". For family="binomial" models, the cross-validation fold assignments are balanced across the 0/1 outcomes, so that each fold has the same proportion of 0/1 outcomes (or as close to the same proportion as can be achieved if the cases do not divide evenly).
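
The idea of balancing folds across the 0/1 outcomes can be sketched in base R as follows. The response vector is made up, and the approach (assigning folds separately within each outcome class) is one simple way to achieve the balance described above; it is an assumption for illustration, not a copy of the package's internal code.

```r
# Sketch: fold assignment balanced across 0/1 outcomes (base R only).
set.seed(1)
y      <- rep(c(0, 1), times = c(60, 37))  # made-up binary response
nfolds <- 10

cv.ind <- integer(length(y))
for (cls in c(0, 1)) {
  idx <- which(y == cls)
  # Within each outcome class, spread the observations evenly over the folds
  cv.ind[idx] <- sample(rep(seq_len(nfolds), length.out = length(idx)))
}

# Rows: outcome; columns: fold. Counts within a row differ by at most one.
table(y, cv.ind)
```

Because 37 cases do not divide evenly into 10 folds, some folds receive four cases and others three, which is the "as close as can be achieved" situation mentioned above.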

References

Breheny, P. and Huang, J. (2011) Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Statist., 5: 232-253.

Examples

data(prostate)
X <- as.matrix(prostate[,1:8])
y <- prostate$lpsa

cvfit <- cv.ncvreg(X, y)
plot(cvfit)
summary(cvfit)

## Extract the fit to the full data and the coefficients at lambda.min
fit <- cvfit$fit
plot(fit)
beta <- fit$beta[,cvfit$min]

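## Leave-one-out cross-validation run in parallel;
## requires loading the parallel package
library(parallel)
cl <- makeCluster(4)
cvfit <- cv.ncvreg(X, y, cluster=cl, nfolds=length(y))
stopCluster(cl)  # shut down the worker processes when finished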
