cv.ncvreg: Cross-validation for ncvreg

Description

Performs k-fold cross-validation for MCP- or SCAD-penalized regression models over a grid of values for the regularization parameter lambda.

Usage

cv.ncvreg(X, y, ..., cluster, nfolds=10, seed, cv.ind, returnY=FALSE,
trace=FALSE)

Arguments

X

The design matrix, without an intercept, as in ncvreg.

y

The response vector, as in ncvreg.

...

Additional arguments to ncvreg.

cluster

cv.ncvreg can be run in parallel across a cluster using the parallel package. The cluster must be set up in advance using the makeCluster function from that package and then passed to cv.ncvreg (see example).

nfolds

The number of cross-validation folds. Default is 10.

cv.ind

Which fold each observation belongs to. By default the observations are randomly assigned by cv.ncvreg.

seed

You may set the seed of the random number generator in order to obtain reproducible results.

returnY

Should cv.ncvreg return the fitted values from the cross-validation folds? Default is FALSE; if TRUE, this will return a matrix in which the element for row i, column j is the fitted value for observation i from the fold in which observation i was excluded from the fit, at the jth value of lambda.

trace

If set to TRUE, cv.ncvreg will inform the user of its progress by announcing the beginning of each CV fold. Default is FALSE.
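
To make the cv.ind and returnY arguments more concrete, the sketch below builds a fold-assignment vector by hand and then assembles the kind of matrix that returnY=TRUE describes: row i, column j holds the fitted value for observation i from the fold in which it was left out, at the jth value of lambda. It uses simulated data and a plain ridge fit as a stand-in for the MCP/SCAD fit; ridge_fit, the lambda grid, the seed, and the data are illustrative assumptions and are not part of ncvreg.

```r
set.seed(1)                                   # illustrative seed for reproducibility
n <- 60; p <- 5; nfolds <- 5                  # hypothetical problem size
X <- matrix(rnorm(n * p), n, p)               # simulated design matrix
y <- drop(X %*% c(2, -1, 0, 0, 0.5)) + rnorm(n)
lambda <- c(10, 1, 0.1)                       # illustrative grid of lambda values

# Fold assignment: repeat the labels 1..nfolds to length n, then shuffle,
# so that fold sizes differ by at most one observation.
cv.ind <- sample(rep(seq_len(nfolds), length.out = n))

# Stand-in fitter (ridge regression with an intercept, NOT MCP/SCAD).
ridge_fit <- function(X, y, lam) {
  xm <- colMeans(X); ym <- mean(y)
  Xc <- sweep(X, 2, xm)                       # center the columns
  b <- solve(crossprod(Xc) + lam * diag(ncol(X)), crossprod(Xc, y - ym))
  list(a = ym - sum(xm * b), b = drop(b))
}

# Out-of-fold fitted values: one row per observation, one column per lambda.
Y <- matrix(NA_real_, n, length(lambda))
for (k in seq_len(nfolds)) {
  out <- cv.ind == k                          # observations held out in fold k
  for (j in seq_along(lambda)) {
    f <- ridge_fit(X[!out, , drop = FALSE], y[!out], lambda[j])
    Y[out, j] <- f$a + X[out, , drop = FALSE] %*% f$b
  }
}

# Mean squared prediction error at each lambda value.
cve <- colMeans((y - Y)^2)
```

A vector built this way could be supplied to the real function with a call along the lines of cv.ncvreg(X, y, nfolds=5, cv.ind=cv.ind), which fixes the folds regardless of the random number generator state.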

Value

An object with S3 class "cv.ncvreg" containing:

Details

The function calls ncvreg nfolds times, each time leaving out 1/nfolds of the data. The cross-validation error is based on the residual sum of squares when family="gaussian" and on the deviance when family="binomial" or family="poisson". For family="binomial" models, the cross-validation fold assignments are balanced across the 0/1 outcomes, so that each fold has the same proportion of 0/1 outcomes (or as close to the same proportion as possible if the cases do not divide evenly).
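
The balancing of folds across 0/1 outcomes can be sketched in base R as follows. This is a minimal illustration of the idea, not the package's internal code; the helper name balanced_folds and the simulated outcome are assumptions made for the example.

```r
# Hypothetical helper (not part of ncvreg): assign folds separately within
# each outcome class so every fold gets a near-equal share of 0s and 1s.
balanced_folds <- function(y, nfolds) {
  fold <- integer(length(y))
  for (cls in c(0, 1)) {
    idx <- which(y == cls)
    # Repeat the labels 1..nfolds to the class size, then shuffle them.
    fold[idx] <- sample(rep(seq_len(nfolds), length.out = length(idx)))
  }
  fold
}

set.seed(42)                       # illustrative seed
y <- rbinom(100, 1, 0.3)           # simulated binary outcome
fold <- balanced_folds(y, nfolds = 5)

# Counts per fold (rows) and outcome (columns); within each column the
# counts differ by at most one.
tab <- table(factor(fold, levels = 1:5), y)
tab
```

Within each outcome class, the fold sizes differ by at most one, which is the "as close as possible" case described above when the class size is not a multiple of nfolds.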

References

Breheny, P. and Huang, J. (2011) Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Statist., 5: 232-253.

Examples

data(prostate)
X <- as.matrix(prostate[,1:8])
y <- prostate$lpsa

cvfit <- cv.ncvreg(X, y)
plot(cvfit)
summary(cvfit)

fit <- cvfit$fit
plot(fit)
beta <- fit$beta[,cvfit$min]

## requires loading the parallel package
## Not run: 
# library(parallel)
# cl <- makeCluster(4)
# cvfit <- cv.ncvreg(X, y, cluster=cl, nfolds=length(y))
## End(Not run)