Performs k-fold cross-validation for MCP- or SCAD-penalized regression models over a grid of values for the regularization parameter lambda.
cv.ncvreg(
X,
y,
...,
cluster,
nfolds = 10,
seed,
fold,
returnY = FALSE,
trace = FALSE
)

cv.ncvsurv(
X,
y,
...,
cluster,
nfolds = 10,
seed,
fold,
se = c("quick", "bootstrap"),
returnY = FALSE,
trace = FALSE
)
An object with S3 class cv.ncvreg/cv.ncvsurv containing:
cve: The error for each value of lambda, averaged across the cross-validation folds.
cvse: The estimated standard error associated with each value of cve.
fold: The fold assignments for cross-validation for each observation; note that for cv.ncvsurv, these are in terms of the ordered observations, not the original observations.
lambda: The sequence of regularization parameter values along which the cross-validation error was calculated.
fit: The fitted ncvreg/ncvsurv object for the whole data.
min: The index of lambda corresponding to lambda.min.
lambda.min: The value of lambda with the minimum cross-validation error.
null.dev: The deviance for the intercept-only model. If you have supplied your own lambda sequence, this quantity may not be meaningful.
Bias: The estimated bias of the minimum cross-validation error, as in Tibshirani RJ and Tibshirani R (2009), "A Bias Correction for the Minimum Error Rate in Cross-Validation", Ann. Appl. Stat. 3:822-829.
pe: If family="binomial", the cross-validation prediction error for each value of lambda.
Y: If returnY=TRUE, the matrix of cross-validated fitted values (see the returnY argument).
X: The design matrix, without an intercept, as in ncvreg/ncvsurv.
y: The response vector, as in ncvreg/ncvsurv.
...: Additional arguments to ncvreg/ncvsurv.
cluster: cv.ncvreg and cv.ncvsurv can be run in parallel across a cluster using the parallel package. The cluster must be set up in advance using the makeCluster function from that package. The cluster must then be passed to cv.ncvreg/cv.ncvsurv (see example).
nfolds: The number of cross-validation folds. Default is 10.
seed: You may set the seed of the random number generator in order to obtain reproducible results.
fold: Which fold each observation belongs to. By default the observations are randomly assigned.
returnY: Should cv.ncvreg/cv.ncvsurv return the linear predictors from the cross-validation folds? Default is FALSE; if TRUE, this will return a matrix in which the element for row i, column j is the fitted value for observation i from the fold in which observation i was excluded from the fit, at the jth value of lambda. NOTE: For cv.ncvsurv, the rows of Y are ordered by time on study, and therefore will not correspond to the original order of observations passed to cv.ncvsurv.
trace: If set to TRUE, inform the user of progress by announcing the beginning of each CV fold. Default is FALSE.
se: For cv.ncvsurv, the method by which the cross-validation standard error (CVSE) is calculated. The 'quick' approach is based on a rough approximation, but can be calculated more or less instantly. The 'bootstrap' approach is more accurate, but requires additional computing time.
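As a minimal sketch of how the seed and fold arguments might be used to obtain reproducible results, assuming the ncvreg package is installed and using the Prostate data from the examples (the names cv1, cv2, cv3 and my_folds are illustrative only):

```r
library(ncvreg)
data(Prostate)
X <- Prostate$X
y <- Prostate$y

# Same seed: the random fold assignment, and hence the CV error, should match
cv1 <- cv.ncvreg(X, y, seed = 1)
cv2 <- cv.ncvreg(X, y, seed = 1)
identical(cv1$fold, cv2$fold)

# Alternatively, supply the fold assignment explicitly (here: 5 folds)
set.seed(2)
my_folds <- sample(rep(1:5, length.out = length(y)))
cv3 <- cv.ncvreg(X, y, fold = my_folds)
```

Supplying fold directly can be convenient when several models should be compared on exactly the same partition of the data.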
Patrick Breheny; Grant Brown helped with the parallelization support
The function calls ncvreg/ncvsurv nfolds times, each time leaving out 1/nfolds of the data. The cross-validation error is based on the deviance; see the ncvreg package documentation for more details.
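The leave-out-a-fold procedure can be sketched in base R for a single unpenalized Gaussian model. This is an illustration under stated assumptions, not the package's implementation: lm stands in for ncvreg, the data are simulated, all variable names are hypothetical, and the Gaussian deviance is taken to be the squared error. cv.ncvreg carries out the analogous calculation at every value of lambda.

```r
set.seed(1)
n <- 100
nfolds <- 10
X <- matrix(rnorm(n * 3), n, 3)
y <- drop(X %*% c(1, -1, 0.5)) + rnorm(n)
dat <- data.frame(y = y, X)

# Randomly assign each observation to one of nfolds folds
fold <- sample(rep(1:nfolds, length.out = n))

# Fit on the data outside fold k, then predict the held-out observations
sq_err <- numeric(n)
for (k in 1:nfolds) {
  held_out <- fold == k
  fit_k <- lm(y ~ ., data = dat[!held_out, ])
  pred <- predict(fit_k, newdata = dat[held_out, ])
  sq_err[held_out] <- (dat$y[held_out] - pred)^2
}

cve <- mean(sq_err)            # cross-validation error (mean held-out deviance)
cvse <- sd(sq_err) / sqrt(n)   # one simple estimate of its standard error
```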
For family="binomial" models, the cross-validation fold assignments are balanced across the 0/1 outcomes, so that each fold has the same proportion of 0/1 outcomes (or as close to the same proportion as it is possible to achieve if cases do not divide evenly).
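One way such a balanced assignment could be constructed is sketched below in base R. The helper balanced_folds is hypothetical and is not claimed to be the code used inside cv.ncvreg; it simply deals fold labels out in turn within each outcome group so that per-fold counts differ by at most one.

```r
balanced_folds <- function(y, nfolds) {
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  # Deal fold labels 1..nfolds out in turn, first to the 1s, then
  # continuing where they left off for the 0s
  labels1 <- (seq_len(n1) - 1) %% nfolds + 1
  labels0 <- (n1 + seq_len(n0) - 1) %% nfolds + 1
  fold <- integer(length(y))
  # Shuffle the labels within each outcome group
  fold[y == 1] <- labels1[sample.int(n1)]
  fold[y == 0] <- labels0[sample.int(n0)]
  fold
}

set.seed(1)
y <- rbinom(97, 1, 0.3)
fold <- balanced_folds(y, nfolds = 10)
tab <- table(factor(fold, levels = 1:10), y)
tab
```

The resulting vector could then be passed through the fold argument if a custom balanced partition is wanted.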
For Cox models, cv.ncvsurv uses the approach of calculating the full Cox partial likelihood using the cross-validated set of linear predictors. Other approaches to cross-validation for the Cox regression model have been proposed in the literature; the strengths and weaknesses of the various methods for penalized regression in the Cox model are the subject of current research. A simple approximation to the standard error is provided, although an option to bootstrap the standard error (se='bootstrap') is also available.
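To make the idea concrete, the following base R sketch evaluates a Cox partial log-likelihood from a vector of linear predictors, such as the cross-validated ones. The function cox_partial_loglik is hypothetical, assumes there are no tied event times, and is not claimed to match the internals of cv.ncvsurv.

```r
cox_partial_loglik <- function(eta, time, status) {
  # Order observations by time on study
  ord <- order(time)
  eta <- eta[ord]
  status <- status[ord]
  # Risk-set sums: for observation i, the sum of exp(eta_j) over all j
  # still at risk at time_i (valid when there are no tied times)
  risk <- rev(cumsum(rev(exp(eta))))
  # Only observed events (status == 1) contribute terms
  sum(status * (eta - log(risk)))
}

# Toy data: three subjects, all with observed events, null linear predictor
time <- c(2, 3, 1)
status <- c(1, 1, 1)
ll <- cox_partial_loglik(rep(0, 3), time, status)
deviance <- -2 * ll
```

With all linear predictors equal to zero, each event contributes minus the log of its risk-set size, which gives a simple check on the calculation.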
Breheny P and Huang J. (2011) Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Annals of Applied Statistics, 5: 232-253. doi:10.1214/10-AOAS388
ncvreg, plot.cv.ncvreg, summary.cv.ncvreg
library(ncvreg)
data(Prostate)
cvfit <- cv.ncvreg(Prostate$X, Prostate$y)
plot(cvfit)
summary(cvfit)
fit <- cvfit$fit
plot(fit)
beta <- fit$beta[,cvfit$min]
## requires loading the parallel package
if (FALSE) {
  library(parallel)
  X <- Prostate$X
  y <- Prostate$y
  cl <- makeCluster(4)   # start a 4-worker cluster
  cvfit <- cv.ncvreg(X, y, cluster=cl, nfolds=length(y))   # leave-one-out CV
  stopCluster(cl)        # shut the workers down when done
}
# Survival
data(Lung)
X <- Lung$X
y <- Lung$y
cvfit <- cv.ncvsurv(X, y)
summary(cvfit)
plot(cvfit)
plot(cvfit, type="rsq")