Cross-validation for glmnet

Does k-fold cross-validation for glmnet, produces a plot, and returns a value for lambda

models, regression
cv.glmnet(x, y, weights, offset, lambda, type.measure, nfolds, foldid, grouped, keep, parallel, ...)
x: x matrix as in glmnet.
y: response y as in glmnet.
weights: Observation weights; defaults to 1 per observation.
offset: Offset vector (matrix) as in glmnet.
lambda: Optional user-supplied lambda sequence; default is NULL, and glmnet chooses its own sequence.
nfolds: Number of folds; default is 10. Although nfolds can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. The smallest allowable value is nfolds=3.
foldid: An optional vector of values between 1 and nfolds identifying which fold each observation is in. If supplied, nfolds can be missing.
type.measure: Loss to use for cross-validation. Currently five options, not all available for all models. The default is type.measure="deviance", which uses squared-error for gaussian models (a.k.a. type.measure="mse" there), deviance for logistic and poisson regression, and partial-likelihood for the Cox model. type.measure="class" applies to binomial and multinomial logistic regression only, and gives misclassification error. type.measure="auc" is for two-class logistic regression only, and gives area under the ROC curve. type.measure="mse" or type.measure="mae" (mean absolute error) can be used by all models except "cox"; they measure the deviation from the fitted mean to the response.
grouped: This is an experimental argument, with default TRUE, and can be ignored by most users. For all models except "cox", this refers to computing nfolds separate statistics, and then using their mean and estimated standard error to describe the CV curve. If grouped=FALSE, an error matrix is built up at the observation level from the predictions from the nfolds fits, and then summarized (does not apply to type.measure="auc"). For the "cox" family, grouped=TRUE obtains the CV partial likelihood for the Kth fold by subtraction: by subtracting the log partial likelihood evaluated on the full dataset from that evaluated on the (K-1)/K dataset. This makes more efficient use of risk sets. With grouped=FALSE the log partial likelihood is computed only on the Kth fold.
keep: If keep=TRUE, a prevalidated array is returned containing fitted values for each observation and each value of lambda. This means these fits are computed with this observation and the rest of its fold omitted. The foldid vector is also returned. Default is keep=FALSE.
parallel: If TRUE, use parallel foreach to fit each fold. A parallel backend, such as doMC or others, must be registered beforehand. See the example below.
...: Other arguments that can be passed to glmnet.
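
As an illustration of the fold-assignment vector described above, here is a minimal base-R sketch that builds a balanced, shuffled vector of fold labels. The sample size n = 103 and the seed are arbitrary choices for the illustration, not values from the package.

```r
# Sketch: build a balanced fold-assignment vector in base R.
set.seed(42)          # arbitrary seed, only for reproducibility
n <- 103              # hypothetical number of observations
nfolds <- 10

# Repeat the labels 1..nfolds up to length n, then shuffle them
# so that fold membership is random.
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Each observation carries exactly one label between 1 and nfolds,
# and fold sizes differ by at most one.
fold_sizes <- table(foldid)
print(fold_sizes)

# With glmnet installed and an x matrix of n rows, the vector
# would be supplied as: cv.glmnet(x, y, foldid = foldid)
```

Fixing the vector once, instead of letting each call draw its own folds, makes repeated calls comparable because every fit is scored on the same splits.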

The function runs glmnet nfolds+1 times: the first run gets the lambda sequence, and the remaining runs compute the fit with each of the folds omitted. The error is accumulated, and the average error and standard deviation over the folds are computed. Note that cv.glmnet does NOT search for values of alpha. A specific value should be supplied, else alpha=1 is assumed by default. Users who would like to cross-validate alpha as well should pre-compute a foldid vector and then pass this same vector to separate calls to cv.glmnet with different values of alpha. Note also that the results of cv.glmnet are random, since the folds are selected at random. Users can reduce this randomness by running cv.glmnet many times and averaging the error curves.
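
The advice on cross-validating alpha can be sketched as follows. This is one possible way to do it, not a procedure prescribed by the package: the simulated data, the three-value alpha grid, and the variable names are all assumptions made for the illustration.

```r
library(glmnet)

# Simulated data (illustrative only)
set.seed(1)
n <- 200; p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:3] %*% c(2, -1, 0.5)) + rnorm(n)

# One fold vector, reused for every alpha so the CV errors are comparable
foldid <- sample(rep(seq_len(10), length.out = n))

# Illustrative grid; alpha is passed through ... to glmnet
alphas <- c(0, 0.5, 1)
cv_fits <- lapply(alphas, function(a) {
  cv.glmnet(x, y, alpha = a, foldid = foldid)
})

# Smallest mean cross-validated error reached along each lambda path
min_cvm <- sapply(cv_fits, function(fit) min(fit$cvm))
best <- which.min(min_cvm)

print(data.frame(alpha = alphas, min_cvm = min_cvm))
cat("selected alpha:", alphas[best],
    " lambda.min:", cv_fits[[best]]$lambda.min, "\n")
```

Comparing min(cvm) across fits is only meaningful here because every fit shares the same foldid; with independently drawn folds, part of the difference between alphas would be fold-sampling noise.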


"cv.glmnet" is returned, which is a list with the ingredients of the cross-validation fit.
lambda: The values of lambda used in the fits.
cvm: The mean cross-validated error, a vector of length length(lambda).
cvsd: Estimate of the standard error of cvm.
cvup: Upper curve = cvm+cvsd.
cvlo: Lower curve = cvm-cvsd.
nzero: Number of non-zero coefficients at each lambda.
name: A text string indicating the type of measure (for plotting purposes).
glmnet.fit: A fitted glmnet object for the full data.
lambda.min: Value of lambda that gives minimum cvm.
lambda.1se: Largest value of lambda such that the error is within 1 standard error of the minimum.
fit.preval: If keep=TRUE, this is the array of prevalidated fits. Some entries can be NA, if that and subsequent values of lambda are not reached for that fold.
foldid: If keep=TRUE, the fold assignments used.
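
To make the definitions of the minimum-error lambda and the one-standard-error lambda concrete, the base-R sketch below applies them to a made-up CV summary. The numbers are invented for the illustration, and the sketch assumes an error measure that is minimized (so it does not cover type.measure="auc").

```r
# Toy CV summary: made-up values, with lambda in decreasing order
lambda <- c(1.0, 0.5, 0.25, 0.125, 0.0625)
cvm    <- c(3.0, 2.2, 2.0, 2.1, 2.3)      # mean CV error per lambda
cvsd   <- c(0.30, 0.25, 0.25, 0.25, 0.30) # standard error of cvm

# Upper and lower curves, as in the descriptions above
cvup <- cvm + cvsd
cvlo <- cvm - cvsd

# lambda at the minimum of the mean CV error
i_min <- which.min(cvm)
lambda_min <- lambda[i_min]

# Largest lambda whose mean error is within one standard error
# of the minimum (threshold = 2.0 + 0.25 = 2.25 here)
threshold <- cvm[i_min] + cvsd[i_min]
lambda_1se <- max(lambda[cvm <= threshold])

cat("lambda_min:", lambda_min, " lambda_1se:", lambda_1se, "\n")
# prints: lambda_min: 0.25  lambda_1se: 0.5
```

The one-standard-error choice trades a small increase in estimated error for a larger penalty, which usually means fewer non-zero coefficients.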


Friedman, J., Hastie, T. and Tibshirani, R. (2008) Regularization Paths for Generalized Linear Models via Coordinate Descent, Journal of Statistical Software, Vol. 33(1), 1-22, Feb 2010.
Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2011) Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent, Journal of Statistical Software, Vol. 39(5), 1-13.

See Also

glmnet, and the plot, predict, and coef methods for "cv.glmnet" objects.

library(glmnet)
set.seed(1010)
n=1000;p=100
nzc=trunc(p/10)
x=matrix(rnorm(n*p),n,p)
beta=rnorm(nzc)
fx= x[,seq(nzc)] %*% beta
eps=rnorm(n)*5
y=drop(fx+eps)
px=exp(fx)
px=px/(1+px)
ly=rbinom(n=length(px),prob=px,size=1)

set.seed(1011)
cvob1=cv.glmnet(x,y)
plot(cvob1)
coef(cvob1)
predict(cvob1,newx=x[1:5,], s="lambda.min")
title("Gaussian Family",line=2.5)

set.seed(1011)
cvob1a=cv.glmnet(x,y,type.measure="mae")
plot(cvob1a)
title("Gaussian Family",line=2.5)

set.seed(1011)
par(mfrow=c(2,2),mar=c(4.5,4.5,4,1))
cvob2=cv.glmnet(x,ly,family="binomial")
plot(cvob2)
title("Binomial Family",line=2.5)
frame()

set.seed(1011)
cvob3=cv.glmnet(x,ly,family="binomial",type.measure="class")
plot(cvob3)
title("Binomial Family",line=2.5)

set.seed(1011)
cvob3a=cv.glmnet(x,ly,family="binomial",type.measure="auc")
plot(cvob3a)
title("Binomial Family",line=2.5)

set.seed(1011)
mu=exp(fx/10)
y=rpois(n,mu)
cvob4=cv.glmnet(x,y,family="poisson")
plot(cvob4)
title("Poisson Family",line=2.5)

# Multinomial
n=500;p=30
nzc=trunc(p/10)
x=matrix(rnorm(n*p),n,p)
beta3=matrix(rnorm(30),10,3)
beta3=rbind(beta3,matrix(0,p-10,3))
f3=x%*% beta3
p3=exp(f3)
p3=p3/apply(p3,1,sum)
g3=rmult(p3)
set.seed(10101)
cvfit=cv.glmnet(x,g3,family="multinomial")
plot(cvfit)
title("Multinomial Family",line=2.5)

# Cox
beta=rnorm(nzc)
fx=x[,seq(nzc)]%*%beta/3
hx=exp(fx)
ty=rexp(n,hx)
tcens=rbinom(n=n,prob=.3,size=1) # censoring indicator
y=cbind(time=ty,status=1-tcens) # y=Surv(ty,1-tcens) with library(survival)
foldid=sample(rep(seq(10),length=n))
fit1_cv=cv.glmnet(x,y,family="cox",foldid=foldid)
plot(fit1_cv)
title("Cox Family",line=2.5)

## Not run:
# # Parallel
# require(doMC)
# registerDoMC(cores=4)
# x = matrix(rnorm(1e5 * 100), 1e5, 100)
# y = rnorm(1e5)
# system.time(cv.glmnet(x,y))
# system.time(cv.glmnet(x,y,parallel=TRUE))
## End(Not run)
Documentation reproduced from package glmnet, version 2.0-5, License: GPL-2
