cv.glmnet


Cross-validation for glmnet

Does k-fold cross-validation for glmnet, produces a plot, and returns a value for lambda

Keywords
models, regression
Usage
cv.glmnet(x, y, weights, offset, lambda, type.measure, ..., nfolds, foldid, grouped)
Arguments
x
x matrix as in glmnet.
y
response y as in glmnet.
weights
Observation weights; defaults to 1 per observation
offset
Offset vector (matrix) as in glmnet
lambda
Optional user-supplied lambda sequence; default is NULL, and glmnet chooses its own sequence
...
Other arguments that can be passed to glmnet.
nfolds
Number of folds; default is 10. Although nfolds can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. The smallest value allowed is nfolds=3.
foldid
An optional vector of values between 1 and nfolds identifying the fold each observation belongs to. If supplied, nfolds can be missing.
type.measure
Loss to use for cross-validation. Currently five options, not all available for all models. The default is type.measure="deviance", which uses squared-error for gaussian models (a.k.a. type.measure="mse" there), deviance for logistic and poisson regression, and partial-likelihood for the Cox model. type.measure="class" applies to binomial and multinomial logistic regression only, and gives misclassification error. type.measure="auc" is for two-class logistic regression only, and gives area under the ROC curve. type.measure="mse" or type.measure="mae" (mean absolute error) can be used by all models except the "cox"; they measure the deviation from the fitted mean to the response.
grouped
This is an experimental argument, with default TRUE, and can be ignored by most users. For all models except the "cox", this refers to computing nfolds separate statistics, and then using their mean and estimated standard error to describe the CV curve. For the "cox" model, grouped=TRUE obtains the CV partial likelihood for the Kth fold by subtraction: the log partial likelihood evaluated on the full dataset minus that evaluated on the dataset with the Kth fold omitted.
Details

The function runs glmnet nfolds+1 times: the first run obtains the lambda sequence, and the remaining runs compute the fit with each of the folds omitted. The error is accumulated, and the average error and standard deviation over the folds are computed. Note that cv.glmnet does NOT search over values of alpha. A specific value should be supplied, else alpha=1 is assumed by default. Users who would like to cross-validate alpha as well should call cv.glmnet with a pre-computed vector foldid, and then use this same fold vector in separate calls to cv.glmnet with different values of alpha.
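The foldid approach described above can be sketched as follows. This is a minimal sketch, assuming x, y, and a candidate grid of alpha values already exist; adjust to your data:

```r
# Sketch: cross-validate alpha by reusing one fold assignment across calls.
# x and y are assumed to exist; the alpha grid below is illustrative.
library(glmnet)

set.seed(1)
n <- nrow(x)
foldid <- sample(rep(seq(10), length = n))  # same 10 folds for every alpha

alphas <- c(0, 0.5, 1)
cvfits <- lapply(alphas, function(a)
  cv.glmnet(x, y, alpha = a, foldid = foldid))

# Compare the best (minimum) mean CV error achieved at each alpha
best <- sapply(cvfits, function(cv) min(cv$cvm))
alphas[which.min(best)]  # alpha with the lowest cross-validated error
```

Because every call sees the same fold assignment, the CV errors at different alpha values are directly comparable.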

Value

An object of class "cv.glmnet" is returned, which is a list with the ingredients of the cross-validation fit:

  • lambda: the values of lambda used in the fits.
  • cvm: the mean cross-validated error; a vector of length length(lambda).
  • cvsd: estimate of standard error of cvm.
  • cvup: upper curve = cvm + cvsd.
  • cvlo: lower curve = cvm - cvsd.
  • nzero: number of non-zero coefficients at each lambda.
  • name: a text string indicating the type of measure (for plotting purposes).
  • glmnet.fit: a fitted glmnet object for the full data.
  • lambda.min: value of lambda that gives minimum cvm.
  • lambda.1se: largest value of lambda such that the error is within 1 standard error of the minimum.
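For instance, these components can be read directly off a fitted object; cvfit here stands for a hypothetical result of a prior cv.glmnet call:

```r
# Sketch: inspecting the main ingredients of a "cv.glmnet" object.
# cvfit is assumed to come from a prior cv.glmnet(x, y) call.
cvfit$lambda.min                             # lambda minimizing the CV error
cvfit$lambda.1se                             # largest lambda within 1 SE of the minimum
cvfit$cvm[cvfit$lambda == cvfit$lambda.min]  # the minimum mean CV error
coef(cvfit, s = "lambda.1se")                # coefficients at the more regularized choice
```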

References

Friedman, J., Hastie, T. and Tibshirani, R. (2008) Regularization Paths for Generalized Linear Models via Coordinate Descent, Journal of Statistical Software, Vol. 33(1), 1-22, Feb 2010. http://www.jstatsoft.org/v33/i01/ (preprint: http://www.stanford.edu/~hastie/Papers/glmnet.pdf)

Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2011) Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent, Journal of Statistical Software, Vol. 39(5), 1-13. http://www.jstatsoft.org/v39/i05/

See Also

glmnet, and the plot, predict, and coef methods for "cv.glmnet" objects.

Aliases
  • cv.glmnet
Examples
set.seed(1010)
n=1000;p=100
nzc=trunc(p/10)
x=matrix(rnorm(n*p),n,p)
beta=rnorm(nzc)
fx= x[,seq(nzc)] %*% beta
eps=rnorm(n)*5
y=drop(fx+eps)
px=exp(fx)
px=px/(1+px)
ly=rbinom(n=length(px),prob=px,size=1)
set.seed(1011)
cvob1=cv.glmnet(x,y)
plot(cvob1)
coef(cvob1)
predict(cvob1,newx=x[1:5,], s="lambda.min")
title("Gaussian Family",line=2.5)
set.seed(1011)
cvob1a=cv.glmnet(x,y,type.measure="mae")
plot(cvob1a)
title("Gaussian Family",line=2.5)
set.seed(1011)
par(mfrow=c(2,2),mar=c(4.5,4.5,4,1))
cvob2=cv.glmnet(x,ly,family="binomial")
plot(cvob2)
title("Binomial Family",line=2.5)
frame()
set.seed(1011)
cvob3=cv.glmnet(x,ly,family="binomial",type.measure="class")
plot(cvob3)
title("Binomial Family",line=2.5)
set.seed(1011)
cvob3a=cv.glmnet(x,ly,family="binomial",type.measure="auc")
plot(cvob3a)
title("Binomial Family",line=2.5)
set.seed(1011)
mu=exp(fx/10)
y=rpois(n,mu)
cvob4=cv.glmnet(x,y,family="poisson")
plot(cvob4)
title("Poisson Family",line=2.5)
# Multinomial
n=1000;p=30
nzc=trunc(p/10)
x=matrix(rnorm(n*p),n,p)
beta3=matrix(rnorm(30),10,3)
beta3=rbind(beta3,matrix(0,p-10,3))
f3=x%*% beta3
p3=exp(f3)
p3=p3/apply(p3,1,sum)
# rmult is not a base R function: draw one class per row of the probability
# matrix (helper defined here in case it is not available)
rmult=function(p) apply(p,1,function(pr) sample(seq_along(pr),1,prob=pr))
g3=rmult(p3)
set.seed(10101)
cvfit=cv.glmnet(x,g3,family="multinomial")
plot(cvfit)
title("Multinomial Family",line=2.5)
set.seed(10101)
cvfit=cv.glmnet(x,g3,family="multinomial",type.measure="mse")
plot(cvfit)
title("Multinomial Family",line=2.5)
set.seed(10101)
cvfit=cv.glmnet(x,g3,family="multinomial",type.measure="class")
plot(cvfit)
# Cox
beta=rnorm(nzc)
fx=x[,seq(nzc)]%*%beta/3
hx=exp(fx)
ty=rexp(n,hx)
tcens=rbinom(n=n,prob=.3,size=1) # censoring indicator
y=cbind(time=ty,status=1-tcens) # y=Surv(ty,1-tcens) with library(survival)
foldid=sample(rep(seq(10),length=n))
fit1_cv=cv.glmnet(x,y,family="cox",foldid=foldid)
plot(fit1_cv)
title("Cox Family",line=2.5)
Documentation reproduced from package glmnet, version 1.7.4, License: GPL-2
