cv.glmnet
Cross-validation for glmnet
Does k-fold cross-validation for glmnet, produces a plot, and returns a value for lambda.
- Keywords
- models, regression
Usage
cv.glmnet(x, y, weights, offset, lambda, type.measure, ..., nfolds, foldid, grouped)
Arguments
- x
- x matrix as in glmnet.
- y
- response y as in glmnet.
- weights
- Observation weights; defaults to 1 per observation.
- offset
- Offset vector (matrix) as in glmnet.
- lambda
- Optional user-supplied lambda sequence; default is NULL, and glmnet chooses its own sequence.
- ...
- Other arguments that can be passed to glmnet.
- nfolds
- number of folds; default is 10. Although nfolds can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. The smallest allowable value is nfolds=3.
- foldid
- an optional vector of values between 1 and nfolds identifying which fold each observation is in. If supplied, nfolds can be missing.
- type.measure
- loss to use for cross-validation. Currently five options, not all available for all models. The default is type.measure="deviance", which uses squared-error for gaussian models (a.k.a. type.measure="mse" there), deviance
- grouped
- This is an experimental argument, with default TRUE, and can be ignored by most users. For all models except the "cox", this refers to computing nfolds separate statistics, and then using their mean
Details
The function runs glmnet nfolds+1 times; the first to get the lambda sequence, and then the remainder to compute the fit with each of the folds omitted. The error is accumulated, and the average error and standard deviation over the folds are computed.
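The procedure described above can be sketched by hand for a gaussian model. This is only an illustration under simplifying assumptions, not the internal implementation: the simulated data, the names full_fit, fold_mse and lambda_min, and the unweighted fold averaging are all choices made for this example, and cv.glmnet may weight folds differently.

```r
library(glmnet)

# Simulated data (assumption for this sketch only)
set.seed(1)
n <- 200; p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:3] %*% c(2, -1, 0.5)) + rnorm(n)

nfolds <- 5
foldid <- sample(rep(seq(nfolds), length.out = n))

# Run 1: fit on the full data to obtain the lambda sequence
full_fit <- glmnet(x, y)
lam <- full_fit$lambda

# Runs 2..nfolds+1: refit with each fold omitted, score the held-out fold
fold_mse <- matrix(NA_real_, nfolds, length(lam))
for (k in seq(nfolds)) {
  held_out <- foldid == k
  fit_k <- glmnet(x[!held_out, ], y[!held_out], lambda = lam)
  pred <- predict(fit_k, newx = x[held_out, ], s = lam)  # one column per lambda
  fold_mse[k, ] <- colMeans((y[held_out] - pred)^2)
}

# Average error and a simple (unweighted) standard error across folds
cvm <- colMeans(fold_mse)
cvsd <- apply(fold_mse, 2, sd) / sqrt(nfolds)
lambda_min <- lam[which.min(cvm)]
```

Plotting cvm against log(lam) should give a curve of the same general shape as plot() on a "cv.glmnet" object fitted with the same foldid.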
Note that cv.glmnet does NOT search for values for alpha. A specific value should be supplied, else alpha=1 is assumed by default. If users would like to cross-validate alpha as well, they should call cv.glmnet with a pre-computed vector foldid, and then use this same fold vector in separate calls to cv.glmnet with different values of alpha.
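A minimal sketch of cross-validating alpha with a shared fold vector, as suggested above. The simulated data, the grid of alpha values, and the names cvfits, min_cvm and best_alpha are assumptions made for this illustration.

```r
library(glmnet)

# Simulated data (assumption for this sketch only)
set.seed(1)
n <- 200; p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:3] %*% c(2, -1, 0.5)) + rnorm(n)

# One fold assignment, reused for every alpha so the errors are comparable
foldid <- sample(rep(seq(10), length.out = n))

alphas <- c(0, 0.5, 1)
cvfits <- lapply(alphas, function(a) cv.glmnet(x, y, alpha = a, foldid = foldid))

# Compare the minimum mean cross-validated error achieved by each alpha
min_cvm <- sapply(cvfits, function(f) min(f$cvm))
best_alpha <- alphas[which.min(min_cvm)]
```

Reusing foldid keeps the comparison across alpha values from being confounded by different random fold splits.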
Value
An object of class "cv.glmnet" is returned, which is a list with the ingredients of the cross-validation fit.
- lambda
- the values of lambda used in the fits.
- cvm
- The mean cross-validated error - a vector of length length(lambda).
- cvsd
- estimate of standard error of cvm.
- cvup
- upper curve = cvm+cvsd.
- cvlo
- lower curve = cvm-cvsd.
- nzero
- number of non-zero coefficients at each lambda.
- name
- a text string indicating type of measure (for plotting purposes).
- glmnet.fit
- a fitted glmnet object for the full data.
- lambda.min
- value of lambda that gives minimum cvm.
- lambda.1se
- largest value of lambda such that error is within 1 standard error of the minimum.
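The components listed above can be inspected directly on the returned list. The following is a small sketch on simulated data; the data and the name i_min are assumptions for illustration.

```r
library(glmnet)

# Simulated data (assumption for this sketch only)
set.seed(1)
x <- matrix(rnorm(200 * 20), 200, 20)
y <- drop(x[, 1:3] %*% c(2, -1, 0.5)) + rnorm(200)

cvfit <- cv.glmnet(x, y)

cvfit$lambda.min   # lambda giving the minimum cvm
cvfit$lambda.1se   # largest lambda within 1 standard error of the minimum

# Position of lambda.min in the lambda sequence, and the values stored there
i_min <- which(cvfit$lambda == cvfit$lambda.min)
cvfit$cvm[i_min]    # mean cross-validated error at lambda.min
cvfit$nzero[i_min]  # number of non-zero coefficients at lambda.min

# Coefficients at the more heavily regularized lambda.1se
coef(cvfit, s = "lambda.1se")
```

Because lambda.1se is never smaller than lambda.min, it typically selects a sparser model.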
References
Friedman, J., Hastie, T. and Tibshirani, R. (2008)
Regularization Paths for Generalized Linear Models via Coordinate
Descent,
See Also
glmnet and plot, predict, and coef methods for "cv.glmnet" object.
Examples
set.seed(1010)
n=1000;p=100
nzc=trunc(p/10)
x=matrix(rnorm(n*p),n,p)
beta=rnorm(nzc)
fx= x[,seq(nzc)] %*% beta
eps=rnorm(n)*5
y=drop(fx+eps)
px=exp(fx)
px=px/(1+px)
ly=rbinom(n=length(px),prob=px,size=1)
set.seed(1011)
cvob1=cv.glmnet(x,y)
plot(cvob1)
coef(cvob1)
predict(cvob1,newx=x[1:5,], s="lambda.min")
title("Gaussian Family",line=2.5)
set.seed(1011)
cvob1a=cv.glmnet(x,y,type.measure="mae")
plot(cvob1a)
title("Gaussian Family",line=2.5)
set.seed(1011)
par(mfrow=c(2,2),mar=c(4.5,4.5,4,1))
cvob2=cv.glmnet(x,ly,family="binomial")
plot(cvob2)
title("Binomial Family",line=2.5)
frame()
set.seed(1011)
cvob3=cv.glmnet(x,ly,family="binomial",type.measure="class")
plot(cvob3)
title("Binomial Family",line=2.5)
set.seed(1011)
cvob3a=cv.glmnet(x,ly,family="binomial",type.measure="auc")
plot(cvob3a)
title("Binomial Family",line=2.5)
set.seed(1011)
mu=exp(fx/10)
y=rpois(n,mu)
cvob4=cv.glmnet(x,y,family="poisson")
plot(cvob4)
title("Poisson Family",line=2.5)
# Multinomial
n=1000;p=30
nzc=trunc(p/10)
x=matrix(rnorm(n*p),n,p)
beta3=matrix(rnorm(30),10,3)
beta3=rbind(beta3,matrix(0,p-10,3))
f3=x%*% beta3
p3=exp(f3)
p3=p3/apply(p3,1,sum)
g3=rmult(p3)
set.seed(10101)
cvfit=cv.glmnet(x,g3,family="multinomial")
plot(cvfit)
title("Multinomial Family",line=2.5)
set.seed(10101)
cvfit=cv.glmnet(x,g3,family="multinomial",type.measure="mse")
plot(cvfit)
title("Multinomial Family",line=2.5)
set.seed(10101)
cvfit=cv.glmnet(x,g3,family="multinomial",type.measure="class")
plot(cvfit)
# Cox
beta=rnorm(nzc)
fx=x[,seq(nzc)]%*%beta/3
hx=exp(fx)
ty=rexp(n,hx)
tcens=rbinom(n=n,prob=.3,size=1)# censoring indicator
y=cbind(time=ty,status=1-tcens) # y=Surv(ty,1-tcens) with library(survival)
foldid=sample(rep(seq(10),length=n))
fit1_cv=cv.glmnet(x,y,family="cox",foldid=foldid)
plot(fit1_cv)
title("Cox Family",line=2.5)