Does k-fold cross-validation for glmnet, produces a plot,
and returns a value for `lambda`

```
cv.glmnet(x, y, weights, offset, lambda, type.measure, nfolds, foldid, grouped, keep,
parallel, ...)
```

x

`x`

matrix as in `glmnet`

.

y

response `y`

as in `glmnet`

.

weights

Observation weights; defaults to 1 per observation

offset

Offset vector (matrix) as in `glmnet`

lambda

Optional user-supplied lambda sequence; default is
`NULL`

, and `glmnet`

chooses its own sequence

nfolds

number of folds - default is 10. Although `nfolds`

can be as large as the sample size (leave-one-out CV), it is not
recommended for large datasets. Smallest value allowable is `nfolds=3`

foldid

an optional vector of values between 1 and `nfold`

identifying what fold each observation is in. If supplied,
`nfold`

can be missing.

type.measure

loss to use for cross-validation. Currently five
options, not all available for all models. The default is `type.measure="deviance"`

, which uses
squared-error for gaussian models (a.k.a `type.measure="mse"`

there), deviance
for logistic and poisson
regression, and partial-likelihood for the Cox
model. `type.measure="class"`

applies to binomial and multinomial logistic regression only,
and gives misclassification error. `type.measure="auc"`

is for
two-class logistic regression only, and gives area under the ROC
curve. `type.measure="mse"`

or `type.measure="mae"`

(mean absolute error)
can be used by all models except the `"cox"`

; they measure the
deviation from the fitted mean to the response.

grouped

This is an experimental argument, with default
`TRUE`

, and can be ignored by most users. For all models
except the `"cox"`

, this refers to computing `nfolds`

separate statistics, and then using their mean and estimated
standard error to describe the CV curve. If `grouped=FALSE`

,
an error matrix is built up at the observation level from the predictions
from the `nfold`

fits, and then summarized (does not apply to
`type.measure="auc"`

). For the `"cox"`

family,
`grouped=TRUE`

obtains the CV partial likelihood for the Kth
fold by *subtraction*; by subtracting the log partial
likelihood evaluated on the full dataset from that evaluated on
the on the (K-1)/K dataset. This makes more efficient use of risk
sets. With `grouped=FALSE`

the log partial likelihood is
computed only on the Kth fold

keep

If `keep=TRUE`

, a *prevalidated* array is
returned containing fitted values for each observation and each
value of `lambda`

. This means these fits are computed with
this observation and the rest of its fold omitted. The
`folid`

vector is also returned. Default is keep=FALSE

parallel

If `TRUE`

, use parallel `foreach`

to fit each fold.
Must register parallel before hand, such as `doMC`

or others.
See the example below.

…

Other arguments that can be passed to `glmnet`

an object of class `"cv.glmnet"`

is returned, which is a
list with the ingredients of the cross-validation fit.

the values of `lambda`

used in the fits.

The mean cross-validated error - a vector of length
`length(lambda)`

.

estimate of standard error of `cvm`

.

upper curve = `cvm+cvsd`

.

lower curve = `cvm-cvsd`

.

number of non-zero coefficients at each `lambda`

.

a text string indicating type of measure (for plotting purposes).

a fitted glmnet object for the full data.

value of `lambda`

that gives minimum
`cvm`

.

largest value of `lambda`

such that error is
within 1 standard error of the minimum.

if `keep=TRUE`

, this is the array of
prevalidated fits. Some entries can be `NA`

, if that and
subsequent values of `lambda`

are not reached for that fold

if `keep=TRUE`

, the fold assignments used

The function runs `glmnet`

`nfolds`

+1 times; the
first to get the `lambda`

sequence, and then the remainder to
compute the fit with each of the folds omitted. The error is
accumulated, and the average error and standard deviation over the
folds is computed.
Note that `cv.glmnet`

does NOT search for
values for `alpha`

. A specific value should be supplied, else
`alpha=1`

is assumed by default. If users would like to
cross-validate `alpha`

as well, they should call `cv.glmnet`

with a pre-computed vector `foldid`

, and then use this same fold vector
in separate calls to `cv.glmnet`

with different values of
`alpha`

. Note also that the results of `cv.glmnet`

are
random, since the folds are selected at random. Users can reduce this
randomness by running `cv.glmnet`

many times, and averaging the
error curves.

Friedman, J., Hastie, T. and Tibshirani, R. (2008)
*Regularization Paths for Generalized Linear Models via Coordinate
Descent*, http://www.stanford.edu/~hastie/Papers/glmnet.pdf
*Journal of Statistical Software, Vol. 33(1), 1-22 Feb 2010*
http://www.jstatsoft.org/v33/i01/
Simon, N., Friedman, J., Hastie, T., Tibshirani, R. (2011)
*Regularization Paths for Cox's Proportional Hazards Model via
Coordinate Descent, Journal of Statistical Software, Vol. 39(5)
1-13*
http://www.jstatsoft.org/v39/i05/

`glmnet`

and `plot`

, `predict`

, and `coef`

methods for `"cv.glmnet"`

object.

# NOT RUN { set.seed(1010) n=1000;p=100 nzc=trunc(p/10) x=matrix(rnorm(n*p),n,p) beta=rnorm(nzc) fx= x[,seq(nzc)] %*% beta eps=rnorm(n)*5 y=drop(fx+eps) px=exp(fx) px=px/(1+px) ly=rbinom(n=length(px),prob=px,size=1) set.seed(1011) cvob1=cv.glmnet(x,y) plot(cvob1) coef(cvob1) predict(cvob1,newx=x[1:5,], s="lambda.min") title("Gaussian Family",line=2.5) set.seed(1011) cvob1a=cv.glmnet(x,y,type.measure="mae") plot(cvob1a) title("Gaussian Family",line=2.5) set.seed(1011) par(mfrow=c(2,2),mar=c(4.5,4.5,4,1)) cvob2=cv.glmnet(x,ly,family="binomial") plot(cvob2) title("Binomial Family",line=2.5) frame() set.seed(1011) cvob3=cv.glmnet(x,ly,family="binomial",type.measure="class") plot(cvob3) title("Binomial Family",line=2.5) set.seed(1011) cvob3a=cv.glmnet(x,ly,family="binomial",type.measure="auc") plot(cvob3a) title("Binomial Family",line=2.5) set.seed(1011) mu=exp(fx/10) y=rpois(n,mu) cvob4=cv.glmnet(x,y,family="poisson") plot(cvob4) title("Poisson Family",line=2.5) # } # NOT RUN { # Multinomial n=500;p=30 nzc=trunc(p/10) x=matrix(rnorm(n*p),n,p) beta3=matrix(rnorm(30),10,3) beta3=rbind(beta3,matrix(0,p-10,3)) f3=x%*% beta3 p3=exp(f3) p3=p3/apply(p3,1,sum) g3=rmult(p3) set.seed(10101) cvfit=cv.glmnet(x,g3,family="multinomial") plot(cvfit) title("Multinomial Family",line=2.5) # Cox beta=rnorm(nzc) fx=x[,seq(nzc)]%*%beta/3 hx=exp(fx) ty=rexp(n,hx) tcens=rbinom(n=n,prob=.3,size=1)# censoring indicator y=cbind(time=ty,status=1-tcens) # y=Surv(ty,1-tcens) with library(survival) foldid=sample(rep(seq(10),length=n)) fit1_cv=cv.glmnet(x,y,family="cox",foldid=foldid) plot(fit1_cv) title("Cox Family",line=2.5) # Parallel require(doMC) registerDoMC(cores=4) x = matrix(rnorm(1e5 * 100), 1e5, 100) y = rnorm(1e5) system.time(cv.glmnet(x,y)) system.time(cv.glmnet(x,y,parallel=TRUE)) # }