Performs k-fold cross validation for penalized regression models with grouped covariates over a grid of values for the regularization parameter lambda.
cv.grpreg(X, y, group=1:ncol(X), ..., nfolds=10, seed, cv.ind,
returnY=FALSE, trace=FALSE)
X: The design matrix, as in grpreg.
y: The response vector (or matrix), as in grpreg.
group: The grouping vector, as in grpreg.
...: Additional arguments to grpreg.
nfolds: The number of cross-validation folds. Default is 10.
seed: You may set the seed of the random number generator in order to obtain reproducible results.
cv.ind: Which fold each observation belongs to. By default the observations are randomly assigned by cv.grpreg.
returnY: Should cv.grpreg return the fitted values from the cross-validation folds? Default is FALSE; if TRUE, this will return a matrix in which the element for row i, column j is the fitted value for observation i from the fold in which observation i was excluded from the fit, at the jth value of lambda (see the sketch following this list).
trace: If set to TRUE, cv.grpreg will inform the user of its progress by announcing the beginning of each CV fold. Default is FALSE.
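As a rough illustration of how these arguments fit together (a minimal sketch, not part of the package documentation; the element name Y holding the cross-validated fitted values is an assumption based on the returnY description above):

library(grpreg)
data(Birthwt)
X <- Birthwt$X
y <- Birthwt$bwt
group <- Birthwt$group
## Reproducible 5-fold CV; keep the held-out fitted values and print progress
cvfit <- cv.grpreg(X, y, group, nfolds=5, seed=1, returnY=TRUE, trace=TRUE)
dim(cvfit$Y)  ## rows = observations, columns = values of lambda (element name Y assumed)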
An object with S3 class "cv.grpreg" containing:
cve: The error for each value of lambda, averaged across the cross-validation folds.
cvse: The estimated standard error associated with each value of cve.
lambda: The sequence of regularization parameter values along which the cross-validation error was calculated.
fit: The fitted grpreg object for the whole data.
min: The index of lambda corresponding to lambda.min.
lambda.min: The value of lambda with the minimum cross-validation error.
null.dev: The deviance for the intercept-only model.
pe: If family="binomial", the cross-validation prediction error for each value of lambda.
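For example, these elements might be inspected as follows (a minimal sketch assuming a fit named cvfit, as in the examples below):

cvfit$lambda.min                          ## value of lambda minimizing CV error
cvfit$cve[cvfit$min]                      ## CV error at lambda.min
cvfit$cvse[cvfit$min]                     ## its estimated standard error
coef(cvfit$fit, lambda=cvfit$lambda.min)  ## full-data coefficients at lambda.min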
The function calls grpreg nfolds times, each time leaving out 1/nfolds of the data. The cross-validation error is based on the residual sum of squares when family="gaussian" and the deviance when family="binomial" or family="poisson".
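To make this concrete for a Gaussian response, the cross-validation error can be reproduced (up to the exact averaging convention, which may be a mean rather than a sum of squared errors) from the held-out fitted values returned with returnY=TRUE; this is a sketch assuming X, y, and group as in the examples below, and assuming the held-out fitted values are stored in the element Y:

cvfit <- cv.grpreg(X, y, group, seed=1, returnY=TRUE)
err <- apply(cvfit$Y, 2, function(yhat) mean((y - yhat)^2))  ## mean squared held-out error per lambda
head(cbind(err, cvfit$cve))  ## err should track cve, up to a possible mean-vs-sum scaling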
For Gaussian and Poisson responses, the folds are chosen according to simple random sampling. For binomial responses, the numbers for each outcome class are balanced across the folds; i.e., the number of outcomes in which y is equal to 1 is the same for each fold, or possibly off by 1 if the numbers do not divide evenly.
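If direct control over the fold assignment is needed (for example, to enforce a particular balancing), a vector can be supplied through cv.ind; here is a minimal sketch for a hypothetical binary outcome ybin with 10 folds (ybin is simulated purely for illustration):

ybin <- rbinom(nrow(X), 1, 0.5)  ## hypothetical binary response
cv.ind <- numeric(length(ybin))
cv.ind[ybin == 1] <- sample(rep(1:10, length.out=sum(ybin == 1)))
cv.ind[ybin == 0] <- sample(rep(1:10, length.out=sum(ybin == 0)))
cvfit.bin <- cv.grpreg(X, ybin, group, family="binomial", cv.ind=cv.ind)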
As in grpreg, seemingly unrelated regressions/multitask learning can be carried out by setting y to be a matrix, in which case groups are set up automatically (see grpreg for details), and cross-validation is carried out with respect to rows of y. As mentioned in the details there, it is recommended to standardize the responses prior to fitting.
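A sketch of the multitask case, using a hypothetical two-column response built from the Birthwt data (the second column is simulated noise added purely for illustration, and both columns are standardized as recommended; no group argument is supplied since groups are set up automatically):

set.seed(1)
Y <- cbind(scale(Birthwt$bwt),
           scale(Birthwt$bwt + rnorm(length(Birthwt$bwt))))
cvfit.mt <- cv.grpreg(X, Y)  ## cross-validation is carried out over rows of Y
plot(cvfit.mt)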
grpreg, plot.cv.grpreg, summary.cv.grpreg, predict.cv.grpreg
data(Birthwt)
X <- Birthwt$X
y <- Birthwt$bwt
group <- Birthwt$group
cvfit <- cv.grpreg(X, y, group)
plot(cvfit)
summary(cvfit)
coef(cvfit) ## Beta at minimum CVE
## Refit using the group exponential lasso (gel) penalty
cvfit <- cv.grpreg(X, y, group, penalty="gel")
plot(cvfit)
summary(cvfit)