
Implements k-fold cross-validation for grpnet to find the regularization parameters that minimize the prediction error (deviance, mean squared error, mean absolute error, or misclassification rate).
cv.grpnet(x, ...)

# S3 method for default
cv.grpnet(x,
          y,
          group,
          weights = NULL,
          offset = NULL,
          alpha = c(0.01, 0.25, 0.5, 0.75, 1),
          gamma = c(3, 4, 5),
          type.measure = NULL,
          nfolds = 10,
          foldid = NULL,
          same.lambda = FALSE,
          parallel = FALSE,
          cluster = NULL,
          verbose = interactive(),
          adaptive = FALSE,
          power = 1,
          ...)

# S3 method for formula
cv.grpnet(formula,
          data,
          use.rk = TRUE,
          weights = NULL,
          offset = NULL,
          alpha = c(0.01, 0.25, 0.5, 0.75, 1),
          gamma = c(3, 4, 5),
          type.measure = NULL,
          nfolds = 10,
          foldid = NULL,
          same.lambda = FALSE,
          parallel = FALSE,
          cluster = NULL,
          verbose = interactive(),
          adaptive = FALSE,
          power = 1,
          ...)
lambda: regularization parameter sequence for the full data
cvm: mean cross-validation error for each lambda
cvsd: estimated standard error of cvm
cvup: upper curve, cvm + cvsd
cvlo: lower curve, cvm - cvsd
nzero: number of non-zero groups for each lambda
grpnet.fit: fitted grpnet object for the full data
lambda.min: value of lambda that minimizes cvm
lambda.1se: largest lambda such that cvm is within one cvsd from the minimum (see Note)
index: two-element vector giving the indices of lambda.min and lambda.1se in the lambda vector, i.e., c(minid, se1id) as defined in the Note
type.measure: loss function for cross-validation (used for plot label)
call: matched call
time: runtime in seconds to perform k-fold CV tuning
tune: data frame containing the tuning results, i.e., min(cvm) for each combination of alpha and/or gamma
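As an illustration of how the returned summaries relate to one another, the following base R sketch derives them from a matrix of per-fold errors. The error matrix, the lambda grid, and all variable names are hypothetical, and the 1se rule is assumed to use the standard error at the minimizing lambda (the convention used by cv.glmnet); the package's internal computation may differ in detail.

```r
# Hypothetical per-fold errors: rows = folds, columns = lambda values
lambda <- c(1.0, 0.5, 0.25, 0.125)   # decreasing, as in a regularization path
cverr <- rbind(c(5.0, 2.7, 2.6, 2.9),
               c(5.4, 2.9, 2.8, 3.1),
               c(4.6, 2.5, 2.4, 2.7),
               c(5.2, 3.1, 3.0, 3.3),
               c(4.8, 2.3, 2.2, 2.5))
nfolds <- nrow(cverr)

# Mean error and its standard error across folds, per lambda
cvm  <- colMeans(cverr)
cvsd <- apply(cverr, 2, sd) / sqrt(nfolds)
cvup <- cvm + cvsd   # upper curve
cvlo <- cvm - cvsd   # lower curve

# lambda.min: the lambda with the smallest mean error
minid <- which.min(cvm)
lambda.min <- lambda[minid]

# lambda.1se (assumed rule): the largest lambda whose mean error is
# within one standard error of the minimum
within_1se <- cvm <= cvm[minid] + cvsd[minid]
lambda.1se <- max(lambda[within_1se])
se1id <- which(lambda == lambda.1se)

index <- c(minid, se1id)
print(c(lambda.min = lambda.min, lambda.1se = lambda.1se))
```

In this toy example the 1se rule selects a larger (more heavily penalized) lambda than the minimizer, which is the usual motivation for reporting both.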
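x: Model (design) matrix of dimension nobs by nvars.
y: Response vector of length nobs. Matrix-valued responses are supported for some families (e.g., "multigaussian", as in the examples; see grpnet) and are not permitted for other families.
group: Group label vector (factor, character, or integer) of length nvars, giving the group of each column of x.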
formula: Model formula: a symbolic description of the model to be fitted. Uses the same syntax as lm and glm.
data: Optional data frame containing the variables referenced in formula.
use.rk: If TRUE (default), the rk.model.matrix function is used to build the model matrix. Otherwise, the model.matrix function is used. Additional arguments to the rk.model.matrix function can be passed via the ... argument.
weights: Optional vector of length nobs giving the weight of each observation.
offset: Optional vector of length nobs giving the offset for each observation.
alpha: Scalar or vector specifying the elastic net tuning parameter. If alpha is a vector (default), then the same foldid is used to compute the cross-validation error for each value of alpha.
gamma: Scalar or vector specifying the penalty hyperparameter. If gamma is a vector (default), then the same foldid is used to compute the cross-validation error for each value of gamma.
type.measure: Loss function for cross-validation. Options include: "deviance" for model deviance, "mse" for mean squared error, "mae" for mean absolute error, or "class" for classification error. Note that "class" is only available for binomial and multinomial families. The default is classification error (for binomial and multinomial) or mean squared error (others).
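nfolds: Number of folds for cross-validation.
foldid: Optional vector of length nobs giving the fold assignment of each observation. When supplied, the nfolds argument is defined as nfolds = nlevels(foldid).
same.lambda: Logical specifying whether the same lambda sequence should be used for every fold. Defaults to FALSE.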
parallel: Logical specifying if sequential computing (default) or parallel computing should be used. If TRUE, the fitting for each fold is parallelized.
cluster: Optional cluster to use for parallel computing. If parallel = TRUE and cluster = NULL, then the cluster is defined as cluster = makeCluster(2L), which uses two cores. Recommended usage: cluster = makeCluster(detectCores()).
verbose: Logical indicating if the fitting progress should be printed. Defaults to TRUE in interactive sessions and FALSE otherwise.
adaptive: Logical indicating if the adaptive group elastic net should be used (see Note).
power: If adaptive = TRUE, then the adaptive penalty weights are defined by dividing the original penalty weights by tapply(coef, group, norm, type = "F")^power.
...: Optional additional arguments for grpnet (e.g., standardize, penalty.factor, etc.).
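Two of the arguments above lend themselves to a short base R sketch: building a foldid vector by hand, and forming adaptive penalty weights from group-wise coefficient norms. All values are hypothetical (the coefficients, the group labels, and the starting penalty weights are invented for illustration), and the Euclidean norm is written out explicitly as a stand-in for the Frobenius norm named in the power description.

```r
set.seed(1)

## Part 1: a fold-assignment vector with one label per observation
nobs   <- 23
nfolds <- 5
# Repeat the labels 1..nfolds to length nobs, then shuffle them
foldid <- sample(rep(seq_len(nfolds), length.out = nobs))
table(foldid)             # fold sizes differ by at most one
nlevels(factor(foldid))   # number of folds implied by foldid

## Part 2: adaptive penalty weights from group-wise coefficient norms
coef  <- c(0.5, -1.2, 0.1, 2.0, 0.3, -0.4)   # hypothetical initial estimates
group <- c(1, 1, 2, 3, 3, 3)                 # hypothetical group labels
power <- 1

# Hypothetical original penalty weights (square root of each group size)
penalty.factor <- as.numeric(sqrt(table(group)))

# Euclidean norm of the coefficients within each group
grpnorm <- tapply(coef, group, function(b) sqrt(sum(b^2)))

# Groups with small norms receive larger penalties
adaptive.weights <- penalty.factor / grpnorm^power
print(adaptive.weights)
```

A vector built as in Part 1 could be supplied through the foldid argument so that repeated calls compare models on identical folds.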
Nathaniel E. Helwig <helwig@umn.edu>
This function calls the grpnet function nfolds+1 times: once on the full dataset to obtain the lambda sequence, and once holding out each fold's data to evaluate the prediction error. The syntax of (the default S3 method for) this function closely mimics that of the cv.glmnet function in the glmnet package (Friedman, Hastie, & Tibshirani, 2010).
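The nfolds+1 fitting pattern can be sketched in base R. This is a minimal illustration under stated assumptions, not the grpnet implementation: a closed-form ridge regression (the hypothetical helper ridge_fit) stands in for grpnet, the lambda grid is fixed by hand rather than derived from the full-data fit, and the loss is mean squared error.

```r
# Stand-in fitter (NOT grpnet): closed-form ridge regression, no intercept
ridge_fit <- function(x, y, lambda) {
  solve(crossprod(x) + lambda * diag(ncol(x)), crossprod(x, y))
}

set.seed(1)
nobs <- 60; nvars <- 4; nfolds <- 5
x <- matrix(rnorm(nobs * nvars), nobs, nvars)
y <- drop(x %*% c(2, -1, 0, 0)) + rnorm(nobs)
lambda <- c(10, 1, 0.1)                       # hypothetical lambda grid
foldid <- sample(rep(seq_len(nfolds), length.out = nobs))

nfits <- 0

# Fit 1: the full dataset (one coefficient column per lambda)
full_fit <- sapply(lambda, function(l) ridge_fit(x, y, l))
nfits <- nfits + 1

# Fits 2 to nfolds+1: hold out each fold and score it
cverr <- matrix(NA_real_, nfolds, length(lambda))
for (u in seq_len(nfolds)) {
  test <- foldid == u
  beta <- sapply(lambda, function(l) {
    ridge_fit(x[!test, , drop = FALSE], y[!test], l)
  })
  nfits <- nfits + 1
  pred <- x[test, , drop = FALSE] %*% beta    # held-out predictions per lambda
  cverr[u, ] <- colMeans((y[test] - pred)^2)  # mean squared error per lambda
}

nfits   # equals nfolds + 1
```

Each row of cverr holds one fold's error curve over the lambda grid; averaging the rows gives the mean cross-validation error.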
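The cross-validation error for each fold is the loss specified by type.measure, evaluated on that fold's held-out data using the model fit to the remaining folds. For example, the "mse" loss function is the mean squared difference between the held-out responses and their predictions.
The mean cross-validation error cvm is the average of the fold errors at each lambda, and cvsd is the estimated standard error of that mean.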
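As a sketch of these definitions, the quantities can be written out as follows. The notation is assumed here rather than taken from the package documentation: there are v folds, fold u holds n_u observations with responses y_u, and \hat{y}_{\lambda[u]} denotes the predictions for fold u from the model fit without that fold. The folds are treated as unweighted, and the package may normalize or weight the terms differently.

```latex
% Assumed notation: v folds, n_u observations in fold u,
% \hat{y}_{\lambda[u]} = fold-u predictions from the fit that excludes fold u
\begin{align*}
E_u(\lambda) &= C\bigl(\hat{y}_{\lambda[u]},\, y_u\bigr)
  && \text{(error for fold } u \text{, with loss } C \text{ set by type.measure)} \\
C_{\mathrm{mse}}\bigl(\hat{y}_{\lambda[u]},\, y_u\bigr)
  &= \frac{1}{n_u}\,\bigl\lVert y_u - \hat{y}_{\lambda[u]} \bigr\rVert^2
  && \text{(the "mse" loss)} \\
\mathrm{cvm}(\lambda) &= \frac{1}{v} \sum_{u=1}^{v} E_u(\lambda)
  && \text{(mean cross-validation error)} \\
\mathrm{cvsd}(\lambda) &= \sqrt{\frac{1}{v(v-1)} \sum_{u=1}^{v}
  \bigl(E_u(\lambda) - \mathrm{cvm}(\lambda)\bigr)^2}
  && \text{(standard error of the mean)}
\end{align*}
```

Under this reading, cvsd is the sample standard deviation of the fold errors divided by the square root of the number of folds.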
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22. doi:10.18637/jss.v033.i01
Helwig, N. E. (2025). Versatile descent algorithms for group regularization and variable selection in generalized linear models. Journal of Computational and Graphical Statistics, 34(1), 239-252. doi:10.1080/10618600.2024.2362232
plot.cv.grpnet for plotting the cross-validation error curve
predict.cv.grpnet for predicting from cv.grpnet objects
grpnet for fitting group elastic net regularization paths
# \donttest{
######***###### family = "gaussian" ######***######
# load data
data(auto)
# 10-fold cv (formula method, response = mpg)
set.seed(1)
mod <- cv.grpnet(mpg ~ ., data = auto)
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
######***###### family = "multigaussian" ######***######
# load data
data(auto)
# 10-fold cv (formula method, response = (mpg, displacement))
y <- as.matrix(auto[,c(1,3)])
set.seed(1)
mod <- cv.grpnet(y ~ ., data = auto[,-c(1,3)], family = "multigaussian",
                 standardize.response = TRUE)
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
######***###### family = "svm1" ######***######
# load data
data(auto)
# redefine origin (Domestic vs Foreign)
auto$origin <- ifelse(auto$origin == "American", "Domestic", "Foreign")
# 10-fold cv (formula method, response = origin with 2 levels)
set.seed(1)
mod <- cv.grpnet(origin ~ ., data = auto, family = "svm1")
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
######***###### family = "svm2" ######***######
# load data
data(auto)
# redefine origin (Domestic vs Foreign)
auto$origin <- ifelse(auto$origin == "American", "Domestic", "Foreign")
# 10-fold cv (formula method, response = origin with 2 levels)
set.seed(1)
mod <- cv.grpnet(origin ~ ., data = auto, family = "svm2")
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
######***###### family = "logit" ######***######
# load data
data(auto)
# redefine origin (Domestic vs Foreign)
auto$origin <- ifelse(auto$origin == "American", "Domestic", "Foreign")
# 10-fold cv (formula method, response = origin with 2 levels)
set.seed(1)
mod <- cv.grpnet(origin ~ ., data = auto, family = "logit")
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
######***###### family = "binomial" ######***######
# load data
data(auto)
# redefine origin (Domestic vs Foreign)
auto$origin <- ifelse(auto$origin == "American", "Domestic", "Foreign")
# 10-fold cv (formula method, response = origin with 2 levels)
set.seed(1)
mod <- cv.grpnet(origin ~ ., data = auto, family = "binomial")
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
######***###### family = "multinomial" ######***######
# load data
data(auto)
# 10-fold cv (formula method, response = origin with 3 levels)
set.seed(1)
mod <- cv.grpnet(origin ~ ., data = auto, family = "multinomial")
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
######***###### family = "poisson" ######***######
# load data
data(auto)
# 10-fold cv (formula method, response = horsepower)
set.seed(1)
mod <- cv.grpnet(horsepower ~ ., data = auto, family = "poisson")
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
######***###### family = "negative.binomial" ######***######
# load data
data(auto)
# 10-fold cv (formula method, response = horsepower)
set.seed(1)
mod <- cv.grpnet(horsepower ~ ., data = auto, family = "negative.binomial")
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
######***###### family = "Gamma" ######***######
# load data
data(auto)
# 10-fold cv (formula method, response = mpg)
set.seed(1)
mod <- cv.grpnet(mpg ~ ., data = auto, family = "Gamma")
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
######***###### family = "inverse.gaussian" ######***######
# load data
data(auto)
# 10-fold cv (formula method, response = mpg)
set.seed(1)
mod <- cv.grpnet(mpg ~ ., data = auto, family = "inverse.gaussian")
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
# }