Does k-fold cross-validation for grpnet
cv.grpnet(
X,
glm,
n_folds = 10,
foldid = NULL,
min_ratio = 0.01,
lmda_path_size = 100,
offsets = NULL,
progress_bar = FALSE,
n_threads = 1,
...
)
An object of class "cv.grpnet" is returned, which is a list with the ingredients of the cross-validation fit:

- the values of lambda used in the fits.
- the mean cross-validated deviance, a vector of length length(lambda).
- estimate of standard error of cvm.
- upper curve = cvm + cvsd.
- lower curve = cvm - cvsd.
- number of non-zero coefficients at each lambda.
- a text string indicating the type of measure (for plotting purposes); currently this is "deviance".
- a fitted grpnet object for the full data.
- lambda.min: value of lambda that gives minimum cvm.
- lambda.1se: largest value of lambda such that the mean deviance is within 1 standard error of the minimum.
- a one-column matrix with the indices of lambda.min and lambda.1se in the sequence of coefficients, fits, etc.
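Since the return value is a plain list, individual ingredients can be pulled out directly. A minimal sketch, assuming cvfit holds a fitted "cv.grpnet" object and that the selected lambdas are stored under the names lambda.min and lambda.1se referenced above:

```r
## cvfit is assumed to be the result of a cv.grpnet() call
cvfit$lambda.min  # lambda giving the minimum mean CV deviance
cvfit$lambda.1se  # largest lambda within 1 SE of that minimum
```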
Feature matrix. Either a regular R matrix, or else an adelie custom matrix class, or a concatenation of such.
GLM family/response object. This is an expression that represents the family, the response and other arguments such as weights, if present. The choices are glm.gaussian(), glm.binomial(), glm.poisson(), glm.multinomial(), glm.cox(), and glm.multigaussian(). This is a required argument, and there is no default. In the simple example below, we use glm.gaussian(y).
(default 10). Although n_folds can be as large as the sample size (leave-one-out CV), it is not recommended for large datasets. The smallest allowable value is n_folds = 3.
An optional vector of values between 1 and n_folds identifying what fold each observation is in. If supplied, n_folds can be missing.
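Building such a vector by hand needs only base R; a small sketch (the variable names here are illustrative):

```r
## Balanced assignment of n observations to 10 folds
n <- 100
foldid <- sample(rep(1:10, length.out = n))
table(foldid)  # roughly equal fold sizes
```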
Ratio between smallest and largest value of lambda. Default is 1e-2.
Number of values for lambda, if generated automatically. Default is 100.
Offsets, default is NULL. If present, this is a fixed vector or matrix corresponding to the shape of the natural parameter, and is added to the fit.
Progress bar. Default is FALSE.
Number of threads, default 1.
Other arguments that can be passed to grpnet.
James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie
hastie@stanford.edu
The function runs grpnet n_folds + 1 times; the first to get the lambda sequence, and then the remainder to compute the fit with each of the folds omitted. The out-of-fold deviance is accumulated, and the average deviance and standard deviation over the folds is computed. Note that cv.grpnet does NOT search for values for alpha. A specific value should be supplied, else alpha = 1 is assumed by default. If users would like to cross-validate alpha as well, they should call cv.grpnet with a pre-computed vector foldid, and then use this same foldid vector in separate calls to cv.grpnet with different values of alpha.
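That recipe might look like the following sketch (hedged: it assumes alpha is among the arguments forwarded to grpnet via ..., and the alpha values chosen here are illustrative):

```r
## Reuse one foldid vector so the CV curves for different alpha are comparable
foldid <- sample(rep(1:10, length.out = nrow(X)))
fits <- lapply(c(1, 0.5, 0.1), function(a)
  cv.grpnet(X, glm.gaussian(y), foldid = foldid, alpha = a))
```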
Note also that the results of cv.grpnet are random, since the folds are selected at random (unless supplied via foldid). Users can reduce this randomness by running cv.grpnet many times, and averaging the error curves.
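That averaging strategy can be sketched as follows (assuming, as described above, that the mean CV deviance is returned in a component named cvm; since the lambda path is computed on the full data, it is the same across runs):

```r
## Average the CV error curve over several random fold assignments
B <- 5
cvms <- sapply(seq_len(B), function(b)
  cv.grpnet(X, glm.gaussian(y), n_folds = 10)$cvm)
avg_cvm <- rowMeans(cvms)
```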
Yang, James and Hastie, Trevor. (2024) A Fast and Scalable Pathwise-Solver for Group Lasso and Elastic Net Penalized Regression via Block-Coordinate Descent. arXiv, doi:10.48550/arXiv.2405.08631.

Friedman, J., Hastie, T. and Tibshirani, R. (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent, Journal of Statistical Software, Vol. 33(1), 1-22, doi:10.18637/jss.v033.i01.

Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2011) Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent, Journal of Statistical Software, Vol. 39(5), 1-13, doi:10.18637/jss.v039.i05.

Tibshirani, Robert, Bien, J., Friedman, J., Hastie, T., Simon, N., Taylor, J. and Tibshirani, Ryan. (2012) Strong Rules for Discarding Predictors in Lasso-type Problems, JRSSB, Vol. 74(2), 245-266, https://arxiv.org/abs/1011.2234.
print.cv.grpnet, predict.cv.grpnet, coef.cv.grpnet, plot.cv.grpnet.
set.seed(0)
n <- 100
p <- 200
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1:25] %*% rnorm(25) / 4 + rnorm(n)  # signal in the first 25 columns
groups <- c(1, sample(2:199, 60, replace = FALSE))  # starting indices of the groups
groups <- sort(groups)
cvfit <- cv.grpnet(X, glm.gaussian(y), groups = groups)
print(cvfit)
plot(cvfit)
predict(cvfit, newx = X[1:5, ])
predict(cvfit, type = "nonzero")