Computes a group elastic-net regularization path for a variety of
GLM and other families, including the Cox model. This function
extends the abilities of the glmnet
package to allow for
grouped regularization. The code is very efficient (core routines
are written in C++), and allows for specialized matrix
classes.
grpnet(
X,
glm,
constraints = NULL,
groups = NULL,
alpha = 1,
penalty = NULL,
offsets = NULL,
lambda = NULL,
standardize = TRUE,
irls_max_iters = as.integer(10000),
irls_tol = 1e-07,
max_iters = as.integer(1e+05),
tol = 1e-07,
adev_tol = 0.9,
ddev_tol = 0,
newton_tol = 1e-12,
newton_max_iters = 1000,
n_threads = 1,
early_exit = TRUE,
intercept = TRUE,
screen_rule = c("pivot", "strong"),
min_ratio = 0.01,
lmda_path_size = 100,
max_screen_size = NULL,
max_active_size = NULL,
pivot_subset_ratio = 0.1,
pivot_subset_min = 1,
pivot_slack_ratio = 1.25,
check_state = FALSE,
progress_bar = FALSE,
warm_start = NULL
)
A list of class "grpnet". This has a main component called state, which represents the fitted path, and a few extra useful components such as the call, the family name, groups and group_sizes. Users are encouraged to use methods like predict(), coef(), print(), plot() etc. to examine the object.
X: Feature matrix. Either a regular R matrix, an adelie custom matrix class, or a concatenation of such.
glm: GLM family/response object. This is an expression that represents the family, the response and other arguments such as weights, if present. The choices are glm.gaussian(), glm.binomial(), glm.poisson(), glm.multinomial(), glm.cox(), and glm.multigaussian(). This is a required argument, and there is no default. In the simple example below, we use glm.gaussian(y).
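For instance, switching families changes only this argument (a minimal sketch, using X as constructed in the example below; yb is a hypothetical binary 0/1 response):
## Gaussian fit, as in the example below
fit <- grpnet(X, glm.gaussian(y))
## Hypothetical binary response: only the family object changes
yb <- rbinom(nrow(X), 1, 0.5)
fitb <- grpnet(X, glm.binomial(yb))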
constraints: Group-wise constraints on the parameters, supplied as a list with an element for each group. Default is NULL, which means no constraints. List elements can be NULL as well. Currently only 'box constraints' are supported, which means upper and lower limits. The function constraint.box() must be used to set the constraints for each group that has constraints. Details are given in the documentation for constraint.box.
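A minimal sketch of box constraints (this assumes constraint.box() accepts lower and upper limits of the group's size; consult the constraint.box documentation for the actual interface):
## Hypothetical: force the coefficients of the first group (here of size 1)
## to be non-negative, leaving all other groups unconstrained
cons <- vector("list", length(groups))
cons[[1]] <- constraint.box(lower = 0, upper = Inf)  # assumed signature
fit <- grpnet(X, glm.gaussian(y), groups = groups, constraints = cons)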
groups: This is an ordered vector of integers that represents the groupings, with each entry indicating where a group begins. The entries refer to column numbers in the feature matrix, and hence the members of a group have to be contiguous. If there are p features, the default is 1:p (no groups; i.e., p groups each of size 1). So the length of groups is the number of groups. (Note that in the state output of grpnet this vector might be shifted to start from 0, since internally adelie uses zero-based indexing.)
alpha: The elastic-net mixing parameter, with \(0\le\alpha\le 1\). The penalty is defined as $$(1-\alpha)/2\sum_j||\beta_j||_2^2+\alpha\sum_j||\beta_j||_2,$$ where the sum is over groups. alpha=1 gives the pure group-lasso penalty, and alpha=0 the pure ridge penalty.
penalty: Separate penalty factors can be applied to each group of coefficients. This is a non-negative factor per group that multiplies lambda, to allow differential shrinkage for groups. It can be 0 for some groups, which implies no shrinkage, and that group is always included in the model. Default is the square root of each group's size.
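As a sketch, to keep the first group unpenalized while the remaining groups receive equal penalty factors (groups as constructed in the example below):
pf <- rep(1, length(groups))   # one penalty factor per group
pf[1] <- 0                     # zero: the first group is never shrunk
fit <- grpnet(X, glm.gaussian(y), groups = groups, penalty = pf)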
offsets: Offsets, default is NULL. If present, this is a fixed vector or matrix corresponding to the shape of the natural parameter, and is added to the fit.
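A typical use is a log-exposure offset in a Poisson model (a sketch with hypothetical count data; X as constructed in the example below):
ypois <- rpois(nrow(X), lambda = 2)   # hypothetical count response
exposure <- runif(nrow(X), 1, 10)     # hypothetical observation times
fit <- grpnet(X, glm.poisson(ypois), offsets = log(exposure))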
lambda: A user-supplied lambda sequence. Typical usage is to have the program compute its own lambda sequence based on lmda_path_size and min_ratio. The sequence used is returned with the fit.
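To supply your own path instead (a minimal sketch; the sequence should be decreasing):
lam <- exp(seq(log(1), log(0.01), length.out = 50))
fit <- grpnet(X, glm.gaussian(y), groups = groups, lambda = lam)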
standardize: If TRUE (the default), the columns of X are standardized before the fit is computed. This is good practice if the features are on different scales, because standardization affects the penalty. The regularization path is computed using the standardized features, and the standardization information is saved on the object for making future predictions. The different matrix classes have their own methods for standardization. For example, for a sparse matrix the standardization information is computed but not actually applied (e.g., centering would destroy the sparsity); rather, the matrix-multiply methods are aware of it and incorporate the standardization information.
irls_max_iters: Maximum number of IRLS iterations, default 1e4.
irls_tol: IRLS convergence tolerance, default 1e-7.
max_iters: Maximum total number of coordinate-descent iterations, default 1e5.
tol: Coordinate-descent convergence tolerance, default 1e-7.
adev_tol: Fraction-of-deviance-explained tolerance, default 0.9. This can be seen as a limit on overfitting the training data.
ddev_tol: Tolerance on the difference in fraction of deviance explained, default 0. If a step in the path changes the deviance by this amount or less, the algorithm truncates the path.
newton_tol: Convergence tolerance for the BCD update, default 1e-12. This parameter controls the iterations in each block-coordinate step that establish the block solution.
newton_max_iters: Maximum number of iterations for the BCD update, default 1000.
n_threads: Number of threads, default 1.
early_exit: If TRUE (the default), the function is allowed to exit early.
intercept: Default TRUE, to include an unpenalized intercept.
screen_rule: Screen rule, with default "pivot", an empirical improvement over "strong", the other option.
min_ratio: Ratio between the smallest and largest value of lambda. Default is 1e-2.
lmda_path_size: Number of values for lambda, if generated automatically. Default is 100.
max_screen_size: Maximum number of screen groups. Default is NULL.
max_active_size: Maximum number of active groups. Default is NULL.
pivot_subset_ratio: Subset ratio of the pivot rule. Default is 0.1.
pivot_subset_min: Minimum subset of the pivot rule. Default is 1.
pivot_slack_ratio: Slack ratio of the pivot rule. Default is 1.25.
Users are not expected to fiddle with these three pivot parameters; see the reference for details.
check_state: Check state. Internal parameter, with default FALSE.
progress_bar: Progress bar. Default is FALSE.
warm_start: Warm start (default is NULL). Internal parameter.
James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie <hastie@stanford.edu>
Yang, James and Hastie, Trevor. (2024) A Fast and Scalable Pathwise-Solver for Group Lasso and Elastic Net Penalized Regression via Block-Coordinate Descent. arXiv, doi:10.48550/arXiv.2405.08631.
Friedman, J., Hastie, T. and Tibshirani, R. (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent, Journal of Statistical Software, Vol. 33(1), 1-22, doi:10.18637/jss.v033.i01.
Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2011) Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent, Journal of Statistical Software, Vol. 39(5), 1-13, doi:10.18637/jss.v039.i05.
Tibshirani, Robert, Bien, J., Friedman, J., Hastie, T., Simon, N., Taylor, J. and Tibshirani, Ryan. (2012) Strong Rules for Discarding Predictors in Lasso-type Problems, JRSSB, Vol. 74(2), 245-266, https://arxiv.org/abs/1011.2234.
cv.grpnet, predict.grpnet, coef.grpnet, plot.grpnet, print.grpnet.
set.seed(0)
n <- 100
p <- 200
X <- matrix(rnorm(n * p), n, p)
y <- X[,1] * rnorm(1) + rnorm(n)
## Here we create 61 groups at random: column 1 must start the first group,
## and each further entry of `groups` gives the starting column of a group.
## Groups need to be contiguous.
groups <- c(1, sample(2:199, 60, replace = FALSE))
groups <- sort(groups)
print(groups)
fit <- grpnet(X, glm.gaussian(y), groups = groups)
print(fit)
plot(fit)
coef(fit)
cvfit <- cv.grpnet(X, glm.gaussian(y), groups = groups)
print(cvfit)
plot(cvfit)
predict(cvfit, newx = X[1:5, ], lambda = "lambda.min")