Computes a group elastic-net regularization path for a variety of
GLM and other families, including the Cox model. This function
extends the abilities of the glmnet
package to allow for
grouped regularization. The code is very efficient (core routines
are written in C++), and allows for specialized matrix
classes.
grpnet(
X,
glm,
constraints = NULL,
groups = NULL,
alpha = 1,
penalty = NULL,
offsets = NULL,
lambda = NULL,
standardize = TRUE,
irls_max_iters = as.integer(10000),
irls_tol = 1e-07,
max_iters = as.integer(1e+05),
tol = 1e-07,
adev_tol = 0.9,
ddev_tol = 0,
newton_tol = 1e-12,
newton_max_iters = 1000,
n_threads = 1,
early_exit = TRUE,
intercept = TRUE,
screen_rule = c("pivot", "strong"),
min_ratio = 0.01,
lmda_path_size = 100,
max_screen_size = NULL,
max_active_size = NULL,
pivot_subset_ratio = 0.1,
pivot_subset_min = 1,
pivot_slack_ratio = 1.25,
check_state = FALSE,
progress_bar = FALSE,
warm_start = NULL
)
A list of class "grpnet". This has a main component called state, which represents the fitted path, and a few extra useful components such as the call, the family name, groups and group_sizes. Users are encouraged to use methods like predict(), coef(), print(), plot() etc. to examine the object.
X: Feature matrix. Either a regular R matrix, an adelie custom matrix class, or a concatenation of such.
glm: GLM family/response object. This is an expression that represents the family, the response and other arguments such as weights, if present. The choices are glm.gaussian(), glm.binomial(), glm.poisson(), glm.multinomial(), glm.cox(), and glm.multigaussian(). This is a required argument, and there is no default. In the simple example below, we use glm.gaussian(y).
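For instance, switching families changes only this argument (a minimal sketch, using X as constructed in the example below; yb is a hypothetical binary 0/1 response):
## Gaussian fit, as in the example below
fit <- grpnet(X, glm.gaussian(y))
## Hypothetical binary response: only the family object changes
yb <- rbinom(nrow(X), 1, 0.5)
fitb <- grpnet(X, glm.binomial(yb))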
constraints: Group-wise constraints on the parameters, supplied as a list with an element for each group. Default is NULL, which means no constraints. List elements can be NULL as well. Currently only 'box constraints' are supported, which means upper and lower limits. The function constraint.box() must be used to set the constraints for each group that has constraints. Details are given in the documentation for constraint.box.
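A minimal sketch of box constraints (this assumes constraint.box() accepts lower and upper limits of the group's size; consult the constraint.box documentation for the actual interface):
## Hypothetical: force the coefficients of the first group (here of size 1)
## to be non-negative, leaving all other groups unconstrained
cons <- vector("list", length(groups))
cons[[1]] <- constraint.box(lower = 0, upper = Inf)  # assumed signature
fit <- grpnet(X, glm.gaussian(y), groups = groups, constraints = cons)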
groups: This is an ordered vector of integers that represents the groupings, with each entry indicating where a group begins. The entries refer to column numbers in the feature matrix, and hence the members of a group have to be contiguous. If there are p features, the default is 1:p (no groups; i.e., p groups each of size 1). So the length of groups is the number of groups. (Note that in the state output of grpnet this vector might be shifted to start from 0, since internally adelie uses zero-based indexing.)
alpha: The elastic-net mixing parameter, with \(0\le\alpha\le 1\). The penalty is defined as $$(1-\alpha)/2\sum_j||\beta_j||_2^2+\alpha\sum_j||\beta_j||_2,$$ where the sum is over groups. alpha=1 gives the pure group-lasso penalty, and alpha=0 the pure ridge penalty.
penalty: Separate penalty factors can be applied to each group of coefficients. This is a non-negative factor per group that multiplies lambda, to allow differential shrinkage for groups. It can be 0 for some groups, which implies no shrinkage, and that group is always included in the model. Default is the square root of each group's size.
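As a sketch, to keep the first group unpenalized while the remaining groups receive equal penalty factors (groups as constructed in the example below):
pf <- rep(1, length(groups))   # one penalty factor per group
pf[1] <- 0                     # zero: the first group is never shrunk
fit <- grpnet(X, glm.gaussian(y), groups = groups, penalty = pf)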
offsets: Offsets, default is NULL. If present, this is a fixed vector or matrix corresponding to the shape of the natural parameter, and is added to the fit.
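A typical use is a log-exposure offset in a Poisson model (a sketch with hypothetical count data; X as constructed in the example below):
ypois <- rpois(nrow(X), lambda = 2)   # hypothetical count response
exposure <- runif(nrow(X), 1, 10)     # hypothetical observation times
fit <- grpnet(X, glm.poisson(ypois), offsets = log(exposure))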
lambda: A user-supplied lambda sequence. Typical usage is to have the program compute its own lambda sequence based on lmda_path_size and min_ratio. The sequence used is returned with the fit.
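To supply your own path instead (a minimal sketch; the sequence should be decreasing):
lam <- exp(seq(log(1), log(0.01), length.out = 50))
fit <- grpnet(X, glm.gaussian(y), groups = groups, lambda = lam)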
standardize: If TRUE (the default), the columns of X are standardized before the fit is computed. This is good practice if the features are on different scales, because standardization affects the penalty. The regularization path is computed using the standardized features, and the standardization information is saved on the object for making future predictions. The different matrix classes have their own methods for standardization. For example, for a sparse matrix the standardization information is computed but not actually applied (e.g., centering would destroy the sparsity); rather, the matrix-multiply methods are aware of it and incorporate the standardization information.
irls_max_iters: Maximum number of IRLS iterations, default 1e4.
irls_tol: IRLS convergence tolerance, default 1e-7.
max_iters: Maximum total number of coordinate-descent iterations, default 1e5.
tol: Coordinate-descent convergence tolerance, default 1e-7.
adev_tol: Fraction-of-deviance-explained tolerance, default 0.9. This can be seen as a limit on overfitting the training data.
ddev_tol: Tolerance on the difference in fraction of deviance explained, default 0. If a step in the path changes the deviance by this amount or less, the algorithm truncates the path.
newton_tol: Convergence tolerance for the BCD update, default 1e-12. This parameter controls the iterations in each block-coordinate step that establish the block solution.
newton_max_iters: Maximum number of iterations for the BCD update, default 1000.
n_threads: Number of threads, default 1.
early_exit: If TRUE (the default), the function is allowed to exit early.
intercept: Default TRUE, to include an unpenalized intercept.
screen_rule: Screen rule, with default "pivot", an empirical improvement over "strong", the other option.
min_ratio: Ratio between the smallest and largest value of lambda. Default is 1e-2.
lmda_path_size: Number of values for lambda, if generated automatically. Default is 100.
max_screen_size: Maximum number of screen groups. Default is NULL.
max_active_size: Maximum number of active groups. Default is NULL.
pivot_subset_ratio: Subset ratio of the pivot rule. Default is 0.1.
pivot_subset_min: Minimum subset of the pivot rule. Default is 1.
pivot_slack_ratio: Slack ratio of the pivot rule. Default is 1.25.
Users are not expected to fiddle with these three pivot parameters; see the reference for details.
check_state: Check state. Internal parameter, with default FALSE.
progress_bar: Progress bar. Default is FALSE.
warm_start: Warm start (default is NULL). Internal parameter.
James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie <hastie@stanford.edu>
Yang, James and Hastie, Trevor. (2024) A Fast and Scalable Pathwise-Solver for Group Lasso and Elastic Net Penalized Regression via Block-Coordinate Descent. arXiv, doi:10.48550/arXiv.2405.08631.
Friedman, J., Hastie, T. and Tibshirani, R. (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent, Journal of Statistical Software, Vol. 33(1), 1-22, doi:10.18637/jss.v033.i01.
Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2011) Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent, Journal of Statistical Software, Vol. 39(5), 1-13, doi:10.18637/jss.v039.i05.
Tibshirani, Robert, Bien, J., Friedman, J., Hastie, T., Simon, N., Taylor, J. and Tibshirani, Ryan. (2012) Strong Rules for Discarding Predictors in Lasso-type Problems, JRSSB, Vol. 74(2), 245-266, https://arxiv.org/abs/1011.2234.
cv.grpnet, predict.grpnet, coef.grpnet, plot.grpnet, print.grpnet.
set.seed(0)
n <- 100
p <- 200
X <- matrix(rnorm(n * p), n, p)
y <- X[,1] * rnorm(1) + rnorm(n)
## Here we create 61 groups at random: column 1 must start the first group,
## and each further entry of `groups` gives the starting column of a group.
## Groups need to be contiguous.
groups <- c(1, sample(2:199, 60, replace = FALSE))
groups <- sort(groups)
print(groups)
fit <- grpnet(X, glm.gaussian(y), groups = groups)
print(fit)
plot(fit)
coef(fit)
cvfit <- cv.grpnet(X, glm.gaussian(y), groups = groups)
print(cvfit)
plot(cvfit)
predict(cvfit, newx = X[1:5, ], lambda = "lambda.min")