## Quick example for a multivariate linear model with multivariate response
# Simulate data
n <- 100
p <- 10; q <- 5
set.seed(123456)
x <- matrix(rnorm(n * p, sd = rep(1:p, each = n)), nrow = n, ncol = p)
e <- matrix(rnorm(n * q, sd = rep(q:1, each = n)), nrow = n, ncol = q)
beta <- matrix(((1:p - 1) / p)^2, nrow = p, ncol = q)
y <- x %*% beta + e
# Fit lasso (model with intercept, the default)
cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE)$beta_hat
# \donttest{
## Multivariate linear model with multivariate response
# Simulate data
n <- 100
p <- 10; q <- 5
set.seed(123456)
x <- matrix(rnorm(n * p, sd = rep(1:p, each = n)), nrow = n, ncol = p)
e <- matrix(rnorm(n * q, sd = rep(q:1, each = n)), nrow = n, ncol = q)
beta <- matrix(((1:p - 1) / p)^2, nrow = p, ncol = q)
y <- x %*% beta + e
# Fit ridge
cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", intercept = FALSE,
                 cv_verbose = TRUE)
cv0$beta_hat
# Same fit for the chosen lambda
cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda,
          intercept = FALSE)$beta_hat
# Fit lasso (model with intercept, the default)
cv1 <- cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE)
cv1$beta_hat
# Use cv_1se = FALSE
cv1_min <- cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE,
                     cv_1se = FALSE)
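# Compare the lambdas selected by the one-standard-error rule (the default)
# and by the CV minimum; the former is always at least as large
c("lambda_1se" = cv1$lambda, "lambda_min" = cv1_min$lambda)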
# Compare the ridge fit with its analytical solution. Observe the factor in
# lambda, necessary since lambda is searched for the standardized data. Note
# also that, unlike in the case q = 1, no standardization with respect to y
# happens
sd_x <- apply(x, 2, function(x) sd(x)) * sqrt((n - 1) / n)
cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda,
          thresh = 1e-20, intercept = FALSE)$beta_hat
solve(crossprod(x) + diag(cv0$lambda * sd_x^2 * n)) %*% t(x) %*% y
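# A quick numerical check (assuming beta_hat is returned as a p x q matrix):
# the maximum absolute difference between both estimates should be negligible
beta_glmnet <- cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda,
                         thresh = 1e-20, intercept = FALSE)$beta_hat
beta_ridge <- solve(crossprod(x) + diag(cv0$lambda * sd_x^2 * n)) %*% t(x) %*% y
max(abs(as.matrix(beta_glmnet) - beta_ridge))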
# If x is standardized, the match between glmnet and the usual ridge
# analytical expression does not require scaling of lambda
x_std <- scale(x, scale = sd_x, center = TRUE)
cv_glmnet(x = x_std, y = y, alpha = "ridge", lambda = cv0$lambda,
          intercept = FALSE, thresh = 1e-20)$beta_hat
solve(crossprod(x_std) + diag(rep(cv0$lambda * n, p))) %*% t(x_std) %*% y
## Simple linear model
# Simulate data
n <- 100
p <- 1; q <- 1
set.seed(123456)
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
e <- matrix(rnorm(n * q), nrow = n, ncol = q)
beta <- 2
y <- x * beta + e
# Fit by ridge (model with intercept, the default)
cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", cv_verbose = TRUE)
cv0$beta_hat
cv0$intercept
# Comparison with linear model with intercept
lm(y ~ 1 + x)$coefficients
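# With a negligible penalty the ridge fit essentially recovers the least
# squares coefficients (lambda = 1e-10 is an arbitrary small value used only
# for illustration)
cv_tiny <- cv_glmnet(x = x, y = y, alpha = "ridge", lambda = 1e-10,
                     thresh = 1e-20)
c(cv_tiny$intercept, cv_tiny$beta_hat)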
# Fit by ridge (model without intercept)
cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", cv_verbose = TRUE,
                 intercept = FALSE)
cv0$beta_hat
# Comparison with linear model without intercept
lm(y ~ 0 + x)$coefficients
# Same fit for the chosen lambda (and without intercept)
cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda,
          intercept = FALSE)$beta_hat
# Same for lasso (model with intercept, the default)
cv1 <- cv_glmnet(x = x, y = y, alpha = "lasso")
cv1$beta_hat
## Multivariate linear model (p = 10, q = 1)
# Simulate data
n <- 50
p <- 10; q <- 1
set.seed(123456)
x <- matrix(rnorm(n * p, mean = 1, sd = rep(1:p, each = n)),
            nrow = n, ncol = p)
e <- matrix(rnorm(n * q), nrow = n, ncol = q)
beta <- ((1:p - 1) / p)^2
y <- x %*% beta + e
# Fit ridge (model without intercept)
cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", intercept = FALSE,
                 cv_verbose = TRUE)
cv0$beta_hat
# Same fit for the chosen lambda
cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda,
          intercept = FALSE)$beta_hat
# Compare the ridge fit with its analytical solution. Observe the factor in
# lambda, necessary since lambda is searched for the standardized data
sd_x <- apply(x, 2, function(x) sd(x)) * sqrt((n - 1) / n)
sd_y <- sd(y) * sqrt((n - 1) / n)
cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda,
          intercept = FALSE, thresh = 1e-20)$beta_hat
solve(crossprod(x) + diag(cv0$lambda * sd_x^2 / sd_y * n)) %*% t(x) %*% y
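# For contrast, without the rescaling of lambda the analytical expression
# does not reproduce the glmnet fit above
solve(crossprod(x) + diag(rep(cv0$lambda, p))) %*% t(x) %*% y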
# If x and y are standardized, the match between glmnet and the usual ridge
# analytical expression does not require scaling of lambda
x_std <- scale(x, scale = sd_x, center = TRUE)
y_std <- scale(y, scale = sd_y, center = TRUE)
cv_glmnet(x = x_std, y = y_std, alpha = "ridge", lambda = cv0$lambda,
          intercept = FALSE, thresh = 1e-20)$beta_hat
solve(crossprod(x_std) + diag(rep(cv0$lambda * n, p))) %*% t(x_std) %*% y_std
# Fit lasso (model with intercept, the default)
cv1 <- cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE)
cv1$beta_hat
# ## Parallelization
#
# # Parallel
# doMC::registerDoMC(cores = 2)
# microbenchmark::microbenchmark(
#   cv_glmnet(x = x, y = y, nlambda = 100, cv_parallel = TRUE),
#   cv_glmnet(x = x, y = y, nlambda = 100, cv_parallel = FALSE),
#   times = 10)
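# # doMC is not available on Windows; registering a doParallel backend should
# # work as well (untested sketch)
# doParallel::registerDoParallel(cores = 2)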
# }