
flare (version 1.8)

slim: Sparse Linear Regression using Nonsmooth Loss Functions and L1 Regularization

Description

The function "slim" implements a family of Lasso variants for estimating high-dimensional sparse linear models, including the Dantzig selector, LAD Lasso, SQRT Lasso, and Lq Lasso. We adopt the alternating direction method of multipliers (ADMM) and convert the original optimization problem into a sequence of L1-penalized least squares minimization problems, which can be efficiently solved by combining linearization with multi-stage screening of variables. For the Dantzig selector, missing values in the design matrix and the response vector can be tolerated.
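To make the ADMM scheme mentioned above concrete, the sketch below implements the textbook ADMM iteration for the standard Lasso (Boyd et al., 2011). It is a simplified illustration only, not flare's implementation, which additionally applies linearization and multi-stage screening; the helper names `soft` and `admm.lasso` are invented for this example.

```r
## Textbook ADMM for the standard Lasso: a sketch, not flare's solver.
## Objective: (1/2n)||Y - X beta||_2^2 + lambda ||beta||_1
soft <- function(a, k) sign(a) * pmax(abs(a) - k, 0)  # soft-thresholding

admm.lasso <- function(X, Y, lambda, rho = 1, max.ite = 500, prec = 1e-5) {
  n <- nrow(X); d <- ncol(X)
  beta <- z <- u <- rep(0, d)
  XtX <- crossprod(X) / n          # X'X / n
  XtY <- crossprod(X, Y) / n       # X'Y / n
  A <- XtX + rho * diag(d)
  for (i in seq_len(max.ite)) {
    ## beta-update: ridge-type linear system
    beta <- solve(A, XtY + rho * (z - u))
    ## z-update: soft-thresholding enforces sparsity
    z.new <- soft(beta + u, lambda / rho)
    ## dual update
    u <- u + beta - z.new
    if (max(abs(z.new - z)) < prec) { z <- z.new; break }
    z <- z.new
  }
  drop(z)
}
```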

Usage

slim(X, Y, lambda = NULL, nlambda = NULL, 
     lambda.min.value = NULL, lambda.min.ratio = NULL, 
     rho = 1, method="lq", q = 2, res.sd = FALSE, 
     prec = 1e-5, max.ite = 1e5, verbose = TRUE)

Value

An object with S3 class "slim" is returned:

beta

A matrix of regression estimates whose columns correspond to regularization parameters.

intercept

The values of the intercepts corresponding to the regularization parameters.

Y

The value of Y used in the program.

X

The value of X used in the program.

lambda

The sequence of regularization parameters lambda used in the program.

nlambda

The number of values used in lambda.

method

The method from the input.

df

The number of nonzero coefficients at each value of lambda.

ite

Iteration counts returned by the underlying optimization solver.

verbose

The verbose from the input.

Arguments

Y

The \(n\)-dimensional response vector.

X

The \(n\) by \(d\) design matrix.

lambda

A sequence of decreasing, positive, finite numbers controlling regularization. Typical usage is to leave lambda = NULL and let the program compute the sequence based on nlambda, lambda.min.value, and lambda.min.ratio.
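Such sequences are commonly log-spaced between a maximum value and the minimum determined by lambda.min.value or lambda.min.ratio. The sketch below illustrates the idea only; `lambda.max` is a hypothetical starting value, not flare's actual data-driven maximum.

```r
## Illustrative sketch of a decreasing, log-spaced lambda sequence.
## `lambda.max` is a placeholder; flare derives its own maximum internally.
lambda.max <- 1
lambda.min <- 0.3 * lambda.max   # e.g. lambda.min.ratio = 0.3
nlambda <- 5
lambda <- exp(seq(log(lambda.max), log(lambda.min), length.out = nlambda))
lambda
```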

nlambda

The number of values used in lambda. Default value is 5.

lambda.min.value

The minimum value in the generated lambda sequence when lambda is not supplied. The default is \(\sqrt{\log(d)/n}\) for non-Dantzig methods.

lambda.min.ratio

A multiplier for lambda.max used to generate lambda.min.value when method = "dantzig" and lambda.min.value is not provided. The default is 0.5 for Dantzig selector.

rho

The penalty parameter used in ADMM. The default value is 1.

method

The Dantzig selector is applied if method = "dantzig", \(L_q\) Lasso is applied if method = "lq", and the standard Lasso is applied if method = "lasso". The default value is "lq".

q

The exponent \(q\) of the \(L_q\) loss used in \(L_q\) Lasso. It is only applicable when method = "lq" and must lie in [1,2]. The default value is 2.

res.sd

Whether the response variable is standardized. The default value is FALSE.

prec

Stopping criterion. The default value is 1e-5.

max.ite

The iteration limit. The default value is 1e5.

verbose

Tracing information printing is disabled if verbose = FALSE. The default value is TRUE.

Author

Xingguo Li, Tuo Zhao, Lie Wang, Xiaoming Yuan and Han Liu
Maintainer: Tuo Zhao <tourzhao@gatech.edu>

Details

The standard Lasso solves the following optimization problem
$$ \min {\frac{1}{2n}}|| Y - X \beta ||_2^2 + \lambda || \beta ||_1 $$
The Dantzig selector solves the following optimization problem
$$ \min || \beta ||_1, \quad \textrm{s.t. } || X'(Y - X \beta) ||_{\infty} \le \lambda $$
\(L_q\) loss Lasso solves the following optimization problem
$$ \min n^{-\frac{1}{q}}|| Y - X \beta ||_q + \lambda || \beta ||_1 $$
where \(1 \le q \le 2\). \(L_q\) Lasso is equivalent to LAD Lasso and SQRT Lasso when \(q=1\) and \(q=2\), respectively.
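The objectives above can be evaluated directly for a candidate \(\beta\). The sketch below (not flare's internal code; the helper name `lq.objective` is invented here) computes the \(L_q\) Lasso objective, so that \(q = 2\) yields the SQRT Lasso loss and \(q = 1\) the LAD Lasso loss:

```r
## Sketch: evaluate the Lq Lasso objective for a candidate beta.
## n^{-1/q} ||Y - X beta||_q + lambda ||beta||_1
lq.objective <- function(X, Y, beta, lambda, q) {
  n <- nrow(X)
  r <- Y - X %*% beta                       # residual vector
  n^(-1/q) * sum(abs(r)^q)^(1/q) + lambda * sum(abs(beta))
}

set.seed(1)
X <- matrix(rnorm(20), 5, 4)
Y <- rnorm(5)
b <- c(1, 0, -1, 0)
lq.objective(X, Y, b, lambda = 0.1, q = 2)  # SQRT Lasso: ||r||_2 / sqrt(n) + penalty
lq.objective(X, Y, b, lambda = 0.1, q = 1)  # LAD Lasso:  ||r||_1 / n + penalty
```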

References

1. E. Candes and T. Tao. The Dantzig selector: Statistical estimation when p is much larger than n. Annals of Statistics, 2007.
2. A. Belloni, V. Chernozhukov and L. Wang. Pivotal recovery of sparse signals via conic programming. Biometrika, 2012.
3. L. Wang. L1 penalized LAD estimator for high dimensional linear regression. Journal of Multivariate Analysis, 2012.
4. J. Liu and J. Ye. Efficient L1/Lq Norm Regularization. Technical Report, 2010.
5. S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning, 2011.
6. B. He and X. Yuan. On non-ergodic convergence rate of Douglas-Rachford alternating direction method of multipliers. Technical Report, 2012.

See Also

flare-package, print.slim, plot.slim, coef.slim and predict.slim.

Examples

## load library
library(flare)
## generate data
set.seed(123)  # for reproducibility
n = 50
d = 100
X = matrix(rnorm(n*d), n, d)
beta = c(3,2,0,1.5,rep(0,d-4))
eps = rnorm(n)
Y = X%*%beta + eps
nlamb = 5
ratio = 0.3

## Regression with "dantzig", general "lq" and "lasso" respectively
out1 = slim(X = X, Y = Y, nlambda = nlamb, lambda.min.ratio = ratio, method = "dantzig")
out2 = slim(X = X, Y = Y, nlambda = nlamb, lambda.min.ratio = ratio, method = "lq", q = 1)
out3 = slim(X = X, Y = Y, nlambda = nlamb, lambda.min.ratio = ratio, method = "lq", q = 1.5)
out4 = slim(X = X, Y = Y, nlambda = nlamb, lambda.min.ratio = ratio, method = "lq", q = 2)
out5 = slim(X = X, Y = Y, nlambda = nlamb, lambda.min.ratio = ratio, method = "lasso")

## Display results
print(out4)
plot(out4)
coef(out4)
