genlasso (version 1.2)

genlasso: Compute the generalized lasso solution path for arbitrary penalty matrix

Description

This function computes the solution path of the generalized lasso problem for an arbitrary penalty matrix. Speciality functions exist for the trend filtering and fused lasso problems; see trendfilter and fusedlasso.

Usage

genlasso(y, X, D, approx = FALSE, maxsteps = 2000, minlam = 0,
         tol = 1e-11, eps = 1e-8, verbose = FALSE)

Arguments

y
a numeric response vector.
X
an optional matrix of predictor variables, with observations along the rows, and variables along the columns. If missing, X is assumed to be the identity matrix. If the passed X does not have full column rank, then a small ridge penalty is added to the criterion, with multiplier given by the eps argument below.
D
a penalty matrix. Its number of columns must be equal to the number of columns of X, or if no X is given, the length of y. This can be a sparse matrix from the Matrix package; see Details for when this is advantageous.
approx
a logical variable indicating if the approximate solution path should be used (with no dual coordinates leaving the boundary). Default is FALSE.
maxsteps
an integer specifying the maximum number of steps for the algorithm to take before termination. Default is 2000.
minlam
a numeric variable indicating the value of lambda at which the path should terminate. Default is 0.
tol
a numeric variable giving the tolerance used in the calculation of the hitting and leaving times. A larger value is more conservative, and may cause the algorithm to miss some hitting or leaving events (do not change unless you know what you're doing). Default is 1e-11.
eps
a numeric variable indicating the multiplier for the ridge penalty, in the case that X is column rank deficient. Default is 1e-8.
verbose
a logical variable indicating if progress should be reported after each knot in the path. Default is FALSE.

Value

Returns an object of class "genlasso", a list with at least the following components:

  • lambda: values of lambda at which the solution path changes slope, i.e., kinks or knots.
  • beta: a matrix of primal coefficients, each column corresponding to a knot in the solution path.
  • fit: a matrix of fitted values, each column corresponding to a knot in the solution path.
  • u: a matrix of dual coefficients, each column corresponding to a knot in the solution path.
  • hit: a vector of logical values indicating if a new variable in the dual solution hit the box constraint boundary. A value of FALSE indicates a variable leaving the boundary.
  • df: a vector giving an unbiased estimate of the degrees of freedom of the fit at each knot in the solution path.
  • y: the observed response vector. Useful for plotting and other methods.
  • completepath: a logical variable indicating whether the complete path was computed (terminating the path early with the maxsteps or minlam options results in a value of FALSE).
  • bls: the least squares solution, i.e., the solution at lambda = 0.
  • call: the matched call.

Details

The generalized lasso estimate minimizes the criterion $$1/2 \|y - X \beta\|_2^2 + \lambda \|D \beta\|_1.$$ The solution $\hat{\beta}$ is computed as a function of the regularization parameter $\lambda$. The advantage of the genlasso function lies in its flexibility, i.e., the user can specify any penalty matrix D of their choosing. However, for a trend filtering problem or a fused lasso problem, it is strongly recommended to use one of the speciality functions, trendfilter or fusedlasso. When compared to these functions, genlasso is not as numerically stable and much less efficient.
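To illustrate the flexibility described above, the sketch below (base R only, with hypothetical names) constructs a custom penalty matrix D: the (p-1) x p first-difference matrix, for which the penalty ||D beta||_1 counts jumps between successive coefficients. This recovers the 1d fused lasso, so in practice fusedlasso should be preferred; the point here is only how an arbitrary D is built and what it penalizes.

```r
# The (p-1) x p first-difference matrix: row i computes beta[i+1] - beta[i],
# so ||D %*% beta||_1 penalizes jumps in a piecewise-constant signal.
p <- 6
D <- diag(-1, p - 1, p)           # -1 on the main diagonal
D[cbind(1:(p - 1), 2:p)] <- 1     # +1 on the superdiagonal

# A piecewise-constant coefficient vector has mostly zero differences:
beta <- c(0, 0, 3, 3, 3, 7)
drop(D %*% beta)                  # successive differences: 0 3 0 0 4

# Such a D would then be passed directly to genlasso, e.g.
# out <- genlasso(y, X, D)        # or genlasso(y, D = D) with X the identity
```

Any other linear structure (higher-order differences, differences over the edges of a graph, etc.) can be encoded in D in the same way.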

Note that, when D is passed as a sparse matrix, the linear systems that arise at each step of the path algorithm are solved separately via a sparse solver. The usual strategy (when D is simply a matrix) is to maintain a matrix factorization of D, and solve these systems by updating (or downdating) this factorization, as these linear systems are highly related. Therefore, when D is sufficiently sparse and structured, it can be advantageous to pass it as a sparse matrix; but if D is truly dense, passing it as a sparse matrix will be highly inefficient.
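As a sketch of the sparse route, the first-difference matrix above can be built directly in sparse form with the Matrix package (bandSparse lays down the two nonzero diagonals); only the construction is shown here, with the genlasso call left as a commented illustration.

```r
# Build the (p-1) x p first-difference matrix in sparse form.
# For large, banded D like this, the sparse representation is cheap to
# store and lets genlasso fall back on a sparse solver at each step.
library(Matrix)
p <- 1000
Dsp <- bandSparse(p - 1, p, k = c(0, 1),
                  diagonals = list(rep(-1, p - 1), rep(1, p - 1)))
# out <- genlasso(y, D = Dsp)   # sparse class triggers the sparse solver
```

This only pays off when D is genuinely sparse and structured, as noted above; a dense D should be passed as an ordinary matrix.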

References

Tibshirani, R. J. and Taylor, J. (2011), "The solution path of the generalized lasso", Annals of Statistics 39 (3) 1335--1371.

See Also

trendfilter, fusedlasso, coef.genlasso, predict.genlasso, plot.genlasso

Examples

# Using the generalized lasso to run a standard lasso regression
# (not advisable---for example purposes only!)
set.seed(1)
n = 100
p = 10
X = matrix(rnorm(n*p),nrow=n)
y = 3*X[,1] + rnorm(n)
D = diag(1,p)
out = genlasso(y,X,D)
coef(out, lambda=sqrt(n*log(p)))
