plus: Fits linear regression with a quadratic spline penalty, including the Lasso, MC+ and SCAD.

Description

The algorithm generates a piecewise linear path of coefficients and penalty levels as critical points of a penalized loss in linear regression, starting with zero coefficients for infinite penalty and ending with a least squares fit for zero penalty. It is an extension of the LARS algorithm from the absolute value penalty to quadratic spline penalties.

Usage

plus(x,y, method = c("lasso", "mc+", "scad", "general"), m=2, gamma,v,t, monitor=FALSE, normalize = TRUE, intercept = TRUE, Gram, use.Gram = FALSE, eps=1e-15, max.steps=500, lam)

Arguments

x

predictors, an n by p matrix with n > 1 and p > 1.

y

response, an n-vector with n > 1.

method

One of "lasso", "mc+", "scad" or "general". The Lasso penalty is specified by m = 1, MC+ by m = 2 and gamma > 0, and SCAD by m = 3 and gamma > 1. A general quadratic spline penalty is specified by the m-vectors v and t.

m

number of knots of the quadratic spline penalty: m = 1 for Lasso, m = 2 for MC+, m = 3 for SCAD. Default is m = 2.

gamma

the largest knot of a quadratic spline penalty, say rho(.); gamma = 0 for lasso.

v

m-vector giving the negative second derivative of the penalty function rho(.) between two consecutive knots or beyond gamma.

t

m-vector giving the knots of the penalty function rho(.), that is, the points where its derivatives are discontinuous, including 0 as a knot.
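The relationship between m, gamma, v and t can be illustrated with a small base R sketch. It assumes the penalty is written at unit penalty level with rho'(0+) = 1, and that v[j] is the negative second derivative on the interval from t[j] to the next knot (or to infinity for the last knot). The helper penalty_deriv and the particular v and t vectors are illustrative only; they are not part of the package and may differ from its internal encoding.

```r
# Derivative of a quadratic spline penalty for x >= 0 (illustrative helper).
# Assumptions: rho'(0+) = 1 and -rho''(x) = v[j] on [t[j], t[j+1]),
# with the last interval extending to infinity.
penalty_deriv <- function(x, v, t) {
  upper <- c(t[-1], Inf)  # right end of each interval
  sapply(abs(x), function(a) {
    # subtract the curvature accumulated on each interval up to a
    1 - sum(v * pmax(pmin(a, upper) - t, 0))
  })
}

gam <- 3
lasso <- list(v = 0,                      t = 0)           # m = 1
mcp   <- list(v = c(1 / gam, 0),          t = c(0, gam))   # m = 2
scad  <- list(v = c(0, 1 / (gam - 1), 0), t = c(0, 1, gam)) # m = 3

grid <- c(0.5, 1.5, 2, 4)
d_lasso <- penalty_deriv(grid, lasso$v, lasso$t)
d_mcp   <- penalty_deriv(grid, mcp$v,   mcp$t)
d_scad  <- penalty_deriv(grid, scad$v,  scad$t)
print(rbind(lasso = d_lasso, mcp = d_mcp, scad = d_scad))
```

Under these assumptions the Lasso derivative stays at 1, while the MC+ and SCAD derivatives decrease to 0 at the largest knot gamma, which is why large coefficients are left unpenalized by the concave penalties.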

monitor

If TRUE, plus prints out its progress when variables move in and out of the active set. Default is FALSE.

normalize

If TRUE, each variable is standardized to have unit mean squares, otherwise it is left alone. Default is TRUE.

intercept

If TRUE, an intercept is included in the model (and not penalized), otherwise no intercept is included. Default is TRUE.
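As a rough illustration of what normalize = TRUE and intercept = TRUE imply for the predictors, the base R sketch below centers each column and rescales it to unit mean square. Centering before scaling is an assumption made here for illustration; the package's internal preprocessing may differ in detail, and the variable names are hypothetical.

```r
set.seed(1)
x <- matrix(rnorm(20 * 3, mean = 5, sd = 2), nrow = 20, ncol = 3)

# Center each column (assumed to accompany an unpenalized intercept)
xc <- sweep(x, 2, colMeans(x))

# Rescale each column so that mean(x_j^2) = 1
rms <- sqrt(colMeans(xc^2))
xs <- sweep(xc, 2, rms, "/")

print(colMeans(xs))    # numerically zero
print(colMeans(xs^2))  # numerically one
```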

Gram

The X'X matrix; useful for repeated runs (e.g. bootstrap) where a large X'X stays the same.

use.Gram

When p is very large, you may not want PLUS to precompute the entire Gram matrix. Default is FALSE.

eps

An effective zero.

max.steps

Limit on the number of steps taken. Default is 500. There can be many more steps than n or p since variables can be removed and added as the algorithm proceeds. Users should check whether the desired penalty level has been reached if PLUS stops at the maximum number of steps.

lam

A decreasing sequence of nonnegative numbers as penalty levels for which penalized estimates of coefficients are generated. Default is the vector of ordered penalty levels at the turning points of the computed path. If lam is set, the computation stops when the path first hits the minimum of lam. The scale of lam is determined by the penalized loss sum((y - x

Value

v: matrix with rows as p-vectors indicating the parallelepipeds in which the computed path lives
beta.path: matrix with rows as p-vectors of regression coefficients at the turning points of the solution path
lam.path: penalty levels at the turning points of the computed path. When the penalty function is concave, lam.path may not be a decreasing sequence but always takes nonnegative values.
beta: matrix with rows as p-vectors of coefficients when the solution path first hits lam
lam: the specified penalty levels hit by lam.path. This may not be the same as argument lam if the minimum of the argument is not reached by the computed solution path.
dim: the number of nonzero beta
r.square: R-square values for beta
total.hits: length of output lam
total.steps: total number of steps executed, the same as the total number of segments in the computed solution path. With zero as the first coefficient vector, beta.path contains one more vector than total.steps.
full.path: TRUE if zero penalty is reached.
forced.stop: TRUE if PLUS is forced to stop due to reasons other than reaching max.steps or the minimum of argument lam.
singular.Q: TRUE if PLUS is forced to stop when a matrix is not invertible.
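The components listed above can be inspected directly on a fitted object. The following is a minimal sketch on simulated data, assuming the plus package is installed; the data, the choice gamma = 3 and all variable names are illustrative, and the printed values depend on the simulated sample.

```r
# Simulated data; everything except plus() and its documented
# return components is illustrative.
set.seed(1)
n <- 50
p <- 8
x <- matrix(rnorm(n * p), n, p)
colnames(x) <- paste0("x", seq_len(p))
y <- drop(x[, 1:2] %*% c(2, -1.5)) + rnorm(n)

if (requireNamespace("plus", quietly = TRUE)) {
  fit <- plus::plus(x, y, method = "mc+", gamma = 3)

  # One row of beta.path per turning point, each row a p-vector
  print(dim(fit$beta.path))
  print(fit$total.steps)

  # For a concave penalty, lam.path need not be decreasing
  print(round(fit$lam.path, 3))

  # Check whether zero penalty was reached or the run stopped early
  print(c(full.path = fit$full.path, forced.stop = fit$forced.stop))
}
```

Checking full.path and forced.stop after a run is a simple way to confirm that the path was not cut short by max.steps or by a singular matrix.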

Details

PLUS is described in detail in Zhang (2007). It computes a complete path of critical points of a penalized squared loss, from zero coefficients for infinite penalty to a least squares fit for zero penalty, including possible multiple local minima for each penalty level.

References

Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics 38, 894-942.

Examples

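library(plus)

# Load the S&P 500 example data and make its components visible
data(sp500)
attach(sp500)
# Predictors: columns 3 onward of sp500.percent; response: column 1
x <- sp500.percent[, 3:ncol(sp500.percent)]
y <- sp500.percent[, 1]

# 2 x 3 grid: top row for the Lasso fit, bottom row for the MC+ fit
par(mfrow = c(2, 3))
object <- plus(x, y, method = "lasso")
plot(object)
plot(object, yvar = "dim")   # number of nonzero coefficients
plot(object, yvar = "R-sq")  # R-square
object <- plus(x, y, method = "mc+")
plot(object)
plot(object, yvar = "dim")
plot(object, yvar = "R-sq")
detach(sp500)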