
plus (version 1.0)

plus: Fits linear regression with a quadratic spline penalty, including the Lasso, MC+ and SCAD.

Description

The algorithm generates a piecewise linear path of coefficients and penalty levels as critical points of a penalized loss in linear regression, starting with zero coefficients at infinite penalty and ending with a least squares fit at zero penalty. It extends the LARS algorithm from the absolute value penalty to quadratic spline penalties.
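The shape of such a path is easiest to see in a special case that PLUS itself does not require: an orthonormal design with X'X/n equal to the identity. Assuming the loss is scaled as in Zhang (2010), (2n)^{-1}||y - Xb||^2 + sum(rho(|b_j|; lambda)), the problem then separates into one-dimensional problems in z = X'y/n. The Lasso solution is soft thresholding and the MC+ solution with gamma > 1 is firm thresholding; both are piecewise linear in lambda, equal zero for large lambda and equal the least squares value z at lambda = 0. The helpers soft_threshold and firm_threshold below are hypothetical illustrations in base R, not functions from the plus package.

```r
# Sketch for an orthonormal design (X'X/n = I, an assumption for illustration),
# where the penalized loss separates into univariate problems in z = X'y/n.
# Helper names are hypothetical, not part of the plus package.

# Lasso: soft thresholding, sign(z) * (|z| - lambda)_+
soft_threshold <- function(z, lambda) {
  sign(z) * pmax(abs(z) - lambda, 0)
}

# MC+ with gamma > 1: firm thresholding.
# For |z| <= gamma * lambda the estimate is gamma * (|z| - lambda)_+ / (gamma - 1)
# (with the sign of z); beyond that it is z itself, i.e. no shrinkage.
firm_threshold <- function(z, lambda, gamma) {
  stopifnot(gamma > 1)
  shrunk <- sign(z) * gamma * pmax(abs(z) - lambda, 0) / (gamma - 1)
  ifelse(abs(z) <= gamma * lambda, shrunk, z)
}

# Trace both estimates along a decreasing grid of penalty levels for one z.
z <- 2
lambdas <- seq(3, 0, by = -0.5)
path <- data.frame(
  lambda = lambdas,
  lasso  = soft_threshold(z, lambdas),
  mcplus = firm_threshold(z, lambdas, gamma = 2)
)
print(path)
```

Both columns start at zero for large lambda and end at z = 2 at lambda = 0. The MC+ estimate already reaches the unshrunk value z once lambda <= |z|/gamma, whereas the Lasso estimate remains shrunk by lambda until the penalty vanishes.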

Usage

plus(x, y, method = c("lasso", "mc+", "scad", "general"), m = 2, gamma, v, t, monitor = FALSE, normalize = TRUE, intercept = TRUE, Gram, use.Gram = FALSE, eps = 1e-15, max.steps = 500, lam)

Arguments

x
predictors, an n by p matrix with n > 1 and p > 1.
y
response, an n-vector with n > 1.
method
one of "lasso", "mc+", "scad" or "general". The Lasso penalty corresponds to m = 1, MC+ to m = 2 with gamma > 0, and SCAD to m = 3 with gamma > 1. A general quadratic spline penalty is specified by the m-vectors v and t.
m
number of knots of the quadratic spline penalty: m = 1 for Lasso, m = 2 for MC+, m = 3 for SCAD. Default is m = 2.
gamma
the largest knot of the quadratic spline penalty rho(.); gamma = 0 for the Lasso.
v
m-vector giving the negative second derivative of the penalty rho(.) between two consecutive knots or beyond gamma.
t
m-vector giving the discontinuities of the derivatives of the penalty function rho(.) as knots, including 0 as a knot.
monitor
If TRUE, plus prints out its progress when variables move in and out of the active set. Default is FALSE.
normalize
If TRUE, each variable is standardized to have unit mean squares, otherwise it is left alone. Default is TRUE.
intercept
If TRUE, an intercept is included in the model (and not penalized), otherwise no intercept is included. Default is TRUE.
Gram
The X'X matrix; useful for repeated runs (e.g. bootstrap) where a large X'X stays the same.
use.Gram
When p is very large, you may not want PLUS to precompute the entire Gram matrix. Default is FALSE.
eps
An effective zero.
max.steps
Limit on the number of steps taken. Default is 500. There can be many more steps than n or p, since variables can be removed and added as the algorithm proceeds. If PLUS stops at max.steps, users should check whether the desired penalty level has been reached.
lam
A decreasing sequence of nonnegative numbers as penalty levels for which penalized estimates of coefficients are generated. Default is the vector of ordered penalty levels at the turning points of the computed path. If lam is set, the computation stops when the path first hits the minimum of lam. The scale of lam is determined by the penalized loss sum((y - x
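The arguments m, gamma, v and t describe the penalty through its knots and its piecewise constant curvature. The base-R sketch below builds rho(x; lambda) from v and t under two assumptions of ours, not statements about the package: the knots and curvatures refer to the standardized penalty rho(.; 1), with rho(x; lambda) = lambda^2 * rho(x/lambda; 1), and the slope of rho(.; 1) at 0 is 1. Under that encoding the Lasso is t = 0, v = 0; MC+ is t = c(0, gamma), v = c(1/gamma, 0); and SCAD is t = c(0, 1, gamma), v = c(0, 1/(gamma - 1), 0), which agrees with the knot counts m = 1, 2, 3 above. The function spline_penalty is hypothetical.

```r
# Hypothetical helper: quadratic spline penalty rho(x; lambda) from knots t
# (first knot 0) and negative second derivatives v on each segment
# [t_k, t_{k+1}), the last segment extending beyond gamma = max(t).
# Assumed encoding: rho(x; lambda) = lambda^2 * rho(x / lambda; 1), rho'(0+; 1) = 1.
spline_penalty <- function(x, v, t, lambda = 1) {
  stopifnot(length(v) == length(t), t[1] == 0, lambda > 0)
  upper <- c(t[-1], Inf)                        # right end of each segment
  rho_std <- function(s) {                      # rho(s; 1) for scalar s >= 0
    # integral over [0, s] of the slope lost in each segment:
    # rho(s) = s - sum_k v_k * integral_0^s overlap_k(u) du
    curv <- ifelse(s <= t, 0,
            ifelse(s <= upper, (s - t)^2 / 2,
                   (upper - t)^2 / 2 + (upper - t) * (s - upper)))
    s - sum(v * curv)
  }
  lambda^2 * vapply(abs(x) / lambda, rho_std, numeric(1))
}

# The three named penalties under the assumed encoding, with gamma = 3, lambda = 2.
gamma <- 3
u <- c(0, 0.5, 1, 2, 4, 8)
lasso <- spline_penalty(u, v = 0, t = 0, lambda = 2)
mcp   <- spline_penalty(u, v = c(1 / gamma, 0), t = c(0, gamma), lambda = 2)
scad  <- spline_penalty(u, v = c(0, 1 / (gamma - 1), 0), t = c(0, 1, gamma), lambda = 2)
print(rbind(u, lasso, mcp, scad))
```

The MC+ row matches the closed form lambda*|x| - x^2/(2*gamma) up to gamma*lambda and stays at gamma*lambda^2/2 beyond it; the SCAD row is linear up to lambda, quadratic up to gamma*lambda, and constant afterwards.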

Value

A "plus" object is returned, for which print, predict, coef and plot methods exist. In addition to the arguments x, y and max.steps and the values of method, gamma and lam actually used, the object contains the following components:
v
matrix with rows as p-vectors indicating the parallelepipeds in which the computed path lives
beta.path
matrix with rows as p-vectors of regression coefficients at the turning points of the solution path
lam.path
penalty levels at the turning points of the computed path. When the penalty function is concave, lam.path may not be a decreasing sequence but always takes nonnegative values.
beta
matrix with rows as p-vectors of coefficients at which the solution path first hits each value of lam
lam
the specified penalty levels hit by lam.path. This may differ from the argument lam if the minimum of the argument is not reached by the computed solution path.
dim
the number of nonzero coefficients in each row of beta
r.square
R-square values for the rows of beta
total.hits
length of output lam
total.steps
total number of steps executed, the same as the total number of segments in the computed solution path. With zero as the first coefficient vector, beta.path contains one more vector than total.steps.
full.path
TRUE if zero penalty is reached.
forced.stop
TRUE if PLUS is forced to stop due to reasons other than reaching max.steps or the minimum of argument lam.
singular.Q
TRUE if PLUS is forced to stop when a matrix is not invertible.
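The dim and r.square components summarize each row of beta. The base-R sketch below shows one way to compute such summaries from a coefficient matrix. It assumes that r.square is the usual 1 - RSS/TSS for a model with an unpenalized intercept (so x and y are centered first) and that a coefficient counts as nonzero when its absolute value exceeds eps; neither assumption is stated above. The helper path_summary and the simulated data are hypothetical.

```r
# Hypothetical helper: number of nonzero coefficients and R-square for each
# row of a coefficient matrix. Assumes an unpenalized intercept (x, y centered)
# and uses eps as the threshold for "nonzero" (an assumption).
path_summary <- function(x, y, beta, eps = 1e-15) {
  xc <- scale(x, center = TRUE, scale = FALSE)
  yc <- y - mean(y)
  rss <- apply(beta, 1, function(b) sum((yc - xc %*% b)^2))
  data.frame(
    dim      = rowSums(abs(beta) > eps),
    r.square = 1 - rss / sum(yc^2)
  )
}

# Simulated data for illustration only.
set.seed(1)
n <- 50; p <- 4
x <- matrix(rnorm(n * p), n, p)
y <- drop(x %*% c(2, -1, 0, 0)) + rnorm(n)

# Three illustrative rows: zero coefficients, a sparse vector, least squares.
ls_fit <- unname(coef(lm(y ~ x))[-1])
beta <- rbind(zero = rep(0, p), sparse = c(1.5, -0.5, 0, 0), ls = ls_fit)
s <- path_summary(x, y, beta)
print(s)
```

The zero row has dim 0 and R-square 0, and the least squares row reproduces summary(lm(y ~ x))$r.squared, the largest R-square any coefficient vector can attain.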

Details

PLUS is described in detail in Zhang (2007). It computes a complete path of critical points of a penalized squared loss, extending from zero coefficients at infinite penalty to a least squares fit at zero penalty, and including possibly multiple local minima at each penalty level.
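Why several critical points can share one penalty level is visible already with a single standardized predictor, a simplification assumed here for illustration: the loss reduces to f(b) = (z - b)^2/2 + rho(|b|; lambda), with z the least squares estimate. Using the MC+ penalty of Zhang (2010), the critical points have closed forms. When gamma < 1 and gamma*lambda < |z| < lambda, both b = 0 and b = z are local minima, separated by an interior local maximum. The function mcp_critical_points is a hypothetical helper, not part of the plus package.

```r
# Hypothetical helper: critical points of the one-dimensional MC+ problem
#   f(b) = (z - b)^2 / 2 + rho(|b|; lambda),
# where rho(s; lambda) = lambda * s - s^2 / (2 * gamma) for s <= gamma * lambda
# and gamma * lambda^2 / 2 beyond. Requires gamma > 0 and gamma != 1.
mcp_critical_points <- function(z, lambda, gamma) {
  stopifnot(lambda > 0, gamma > 0, gamma != 1)
  pts <- numeric(0)
  # b = 0 is critical when z lies in the subdifferential lambda * [-1, 1]
  if (abs(z) <= lambda) pts <- c(pts, 0)
  # interior point with 0 < |b| < gamma * lambda, from
  # b - z + sign(b) * (lambda - |b| / gamma) = 0, which gives |b| = r below
  r <- gamma * (abs(z) - lambda) / (gamma - 1)
  if (r > 0 && r < gamma * lambda) pts <- c(pts, sign(z) * r)
  # b = z is critical where the penalty is flat, i.e. |z| >= gamma * lambda
  if (abs(z) >= gamma * lambda) pts <- c(pts, z)
  pts
}

# gamma < 1: local minima at 0 and z, with an interior local maximum between
print(mcp_critical_points(z = 0.8, lambda = 1, gamma = 0.5))
# gamma > 1: a single critical point, the firm-thresholded value
print(mcp_critical_points(z = 2, lambda = 1, gamma = 3))
```

For gamma < 1 the second derivative of f on the interior region is 1 - 1/gamma < 0, which is why the middle critical point is a local maximum rather than a minimum.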

References

Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics 38, 894-942.

See Also

print, plot, and predict methods

Examples

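library(plus)
data(sp500)
attach(sp500)
# response: first column; predictors: columns 3 to the last
x <- sp500.percent[, 3:ncol(sp500.percent)]
y <- sp500.percent[, 1]

# 2 x 3 panel: coefficient path, model size and R-square for each method
par(mfrow = c(2, 3))
object <- plus(x, y, method = "lasso")
plot(object)
plot(object, yvar = "dim")
plot(object, yvar = "R-sq")
object <- plus(x, y, method = "mc+")
plot(object)
plot(object, yvar = "dim")
plot(object, yvar = "R-sq")
detach(sp500)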
