Object of the penalty class to handle the SCAD penalty (Fan & Li, 2001)
Usage
scad(lambda = NULL, ...)
Arguments
lambda
two-dimensional tuning parameter. The first component corresponds to the regularization parameter $\lambda$ that controls the strength of the
SCAD penalty in likelihood inference; it must be nonnegative. The second component corresponds to $a$ (see Details).
...
further arguments.
Value
An object of the class penalty. This is a list with elements
penalty (character): the penalty name.
lambda (double): the (nonnegative) tuning parameter.
getpenmat (function): computes the diagonal penalty matrix.
Details
The SCAD penalty is formally defined as
$$P_{\tilde{\lambda}}^{sc} (\boldsymbol{\beta}) = \sum_{j=1}^p p_{\tilde{\lambda},j}^{sc} (|\beta_j|), \quad \tilde{\lambda} = (\lambda, a),$$
where the components $p_{\tilde{\lambda},j}^{sc} (|\beta_j|)$ have no simple closed form.
Fan & Li (2001) therefore specify the penalty through the first derivative of its components:
$$\frac{d\, p_{\tilde{\lambda},j}^{sc} (|\beta_j|)}{d\, |\beta_j|} = \lambda\left\{1_{[0,\lambda]}(|\beta_j|) +
\frac{(a\lambda - |\beta_j|)_+}{(a-1)\lambda}\, 1_{(\lambda,\infty)}(|\beta_j|) \right\},$$
where we use the notation $b_+ := \max\{0, b\}$ and $1_A(x)$ denotes the indicator function of the set $A$.
The penalty depends on two tuning parameters, $\lambda>0$ and $a>2$.
It is continuously differentiable in $\beta_j$ (for $\beta_j \neq 0$), but not in the tuning
parameters. If $|\beta_j| \leq \lambda$, the lasso penalty is applied to $\beta_j$.
For $\lambda < |\beta_j| \leq a\lambda$, the penalization is smoothly clipped off until
the threshold $a\lambda$ is reached. For $|\beta_j| > a\lambda$ the penalty is constant, so the coefficient incurs no additional penalization.
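The piecewise behaviour of the derivative can be checked numerically. A minimal sketch in R, where `scad_deriv` is a hypothetical helper written directly from the formula above (it is not part of the package):

```r
# Derivative of one SCAD penalty component with respect to |beta_j|.
# scad_deriv is an illustrative name; lambda > 0 and a > 2 are assumed.
scad_deriv <- function(beta, lambda, a = 3.7) {
  b <- abs(beta)
  lambda * (as.numeric(b <= lambda) +
            pmax(a * lambda - b, 0) / ((a - 1) * lambda) * as.numeric(b > lambda))
}

# Piecewise behaviour for lambda = 1, a = 3.7:
scad_deriv(0.5, 1)  # lasso region: derivative equals lambda = 1
scad_deriv(2.0, 1)  # clipped region: (3.7 - 2) / 2.7, about 0.63
scad_deriv(5.0, 1)  # beyond a * lambda: derivative is 0
```

The three calls trace the three regions of the derivative: constant at $\lambda$, linearly decreasing, then identically zero.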
Fan & Li (2001) suggest using $a = 3.7$.
The SCAD penalty thus avoids excessive penalization of large coefficients while keeping the solution continuous.
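Integrating the derivative gives a standard closed form for each penalty component; the sketch below (with a hypothetical helper `scad_pen`, not part of the package) makes the continuity claim concrete:

```r
# One SCAD penalty component, obtained by integrating the derivative:
#   lambda * b                                   for b <= lambda
#   (2*a*lambda*b - b^2 - lambda^2)/(2*(a - 1))  for lambda < b <= a*lambda
#   lambda^2 * (a + 1) / 2                       for b > a*lambda
# scad_pen is an illustrative name; lambda > 0 and a > 2 are assumed.
scad_pen <- function(beta, lambda, a = 3.7) {
  b <- abs(beta)
  ifelse(b <= lambda,
         lambda * b,
         ifelse(b <= a * lambda,
                (2 * a * lambda * b - b^2 - lambda^2) / (2 * (a - 1)),
                lambda^2 * (a + 1) / 2))
}

# Continuity at the knots (lambda = 1, a = 3.7):
scad_pen(1.0, 1)    # equals lambda^2 = 1 at b = lambda
scad_pen(3.7, 1)    # equals lambda^2 * (a + 1) / 2 = 2.35 at b = a * lambda
scad_pen(10.0, 1)   # constant (2.35) beyond a * lambda
```

The constant tail is what "leaves large values of $\beta_j$ not excessively penalized": beyond $a\lambda$ the penalty adds nothing to the objective's gradient.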
References
Fan, J. & R. Li (2001). Variable selection via nonconcave penalized likelihood and its oracle properties.
Journal of the American Statistical Association, 96, 1348--1360.