Object of the penalty class to handle the SCAD penalty (Fan & Li, 2001)
Usage
scad(lambda = NULL, ...)
Arguments
lambda
two-dimensional tuning parameter. The first component corresponds to the regularization parameter $\lambda$ that controls the strength of the
SCAD penalty in likelihood inference; it must be nonnegative. The second component corresponds to $a$ (see Details).
...
further arguments.
Value
An object of the class penalty. This is a list with elements
penalty (character): the penalty name.
lambda (double): the (nonnegative) tuning parameter.
getpenmat (function): computes the diagonal penalty matrix.
Details
The SCAD penalty is formally defined as
$$P_{\tilde{\lambda}}^{sc} (\boldsymbol{\beta}) = \sum_{j=1}^p p_{\tilde{\lambda},j}^{sc} (|\beta_j|), \quad \tilde{\lambda} = (\lambda, a),$$
where the components $p_{\tilde{\lambda},j}^{sc} (|\beta_j|)$ have no simple closed form.
Fan & Li (2001) therefore specify the penalty through the first derivative of its components:
$$\frac{d\, p_{\tilde{\lambda},j}^{sc} (|\beta_j|)}{d\, |\beta_j|} = \lambda\left\{1_{[0,\lambda]}(|\beta_j|) +
\frac{(a\lambda - |\beta_j|)_+}{(a-1)\lambda}\, 1_{(\lambda,\infty)}(|\beta_j|) \right\},$$
where we use the notation $b_+ := \max\{0, b\}$ and $1_A(x)$ denotes the indicator function of the set $A$.
The penalty depends on two tuning parameters, $\lambda>0$ and $a>2$.
It is continuously differentiable in $\beta_j$ (for $\beta_j \neq 0$), but not in the tuning
parameters. If $|\beta_j| \leq \lambda$, the lasso penalty is applied to $\beta_j$.
For $\lambda < |\beta_j| \leq a\lambda$, the penalization is smoothly clipped off until
the threshold $a\lambda$ is reached. For $|\beta_j| > a\lambda$ the penalty is constant, so the coefficient incurs no additional penalization.
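The piecewise behaviour of the derivative can be checked numerically. A minimal sketch in R, where `scad_deriv` is a hypothetical helper written directly from the formula above (it is not part of the package):

```r
# Derivative of one SCAD penalty component with respect to |beta_j|.
# scad_deriv is an illustrative name; lambda > 0 and a > 2 are assumed.
scad_deriv <- function(beta, lambda, a = 3.7) {
  b <- abs(beta)
  lambda * (as.numeric(b <= lambda) +
            pmax(a * lambda - b, 0) / ((a - 1) * lambda) * as.numeric(b > lambda))
}

# Piecewise behaviour for lambda = 1, a = 3.7:
scad_deriv(0.5, 1)  # lasso region: derivative equals lambda = 1
scad_deriv(2.0, 1)  # clipped region: (3.7 - 2) / 2.7, about 0.63
scad_deriv(5.0, 1)  # beyond a * lambda: derivative is 0
```

The three calls trace the three regions of the derivative: constant at $\lambda$, linearly decreasing, then identically zero.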
Fan & Li (2001) suggest using $a = 3.7$.
The SCAD penalty thus avoids excessive penalization of large coefficients while keeping the solution continuous.
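Integrating the derivative gives a standard closed form for each penalty component; the sketch below (with a hypothetical helper `scad_pen`, not part of the package) makes the continuity claim concrete:

```r
# One SCAD penalty component, obtained by integrating the derivative:
#   lambda * b                                   for b <= lambda
#   (2*a*lambda*b - b^2 - lambda^2)/(2*(a - 1))  for lambda < b <= a*lambda
#   lambda^2 * (a + 1) / 2                       for b > a*lambda
# scad_pen is an illustrative name; lambda > 0 and a > 2 are assumed.
scad_pen <- function(beta, lambda, a = 3.7) {
  b <- abs(beta)
  ifelse(b <= lambda,
         lambda * b,
         ifelse(b <= a * lambda,
                (2 * a * lambda * b - b^2 - lambda^2) / (2 * (a - 1)),
                lambda^2 * (a + 1) / 2))
}

# Continuity at the knots (lambda = 1, a = 3.7):
scad_pen(1.0, 1)    # equals lambda^2 = 1 at b = lambda
scad_pen(3.7, 1)    # equals lambda^2 * (a + 1) / 2 = 2.35 at b = a * lambda
scad_pen(10.0, 1)   # constant (2.35) beyond a * lambda
```

The constant tail is what "leaves large values of $\beta_j$ not excessively penalized": beyond $a\lambda$ the penalty adds nothing to the objective's gradient.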
References
Fan, J. & R. Li (2001). Variable selection via nonconcave penalized likelihood and its oracle properties.
Journal of the American Statistical Association, 96, 1348--1360.