penalreg: Correlation-based Penalty

Description

Object of the class penalty that handles the correlation-based penalty (Tutz & Ulbricht, 2009).

Usage

penalreg(lambda = NULL, ...)

Arguments

lambda

regularization parameter. This must be a nonnegative real number.

...

further arguments

Value

An object of the class penalty. This is a list with elements
penalty: character, the penalty name.
lambda: double, the (nonnegative) regularization parameter.
getpenmat: function, computes the diagonal penalty matrix.

Details

The method proposed in Tutz & Ulbricht (2009) and Ulbricht & Tutz (2008) uses the correlation between regressors explicitly in the penalty term. Coefficients that correspond to pairs of covariates are weighted according to their marginal correlation. The correlation-based penalty is given by

$$P_{\lambda}^{cb}(\boldsymbol{\beta}) = \frac{\lambda}{2} \sum_{i=1}^{p-1}\sum_{j > i}\left\{ \frac{(\beta_{i}-\beta_{j})^{2}}{1-\varrho_{ij}} + \frac{(\beta_{i}+\beta_{j})^{2}}{1+\varrho_{ij}}\right\}$$

where $\varrho_{ij}$ denotes the (empirical) correlation between the i-th and the j-th regressor.

The penalty is designed so that for strong positive correlation $(\varrho_{ij}\uparrow 1)$ the first term becomes dominant, with the effect that the estimates for $\beta_i$ and $\beta_j$ are similar $(\hat\beta_i\approx\hat\beta_j)$. For strong negative correlation $(\varrho_{ij}\downarrow -1)$ the second term becomes dominant and $\hat\beta_i$ will be close to $-\hat\beta_j$. The effect is grouping: highly correlated regressors receive estimates of comparable magnitude $(|\hat\beta_i|\approx|\hat\beta_j|)$, with the sign determined by positive or negative correlation.

If the regressors are uncorrelated $(\varrho_{ij}=0)$, one obtains (up to a constant) the ridge penalty, i.e. $P_\lambda^{cb}(\boldsymbol{\beta})\propto\lambda\sum_{i=1}^p\beta_i^2$. Consequently, for weakly correlated data the performance is quite close to that of the ridge estimator. As for the elastic net, ridge regression is therefore a limiting case.
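The formula can be checked numerically. The following is a minimal sketch in base R that evaluates the double sum directly; `cb_penalty` is a hypothetical helper written for illustration and is not part of the package. It assumes `R` is a correlation matrix whose off-diagonal entries lie strictly between -1 and 1.

```r
# Evaluate the correlation-based penalty directly from its definition.
# `cb_penalty` is a hypothetical helper for illustration only.
cb_penalty <- function(beta, R, lambda) {
  p <- length(beta)
  total <- 0
  for (i in seq_len(p - 1)) {
    for (j in (i + 1):p) {
      rho <- R[i, j]
      total <- total +
        (beta[i] - beta[j])^2 / (1 - rho) +
        (beta[i] + beta[j])^2 / (1 + rho)
    }
  }
  lambda / 2 * total
}

beta <- c(1, -2, 0.5)
lambda <- 0.7

# Uncorrelated regressors: each pair contributes 2 * (beta_i^2 + beta_j^2),
# so the penalty equals lambda * (p - 1) * sum(beta^2), a ridge penalty.
pen_uncorrelated <- cb_penalty(beta, diag(3), lambda)
ridge_equivalent <- lambda * (length(beta) - 1) * sum(beta^2)

# Strong positive correlation between regressors 1 and 2 puts a heavy
# weight on the difference beta[1] - beta[2].
R_high <- diag(3)
R_high[1, 2] <- R_high[2, 1] <- 0.95
pen_correlated <- cb_penalty(beta, R_high, lambda)
```

Here `pen_uncorrelated` and `ridge_equivalent` agree, which makes the constant in the ridge limit explicit: it is $p - 1$. With the correlated matrix, the same coefficient vector is penalized much more strongly because $\beta_1$ and $\beta_2$ differ in sign.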

The correlation-based penalty is a quadratic penalty. Consequently, it will in general not be able to select variables. For this reason, advanced boosting techniques such as GBlockBoost and ForwardBoost have been introduced. See GBlockBoost and ForwardBoost for further details.
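To illustrate why a quadratic penalty does not select variables, the sketch below writes the penalty as a quadratic form $\lambda\,\boldsymbol{\beta}^\top M \boldsymbol{\beta}$ and solves a penalized least squares problem in closed form. Expanding the two squared terms gives $M_{ii} = \sum_{j \neq i} 1/(1-\varrho_{ij}^2)$ and $M_{ij} = -\varrho_{ij}/(1-\varrho_{ij}^2)$. The helper `cb_penalty_matrix`, the simulated data, and the objective $\frac{1}{2}\|y - X\boldsymbol{\beta}\|^2 + \lambda\,\boldsymbol{\beta}^\top M \boldsymbol{\beta}$ are assumptions made for this example; they are not the package's `getpenmat` or its fitting algorithm.

```r
# Penalty matrix M such that P(beta) = lambda * t(beta) %*% M %*% beta.
# `cb_penalty_matrix` is a hypothetical helper for illustration only.
cb_penalty_matrix <- function(R) {
  W <- 1 / (1 - R^2)      # pairwise weights 1 / (1 - rho_ij^2)
  diag(W) <- 0            # the diagonal (rho_ii = 1) is not part of the sum
  M <- -R * W             # off-diagonal entries: -rho_ij / (1 - rho_ij^2)
  diag(M) <- rowSums(W)   # diagonal entries: sum of the weights over j != i
  M
}

# Simulated data (an assumption for this sketch): regressors 1 and 2 are
# highly correlated, regressor 3 has a true coefficient of zero.
set.seed(1)
n <- 100
z <- rnorm(n)
X <- scale(cbind(z + 0.1 * rnorm(n), z + 0.1 * rnorm(n), rnorm(n)))
y <- drop(X %*% c(1, 1, 0) + rnorm(n))
y <- y - mean(y)

# Minimizing 0.5 * ||y - X beta||^2 + lambda * beta' M beta leads to the
# linear system (X'X + 2 * lambda * M) beta = X'y.
lambda <- 5
M <- cb_penalty_matrix(cor(X))
beta_hat <- drop(solve(crossprod(X) + 2 * lambda * M, crossprod(X, y)))
```

Because the estimate is the solution of a linear system, its components are shrunk but generically not set exactly to zero, including the coefficient of the irrelevant third regressor. The estimates for the two highly correlated regressors are pulled toward each other, which is the grouping effect described above.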

References

Tutz, G. & J. Ulbricht (2009) Penalized regression with correlation-based penalty. Statistics and Computing 19, 239–253.

Ulbricht, J. & G. Tutz (2008) Boosting correlation based penalization in generalized linear models. In Shalabh & C. Heumann (Eds.) Recent Advances in Linear Models and Related Areas. Heidelberg: Springer.