The basic idea of the AO penalty is
to use a linear combination of the $L_1$-norm and the bridge penalty with $\gamma > 1$, where the weight of
the bridge part is driven by the empirical
correlation. So, consider the penalty
$$P_{\tilde{\lambda}}^{ao}(\boldsymbol{\beta}) = \sum_{i = 2}^p \sum_{j< i} p_{\tilde{\lambda},ij}
(\boldsymbol{\beta}), \quad \tilde{\lambda} = (\lambda, \gamma)$$
where
$$p_{\tilde{\lambda},ij}(\boldsymbol{\beta}) = \lambda[(1 - |\varrho_{ij}|) (|\beta_i| + |\beta_j|) + |\varrho_{ij}|(|\beta_i|^\gamma + |\beta_j|^\gamma)],$$
and $\varrho_{ij}$ denotes the (empirical) correlation of the $i$-th and $j$-th regressor. Since we are going to
approximate an octagonal polytope in two dimensions, we will refer to this penalty as the approximated octagon
(AO) penalty. Note that $P_{\tilde{\lambda}}^{ao}(\boldsymbol{\beta})$ leads to a dominating lasso term if the regressors are uncorrelated and to a
dominating bridge term if they are nearly perfectly correlated. Collecting, for each coordinate $i$, the contributions of all pairs containing $i$, the penalty can be rearranged as
$$P_{\tilde{\lambda}}^{ao}(\boldsymbol{\beta}) = \sum_{i=1}^p p_{\tilde{\lambda},i}^{ao}(\beta_i),$$
where
$$p_{\tilde{\lambda},i}^{ao}(\beta_i) = \lambda \left\{|\beta_i|\sum_{j \neq i} (1 - |\varrho_{ij}|) + |\beta_i|^\gamma \sum_{j \neq i} |\varrho_{ij}|\right\}.$$
It uses two tuning parameters $\tilde{\lambda} = (\lambda, \gamma)$, where $\lambda$ controls the penalty amount and $\gamma$
manages the approximation of the pairwise $L_\infty$-norm.
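To make the equivalence of the pairwise and the rearranged form concrete, the following is a minimal numerical sketch in Python. The function names and the example correlation matrix are illustrative only and not part of the text; `rho` plays the role of the empirical correlation matrix $(\varrho_{ij})$.

```python
import numpy as np

def ao_penalty_pairwise(beta, rho, lam, gamma):
    """Pairwise form: sum over i > j of p_{lambda,ij}(beta).
    Indices are 0-based; rho is the empirical correlation matrix."""
    p = len(beta)
    total = 0.0
    for i in range(1, p):
        for j in range(i):
            r = abs(rho[i, j])
            total += lam * ((1 - r) * (abs(beta[i]) + abs(beta[j]))
                            + r * (abs(beta[i]) ** gamma + abs(beta[j]) ** gamma))
    return total

def ao_penalty_coordinatewise(beta, rho, lam, gamma):
    """Rearranged form: sum over i of p_{lambda,i}^{ao}(beta_i)."""
    p = len(beta)
    total = 0.0
    for i in range(p):
        s1 = sum(1 - abs(rho[i, j]) for j in range(p) if j != i)
        s2 = sum(abs(rho[i, j]) for j in range(p) if j != i)
        total += lam * (abs(beta[i]) * s1 + abs(beta[i]) ** gamma * s2)
    return total
```

For uncorrelated regressors ($\varrho_{ij} = 0$ for $i \neq j$) both forms reduce to $\lambda (p-1) \|\boldsymbol{\beta}\|_1$, a pure lasso term up to the constant factor $p - 1$.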