bounded.reg: Fit a linear model with infinity-norm plus ridge-like regularization

Description

Adjust a linear model penalized by a mixture of a (possibly weighted) \(\ell_\infty\)-norm (bounding the magnitude of the parameters) and a (possibly structured) \(\ell_2\)-norm (ridge-like). The solution path is computed at a grid of values for the infinity-penalty, fixing the amount of \(\ell_2\) regularization. See details for the criterion optimized.

Usage

bounded.reg(
  x,
  y,
  lambda1 = NULL,
  lambda2 = 0.01,
  penscale = rep(1, p),
  struct = NULL,
  intercept = TRUE,
  normalize = TRUE,
  naive = FALSE,
  nlambda1 = ifelse(is.null(lambda1), 100, length(lambda1)),
  min.ratio = ifelse(n

Value

an object with class quadrupen, see the documentation page quadrupen for details.

Arguments

x

matrix of features, possibly sparsely encoded (experimental). Do NOT include intercept. When normalized os TRUE, coefficients will then be rescaled to the original scale.

y

response vector.

lambda1

sequence of decreasing \(\ell_\infty\) penalty levels. If NULL (the default), a vector is generated with nlambda1 entries, starting from a guessed level lambda1.max where only the intercept is included, then shrunken to min.ratio*lambda1.max.

lambda2

real scalar; tunes the \(\ell_2\)-penalty in the bounded regression. Default is 0.01. Set to 0 to regularize only by the infinity norm (be careful regarding numerical stability in that case, particularly in the high dimensional setting).

penscale

vector with real positive values that weight the infinity norm of each feature. Default set all weights to 1. See details below.

struct

matrix structuring the coefficients. Must be at least positive semidefinite (this is checked internally if the checkarg argument is TRUE). The default uses the identity matrix. See details below.

intercept

logical; indicates if an intercept should be included in the model. Default is TRUE.

normalize

logical; indicates if variables should be normalized to have unit L2 norm before fitting. Default is TRUE.

naive

logical; Compute either 'naive' of 'classic' bounded regression: mimicking the Elastic-net, the vector of parameters is rescaled by a coefficient (1+lambda2) when naive equals FALSE. No rescaling otherwise. Default is FALSE.

nlambda1

integer that indicates the number of values to put in the lambda1 vector. Ignored if lambda1 is provided.

min.ratio

minimal value of infinity-part of the penalty that will be tried, as a fraction of the maximal lambda1 value. A too small value might lead to unstability at the end of the solution path corresponding to small lambda1. The default value tries to avoid this, adapting to the '\(n<p\)' context. Ignored if lambda1 is provided.

max.feat

integer; limits the number of features ever to enter the model: in our implementation of the bounded regression, it corresponds to the variables which have left the boundary along the path. The algorithm stops if this number is exceeded and lambda1 is cut at the corresponding level. Default is min(nrow(x),ncol(x)) for small lambda2 (<0.01) and min(4*nrow(x),ncol(x)) otherwise. Use with care, as it considerably changes the computation time.

control

list of argument controlling low level options of the algorithm --use with care and at your own risk-- :

verbose:

integer; activate verbose mode --this one is not too much risky!-- set to 0 for no output; 1 for warnings only, and 2 for tracing the whole progression. Default is 1. Automatically set to 0 when the method is embedded within cross-validation or stability selection.

timer:

logical; use to record the timing of the algorithm. Default is FALSE.

max.iter:

the maximal number of iteration used to solve the problem for a given value of lambda1. Default is 500.

method:

a string for the underlying solver used. Either "quadra" or "fista" are available for bounded regression. Default is "quadra".

threshold:

a threshold for convergence. The algorithm stops when the optimality conditions are fulfill up to this threshold. Default is 1e-7 for "quadra" and 1e-2 for "fista".

bulletproof:

logical; indicates if the bulletproof mode should be used while running the "quadra" method. Default is TRUE. See details below.

checkargs

logical; should arguments be checked to (hopefully) avoid internal crashes? Default is TRUE. Automatically set to FALSE when calls are made from cross-validation or stability selection procedures.

Details

The optimized criterion is β^hat_λ₁,λ₂ = argmin_β 1/2 RSS(β) + λ₁ | D β |_∞ + λ/2 ₂ β^T S β,

where \(D\) is a diagonal matrix, whose diagonal terms are provided as a vector by the penscale argument. The \(\ell_2\) structuring matrix \(S\) is provided via the struct argument, a positive semidefinite matrix (possibly of class Matrix). Note that the quadratic algorithm for the bounded regression may become unstable along the path because of singularity of the underlying problem, e.g. when there are too much correlation or when the size of the problem is close to or smaller than the sample size. In such cases, it might be a good idea to switch to the proximal solver, slower yet more robust. This is the strategy adopted by the 'bulletproof' mode, that will send a warning while switching the method to 'fista' and keep on optimizing on the remainder of the path. When bulletproof is set to FALSE, the algorithm stops at an early stage of the path of solutions. Hence, users should be careful when manipulating the resulting 'quadrupen' object, as it will not have the size expected regarding the dimension of the lambda1 argument.

Singularity of the system can also be avoided with a larger \(\ell_2\)-regularization, via lambda2, or a "not-too-small" \(\ell_\infty\) regularization, via a larger 'min.ratio' argument.

Examples

Run this code

## Simulating multivariate Gaussian with blockwise correlation
## and piecewise constant vector of parameters
beta <- rep(c(0,1,0,-1,0), c(25,10,25,10,25))
cor <- 0.75
Soo <- toeplitz(cor^(0:(25-1))) ## Toeplitz correlation for irrelevant variables
Sww  <- matrix(cor,10,10) ## bloc correlation between active variables
Sigma <- bdiag(Soo,Sww,Soo,Sww,Soo)
diag(Sigma) <- 1
n <- 50
x <- as.matrix(matrix(rnorm(95*n),n,95) %*% chol(Sigma))
y <- 10 + x %*% beta + rnorm(n,0,10)

## Infinity norm without/with an additional l2 regularization term
## and with structuring prior
labels <- rep("irrelevant", length(beta))
labels[beta != 0] <- "relevant"
plot(bounded.reg(x,y,lambda2=0) , label=labels) ## a mess
plot(bounded.reg(x,y,lambda2=10), label=labels) ## good guys are at the boundaries