Fit a generalized linear model regularized with the sorted L1 norm, which applies a non-increasing regularization sequence to the coefficient vector (\(\beta\)) after having sorted it in decreasing order according to its absolute values.

```
SLOPE(
x,
y,
family = c("gaussian", "binomial", "multinomial", "poisson"),
intercept = TRUE,
center = !inherits(x, "sparseMatrix"),
scale = c("l2", "l1", "sd", "none"),
alpha = c("path", "estimate"),
lambda = c("bh", "gaussian", "oscar"),
alpha_min_ratio = if (NROW(x) < NCOL(x)) 0.01 else 1e-04,
path_length = if (alpha[1] == "estimate") 1 else 20,
q = 0.1 * min(1, NROW(x)/NCOL(x)),
screen = TRUE,
screen_alg = c("strong", "previous"),
tol_dev_change = 1e-05,
tol_dev_ratio = 0.995,
max_variables = NROW(x),
solver = c("fista", "admm"),
max_passes = 1e+06,
tol_abs = 1e-05,
tol_rel = 1e-04,
tol_rel_gap = 1e-05,
tol_infeas = 0.001,
tol_rel_coef_change = 0.001,
diagnostics = FALSE,
verbosity = 0,
sigma,
n_sigma,
lambda_min_ratio
)
```

x

the design matrix, which can be either a dense
matrix of the standard *matrix* class, or a sparse matrix
inheriting from Matrix::sparseMatrix. Data frames will
be converted to matrices internally.

y

the response, which for `family = "gaussian"`

must be numeric; for
`family = "binomial"`

or `family = "multinomial"`

, it can be a factor.

family

model family (objective); see **Families** for details.

intercept

whether to fit an intercept

center

whether to center predictors or not by their mean. Defaults
to `TRUE`

if `x`

is dense and `FALSE`

otherwise.

scale

type of scaling to apply to predictors.

`"l1"`

scales predictors to have L1 norms of one.`"l2"`

scales predictors to have L2 norms of one.#'`"sd"`

scales predictors to have a population standard deviation one.`"none"`

applies no scaling.

alpha

scale for regularization path: either a decreasing numeric vector (possibly of length 1) or a character vector; in the latter case, the choices are:

`"path"`

, which computes a regularization sequence where the first value corresponds to the intercept-only (null) model and the last to the almost-saturated model, and`"estimate"`

, which estimates a*single*`alpha`

using Algorithm 5 in Bogdan et al. (2015).

When a value is manually entered for `alpha`

, it will be scaled based
on the type of standardization that is applied to `x`

. For `scale = "l2"`

,
`alpha`

will be scaled by \(\sqrt n\). For `scale = "sd"`

or `"none"`

,
alpha will be scaled by \(n\), and for `scale = "l1"`

no scaling is
applied. Note, however, that the `alpha`

that is returned in the
resulting value is the **unstandardized** alpha.

lambda

either a character vector indicating the method used
to construct the lambda path or a numeric non-decreasing
vector with length equal to the number
of coefficients in the model; see section **Regularization sequences**
for details.

alpha_min_ratio

smallest value for `lambda`

as a fraction of
`lambda_max`

; used in the selection of `alpha`

when `alpha = "path"`

.

path_length

length of regularization path; note that the path
returned may still be shorter due to the early termination criteria
given by `tol_dev_change`

, `tol_dev_ratio`

, and `max_variables`

.

q

parameter controlling the shape of the lambda sequence, with
usage varying depending on the type of path used and has no effect
is a custom `lambda`

sequence is used.

screen

whether to use predictor screening rules (rules that allow some predictors to be discarded prior to fitting), which improve speed greatly when the number of predictors is larger than the number of observations.

screen_alg

what type of screening algorithm to use.

`"strong"`

uses the set from the strong screening rule and check against the full set`"previous"`

first fits with the previous active set, then checks against the strong set, and finally against the full set if there are no violations in the strong set

tol_dev_change

the regularization path is stopped if the fractional change in deviance falls below this value; note that this is automatically set to 0 if a alpha is manually entered

tol_dev_ratio

the regularization path is stopped if the deviance ratio \(1 - \mathrm{deviance}/\mathrm{(null-deviance)}\) is above this threshold

max_variables

criterion for stopping the path in terms of the maximum number of unique, nonzero coefficients in absolute value in model. For the multinomial family, this value will be multiplied internally with the number of levels of the response minus one.

solver

type of solver use, either `"fista"`

or `"admm"`

;
all families currently support FISTA but only `family = "gaussian"`

supports ADMM.

max_passes

maximum number of passes (outer iterations) for solver

tol_abs

absolute tolerance criterion for ADMM solver

tol_rel

relative tolerance criterion for ADMM solver

tol_rel_gap

stopping criterion for the duality gap; used only with FISTA solver.

tol_infeas

stopping criterion for the level of infeasibility; used with FISTA solver and KKT checks in screening algorithm.

tol_rel_coef_change

relative tolerance criterion for change in coefficients between iterations, which is reached when the maximum absolute change in any coefficient divided by the maximum absolute coefficient size is less than this value.

diagnostics

whether to save diagnostics from the solver (timings and other values depending on type of solver)

verbosity

level of verbosity for displaying output from the program. Setting this to 1 displays basic information on the path level, 2 a little bit more information on the path level, and 3 displays information from the solver.

sigma

deprecated; please use `alpha`

instead

n_sigma

deprecated; please use `path_length`

instead

lambda_min_ratio

deprecated; please use `alpha_min_ratio`

instead

An object of class `"SLOPE"`

with the following slots:

a three-dimensional array of the coefficients from the model fit, including the intercept if it was fit. There is one row for each coefficient, one column for each target (dependent variable), and one slice for each penalty.

a three-dimensional logical array indicating whether a coefficient was zero or not

the lambda vector that when multiplied by a value in `alpha`

gives the penalty vector at that point along the regularization
path

vector giving the (unstandardized) scaling of the lambda sequence

a character vector giving the names of the classes for binomial and multinomial families

the number of passes the solver took at each step on the path

the number of violations of the screening rule at each step on the path;
only available if `diagnostics = TRUE`

in the call to `SLOPE()`

.

a list where each element indicates the indices of the coefficients that were active at that point in the regularization path

the number of unique predictors (in absolute value)

the deviance ratio (as a fraction of 1)

the deviance of the null (intercept-only) model

the name of the family used in the model fit

a `data.frame`

of objective values for the primal and dual problems, as
well as a measure of the infeasibility, time, and iteration; only
available if `diagnostics = TRUE`

in the call to `SLOPE()`

.

the call used for fitting the model

**Gaussian**

The Gaussian model (Ordinary Least Squares) minimizes the following objective: $$ \frac{1}{2} \Vert y - X\beta\Vert_2^2 $$

**Binomial**

The binomial model (logistic regression) has the following objective: $$ \sum_{i=1}^n \log\left(1+ \exp\left(- y_i \left(x_i^T\beta + \beta_0 \right) \right) \right) $$ with \(y \in \{-1, 1\}\).

**Poisson**

In poisson regression, we use the following objective:

$$ -\sum_{i=1}^n \left(y_i\left(x_i^T\beta + \beta_0\right) - \exp\left(x_i^T\beta + \beta_0\right)\right) $$

**Multinomial**

In multinomial regression, we minimize the full-rank objective $$ -\sum_{i=1}^n\left( \sum_{k=1}^{m-1} y_{ik}(x_i^T\beta_k + \beta_{0,k}) - \log\sum_{k=1}^{m-1} \exp\big(x_i^T\beta_k + \beta_{0,k}\big) \right) $$ with \(y_{ik}\) being the element in a \(n\) by \((m-1)\) matrix, where \(m\) is the number of classes in the response.

There are multiple ways of specifying the `lambda`

sequence
in `SLOPE()`

. It is, first of all, possible to select the sequence manually by
using a non-increasing
numeric vector, possible of length one, as argument instead of a character.
If all `lambda`

are the same value, this will
lead to the ordinary lasso penalty. The greater the differences are between
consecutive values along the sequence, the more clustering behavior
will the model exhibit. Note, also, that the scale of the \(\lambda\)
vector makes no difference if `alpha = NULL`

, since `alpha`

will be
selected automatically to ensure that the model is completely sparse at the
beginning and almost unregularized at the end. If, however, both
`alpha`

and `lambda`

are manually specified, both of the scales will
matter.

Instead of choosing the sequence manually, one of the following automatically generated sequences may be chosen.

**BH (Benjamini--Hochberg)**

If `lambda = "bh"`

, the sequence used is that referred to
as \(\lambda^{(\mathrm{BH})}\) by Bogdan et al, which sets
\(\lambda\) according to
$$
\lambda_i = \Phi^{-1}(1 - iq/(2p)),
$$
for \(i=1,\dots,p\), where \(\Phi^{-1}\) is the quantile
function for the standard normal distribution and \(q\) is a parameter
that can be set by the user in the call to `SLOPE()`

.

**Gaussian**

This penalty sequence is related to BH, such that $$ \lambda_i = \lambda^{(\mathrm{BH})}_i \sqrt{1 + w(i-1)\cdot \mathrm{cumsum}(\lambda^2)_i}, $$ for \(i=1,\dots,p\), where \(w(k) = 1/(n-k-1)\). We let \(\lambda_1 = \lambda^{(\mathrm{BH})}_1\) and adjust the sequence to make sure that it's non-increasing. Note that if \(p\) is large relative to \(n\), this option will result in a constant sequence, which is usually not what you would want.

**OSCAR**

This sequence comes from Bondell and Reich and is a linearly non-increasing sequence such that $$ \lambda_i = q(p - i) + 1. $$ for \(i = 1,\dots,p\).

There are currently two solvers available for SLOPE: FISTA (Beck and
Teboulle 2009) and ADMM (Boyd et al. 2008). FISTA is available for
families but ADMM is currently only available for `family = "gaussian"`

.

`SLOPE()`

solves the convex minimization problem
$$
f(\beta) + \alpha \sum_{i=j}^p \lambda_j |\beta|_{(j)},
$$
where \(f(\beta)\) is a smooth and convex function and
the second part is the sorted L1-norm.
In ordinary least-squares regression,
\(f(\beta)\) is simply the squared norm of the least-squares residuals.
See section **Families** for specifics regarding the various types of
\(f(\beta)\) (model families) that are allowed in `SLOPE()`

.

By default, `SLOPE()`

fits a path of models, each corresponding to
a separate regularization sequence, starting from
the null (intercept-only) model to an almost completely unregularized
model. These regularization sequences are parameterized using
\(\lambda\) and \(\alpha\), with only \(\alpha\) varying along the
path. The length of the path can be manually, but will terminate
prematurely depending on
arguments `tol_dev_change`

, `tol_dev_ratio`

, and `max_variables`

.
This means that unless these arguments are modified, the path is not
guaranteed to be of length `path_length`

.

Bogdan, M., van den Berg, E., Sabatti, C., Su, W., & Cand<U+00E8>s, E. J. (2015). SLOPE -- adaptive variable selection via convex optimization. The Annals of Applied Statistics, 9(3), 1103<U+2013>1140. https://doi.org/10/gfgwzt

Bondell, H. D., & Reich, B. J. (2008). Simultaneous Regression Shrinkage, Variable Selection, and Supervised Clustering of Predictors with OSCAR. Biometrics, 64(1), 115<U+2013>123. JSTOR. https://doi.org/10.1111/j.1541-0420.2007.00843.x

Boyd, S., Parikh, N., Chu, E., Peleato, B., & Eckstein, J. (2010). Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends<U+00AE> in Machine Learning, 3(1), 1<U+2013>122. https://doi.org/10.1561/2200000016

Beck, A., & Teboulle, M. (2009). A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2(1), 183<U+2013>202. https://doi.org/10.1137/080716542

`plot.SLOPE()`

, `plotDiagnostics()`

, `score()`

, `predict.SLOPE()`

,
`trainSLOPE()`

, `coef.SLOPE()`

, `print.SLOPE()`

, `print.SLOPE()`

,
`deviance.SLOPE()`

# NOT RUN { # Gaussian response, default lambda sequence fit <- SLOPE(bodyfat$x, bodyfat$y) # Poisson response, OSCAR-type lambda sequence fit <- SLOPE(abalone$x, abalone$y, family = "poisson", lambda = "oscar", q = 0.4) # Multinomial response, custom alpha and lambda m <- length(unique(wine$y)) - 1 p <- ncol(wine$x) alpha <- 0.005 lambda <- exp(seq(log(2), log(1.8), length.out = p*m)) fit <- SLOPE(wine$x, wine$y, family = "multinomial", lambda = lambda, alpha = alpha) # }