atos: Adaptive three operator splitting (ATOS).

Description

Fits a regression model by adaptive three operator splitting (ATOS) with general convex penalties. Supports linear and logistic regression, each with dense and sparse matrix implementations.

Usage

atos(
  X,
  y,
  type = "linear",
  prox_1,
  prox_2,
  pen_prox_1 = 0.5,
  pen_prox_2 = 0.5,
  max_iter = 5000,
  backtracking = 0.7,
  max_iter_backtracking = 100,
  tol = 1e-05,
  prox_1_opts = NULL,
  prox_2_opts = NULL,
  standardise = "l2",
  intercept = TRUE,
  x0 = NULL,
  u = NULL,
  verbose = FALSE
)

Value

An object of class "atos" containing:

beta: The fitted coefficients from the regression. Taken to be the more stable fit between x and u, which is usually the former.
x: The solution to the original problem (see Pedregosa and Gidel (2018)).
u: The solution to the dual problem (see Pedregosa and Gidel (2018)).
z: The updated values from applying the first proximal operator (see Pedregosa and Gidel (2018)).
type: Indicates which type of regression was performed.
success: Logical flag indicating whether ATOS converged, according to tol.
num_it: Number of iterations performed. If convergence is not reached, this will be max_iter.
certificate: Final value of the convergence criterion.
intercept: Logical flag indicating whether an intercept was fit.

Arguments

X

Input matrix of dimensions $n \times p$. Can be a sparse matrix (using class "sparseMatrix" from the Matrix package).
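
As a minimal sketch, a dense matrix can be converted to a sparse one with the Matrix package before it is passed as X. The simulated data and the names X_dense and X_sparse are illustrative only.

```r
library(Matrix)

set.seed(1)
n <- 20
p <- 5

# Simulated dense design matrix (illustrative data only).
X_dense <- matrix(rnorm(n * p), nrow = n, ncol = p)
# Zero out small entries so that a sparse format is worthwhile.
X_dense[abs(X_dense) < 1] <- 0

# Convert to a compressed sparse column matrix.
X_sparse <- Matrix(X_dense, sparse = TRUE)

# Check that the result inherits from the "sparseMatrix" class.
is_sparse <- methods::is(X_sparse, "sparseMatrix")
```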

y

Output vector of dimension $n$. Must be continuous for type="linear" and binary for type="logistic".

type

The type of regression to perform. Supported values are: "linear" and "logistic".

prox_1

The proximal operator for the first function, $h(x)$.

prox_2

The proximal operator for the second function, $g(x)$.

pen_prox_1

The penalty for the first proximal operator. For the lasso, this would be the sparsity parameter, $\lambda$. If the operator does not include a penalty, set this to 1.
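
For the lasso case mentioned above, the proximal operator of $\lambda \|v\|_1$ is soft-thresholding. The sketch below shows that operator on its own. The function name soft_threshold and its (v, lambda) signature are assumptions made for illustration; the exact signature that atos() expects from prox_1, and how it combines pen_prox_1 with the step size, is not stated here.

```r
# Hypothetical helper: proximal operator of lambda * ||v||_1.
# Each entry is shrunk towards zero by lambda and set to zero
# if its magnitude is at most lambda.
soft_threshold <- function(v, lambda) {
  sign(v) * pmax(abs(v) - lambda, 0)
}

shrunk <- soft_threshold(c(-2, -0.3, 0, 0.4, 1.5), lambda = 0.5)
# shrunk is -1.5, 0, 0, 0, 1
```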

pen_prox_2

The penalty for the second proximal operator.

max_iter

Maximum number of ATOS iterations to perform.

backtracking

The backtracking parameter, $\tau$, as defined in Pedregosa and Gidel (2018).

max_iter_backtracking

Maximum number of backtracking line search iterations to perform per global iteration.
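
The roles of backtracking and max_iter_backtracking can be illustrated with a generic backtracking rule applied to a plain gradient step: the step size is multiplied by $\tau$ until a quadratic upper bound on $f$ holds, or until the iteration cap is reached. This is a simplified sketch, not the exact line search of Pedregosa and Gidel (2018); backtrack_step and its arguments are hypothetical names.

```r
# Hypothetical helper: backtracking on a gradient step from z.
# The step is accepted once
#   f(x_new) <= f(z) + <grad f(z), x_new - z> + ||x_new - z||^2 / (2 * step).
backtrack_step <- function(f, grad_f, z, step = 1, tau = 0.7,
                           max_iter_backtracking = 100) {
  g <- grad_f(z)
  for (i in seq_len(max_iter_backtracking)) {
    x_new <- z - step * g
    d <- x_new - z
    upper_bound <- f(z) + sum(g * d) + sum(d^2) / (2 * step)
    if (f(x_new) <= upper_bound) {
      return(list(step = step, x = x_new, n_backtracks = i - 1))
    }
    # Bound violated: shrink the step by the factor tau and retry.
    step <- tau * step
  }
  # Cap reached without satisfying the bound; return the last shrunken step.
  list(step = step, x = z - step * g, n_backtracks = max_iter_backtracking)
}

# Quadratic with gradient Lipschitz constant 4, so steps up to 1/4 pass the test.
f <- function(x) 2 * sum(x^2)
grad_f <- function(x) 4 * x
res <- backtrack_step(f, grad_f, z = c(1, -2), step = 1, tau = 0.7)
```

Starting from a step of 1 with $\tau = 0.7$, the step is shrunk four times (to $0.7^4 = 0.2401$) before the bound holds for this quadratic.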

tol

Convergence tolerance for the stopping criterion.

prox_1_opts

Optional argument for the first proximal operator. For the group lasso, this would be the group IDs. Note: this must be supplied as a list.
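
To illustrate why a proximal operator may need extra options, the sketch below implements an unweighted group lasso proximal operator that reads the group IDs from a list. The function name group_soft_threshold, the list element name groups, and the (v, lambda, opts) signature are all assumptions for illustration; how atos() forwards prox_1_opts to the operator is not stated here.

```r
# Hypothetical helper: proximal operator of lambda * sum_g ||v_g||_2
# (unweighted group lasso). Group IDs are read from opts$groups.
group_soft_threshold <- function(v, lambda, opts) {
  groups <- opts$groups
  out <- numeric(length(v))
  for (g in unique(groups)) {
    idx <- which(groups == g)
    norm_g <- sqrt(sum(v[idx]^2))
    # Shrink the whole group towards zero; zero it out if its norm <= lambda.
    shrink <- if (norm_g > 0) max(1 - lambda / norm_g, 0) else 0
    out[idx] <- shrink * v[idx]
  }
  out
}

# The options are wrapped in a list, as the argument description requires.
opts <- list(groups = c(1, 1, 2, 2))
shrunk_groups <- group_soft_threshold(c(3, 4, 0.1, 0.1), lambda = 1, opts = opts)
```

In this call the first group has norm 5 and is scaled by 0.8, while the second group has norm below 1 and is set to zero.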

prox_2_opts

Optional argument for the second proximal operator.

standardise

Type of standardisation to perform on X:

"l2" standardises the columns of X to have $\ell_2$ norms of one.
"l1" standardises the columns of X to have $\ell_1$ norms of one.
"sd" standardises the columns of X to have standard deviation of one.
"none" applies no standardisation.

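The standardisation options can be sketched as column-wise rescaling. This assumes scaling is applied per column and without centring, which may differ from what atos() does internally; standardise_columns is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: rescale each column of X by the chosen norm.
standardise_columns <- function(X, method = c("l2", "l1", "sd", "none")) {
  method <- match.arg(method)
  scale_by <- switch(method,
    l2 = sqrt(colSums(X^2)),   # Euclidean norm of each column
    l1 = colSums(abs(X)),      # sum of absolute values of each column
    sd = apply(X, 2, sd),      # sample standard deviation of each column
    none = rep(1, ncol(X))     # leave X unchanged
  )
  sweep(X, 2, scale_by, "/")
}

set.seed(1)
X <- matrix(rnorm(30), nrow = 10, ncol = 3)
X_l2 <- standardise_columns(X, "l2")
col_norms <- sqrt(colSums(X_l2^2))  # each entry is 1 up to rounding error
```
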
intercept

Logical flag for whether to fit an intercept.

x0

Optional initial vector for $x_0$.

u

Optional initial vector for $u$.

verbose

Logical flag for whether to print fitting information.

Details

atos() solves convex minimization problems of the form $$ f(x) + g(x) + h(x), $$ where $f$ is convex and differentiable with an $L_f$-Lipschitz gradient, and $g$ and $h$ are both convex. The algorithm is not symmetrical, but the two variations usually differ only by small numerical values, which are filtered out. Both variations should nevertheless be checked by inspecting x and u. An example for the sparse-group lasso (SGL) is given.
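
To make the problem form above concrete, the sketch below applies fixed-step three operator splitting (the Davis-Yin scheme that ATOS makes adaptive) to a small simulated sparse-group lasso problem, with $f$ the least-squares loss, $g$ the $\ell_1$ penalty and $h$ the group penalty. It is not the ATOS algorithm and does not use atos(): there is no adaptive step size, the iterates x_g and x_h do not correspond to the x, u and z returned by atos(), and all function names, data and penalty values are illustrative assumptions.

```r
# Fixed-step three operator splitting (Davis-Yin), NOT the adaptive ATOS scheme.
# prox_g and prox_h take (v, step) and return the proximal point.
three_operator_splitting <- function(grad_f, prox_g, prox_h, z0, step,
                                     n_iter = 500) {
  z <- z0
  for (k in seq_len(n_iter)) {
    x_g <- prox_g(z, step)
    x_h <- prox_h(2 * x_g - z - step * grad_f(x_g), step)
    z <- z + x_h - x_g
  }
  list(x_g = x_g, x_h = x_h)
}

# Proximal operator of lambda * ||v||_1.
soft_threshold <- function(v, lambda) sign(v) * pmax(abs(v) - lambda, 0)

# Proximal operator of lambda * sum_g ||v_g||_2, group IDs in opts$groups.
group_soft_threshold <- function(v, lambda, opts) {
  out <- numeric(length(v))
  for (g in unique(opts$groups)) {
    idx <- which(opts$groups == g)
    norm_g <- sqrt(sum(v[idx]^2))
    shrink <- if (norm_g > 0) max(1 - lambda / norm_g, 0) else 0
    out[idx] <- shrink * v[idx]
  }
  out
}

# Simulated data (illustrative only).
set.seed(1)
n <- 50
p <- 6
X <- matrix(rnorm(n * p), n, p)
y <- as.numeric(X %*% c(2, -1, 0, 0, 0, 0) + rnorm(n, sd = 0.1))
opts <- list(groups = c(1, 1, 2, 2, 3, 3))
lambda_1 <- 0.05
lambda_2 <- 0.05

# f(b) = ||y - X b||^2 / (2 n); its gradient has Lipschitz constant L_f.
grad_f <- function(b) as.numeric(crossprod(X, X %*% b - y)) / n
L_f <- max(eigen(crossprod(X) / n, symmetric = TRUE, only.values = TRUE)$values)

fit <- three_operator_splitting(
  grad_f,
  prox_g = function(v, s) soft_threshold(v, s * lambda_1),
  prox_h = function(v, s) group_soft_threshold(v, s * lambda_2, opts),
  z0 = numeric(p),
  step = 1 / L_f
)

# The two iterates should agree closely once the scheme has converged.
gap <- max(abs(fit$x_g - fit$x_h))
```

Comparing the two iterates mirrors the advice above to check both solutions rather than relying on one of them.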

References

Pedregosa, F., Gidel, G. (2018). Adaptive Three Operator Splitting, https://proceedings.mlr.press/v80/pedregosa18a.html