Sparse-group OSCAR (SGO) main fitting function. Supports linear and logistic regression, with both dense and sparse matrix implementations.
fit_sgo(
X,
y,
groups,
type = "linear",
lambda = "path",
path_length = 20,
min_frac = 0.05,
alpha = 0.95,
max_iter = 5000,
backtracking = 0.7,
max_iter_backtracking = 100,
tol = 1e-05,
standardise = "l2",
intercept = TRUE,
screen = TRUE,
verbose = FALSE,
w_weights = NULL,
v_weights = NULL,
warm_start = NULL
)
A list containing:
The fitted values from the regression. Taken to be the more stable fit between x and z, which is usually the former. A filter is applied to remove very small values, where ATOS has not been able to shrink exactly to zero. Check this against x and z.
The group values from the regression, obtained by applying the \(\ell_2\) norm within each group on beta (see the sketch after this list).
A list containing the indices of the active/selected variables for each "lambda" value. Index 1 corresponds to the first column in X.
A list containing the indices of the active/selected groups for each "lambda" value. Index 1 corresponds to the first group in the groups vector. You can see the group order by running unique(groups).
Number of iterations performed. If convergence is not reached, this will be max_iter.
Logical flag indicating whether ATOS converged, according to tol.
Final value of the convergence criterion.
The solution to the original problem (see Pedregosa and Gidel (2018)).
The updated values from applying the first proximal operator (see Pedregosa and Gidel (2018)).
The solution to the dual problem (see Pedregosa and Gidel (2018)).
List of variables that were kept after the screening step for each "lambda" value (corresponds to \(\mathcal{S}_v\) in Feser and Evangelou (2024)).
List of groups that were kept after the screening step for each "lambda" value (corresponds to \(\mathcal{S}_g\) in Feser and Evangelou (2024)).
List of variables that were used for fitting after screening for each "lambda" value (corresponds to \(\mathcal{E}_v\) in Feser and Evangelou (2024)).
List of groups that were used for fitting after screening for each "lambda" value (corresponds to \(\mathcal{E}_g\) in Feser and Evangelou (2024)).
List of variables that violated the KKT conditions for each "lambda" value (corresponds to \(\mathcal{K}_v\) in Feser and Evangelou (2024)).
List of groups that violated the KKT conditions for each "lambda" value (corresponds to \(\mathcal{K}_g\) in Feser and Evangelou (2024)).
Vector of the variable penalty sequence.
Vector of the group penalty sequence.
Logical flag indicating whether screening was performed.
Indicates which type of regression was performed.
Logical flag indicating whether an intercept was fit.
Value(s) of \(\lambda\) used to fit the model.
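The group values above can be reproduced directly from the fitted coefficients. A minimal sketch, assuming beta is a plain numeric vector of coefficients (excluding any intercept) ordered as the columns of X:
# illustrative coefficients and the grouping structure from the example below
beta = c(0.5, 0, 0, 1.2, -0.3, 0, 0, 0, 0.8, 0)
groups = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4)
# l2 norm of the coefficients within each group
group_effects = tapply(beta, groups, function(b) sqrt(sum(b^2)))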
Input matrix of dimensions \(n \times p\). Can be a sparse matrix (using class "sparseMatrix" from the Matrix package).
Output vector of dimension \(n\). For type="linear" should be continuous and for type="logistic" should be a binary variable.
A grouping structure for the input data. Should take the form of a vector of group indices.
The type of regression to perform. Supported values are: "linear" and "logistic".
The regularisation parameter. Defines the level of sparsity in the model; a higher value leads to sparser models:
"path" computes a path of regularisation parameters of length "path_length". The path will begin just above the value at which the first predictor enters the model and will terminate at the value determined by "min_frac".
A user-specified single value or sequence can also be given. Internal scaling is applied based on the type of standardisation; the returned "lambda" value will be the original unscaled value(s).
The number of \(\lambda\) values to fit the model for. If "lambda" is user-specified, this is ignored.
Smallest value of \(\lambda\) as a fraction of the maximum value. That is, the final \(\lambda\) will be "min_frac" of the first \(\lambda\) value (a sketch of such a path follows the argument descriptions).
The value of \(\alpha\), which defines the convex balance between OSCAR and gOSCAR. Must be between 0 and 1. Recommended value is 0.95.
Maximum number of ATOS iterations to perform.
The backtracking parameter, \(\tau\), as defined in Pedregosa and Gidel (2018).
Maximum number of backtracking line search iterations to perform per global iteration.
Convergence tolerance for the stopping criteria.
Type of standardisation to perform on X:
"l2" standardises the input data to have \(\ell_2\) norms of one. When using this, "lambda" is scaled internally by \(1/\sqrt{n}\).
"l1" standardises the input data to have \(\ell_1\) norms of one. When using this, "lambda" is scaled internally by \(1/n\).
"sd" standardises the input data to have a standard deviation of one.
"none" applies no standardisation.
Logical flag for whether to fit an intercept.
Logical flag for whether to apply screening rules (see Feser and Evangelou (2024)). Screening discards irrelevant groups before fitting, greatly improving speed.
Logical flag for whether to print fitting information.
Optional vector for the group penalty weights. Overrides the OSCAR penalties when specified. When entering custom weights, these are multiplied internally by \(\lambda\) and \(1-\alpha\). To avoid this behaviour, set \(\lambda = 2\) and \(\alpha = 0.5\), so that the multiplier \(\lambda(1-\alpha) = 1\).
Optional vector for the variable penalty weights. Overrides the OSCAR penalties when specified. When entering custom weights, these are multiplied internally by \(\lambda\) and \(\alpha\). To avoid this behaviour, set \(\lambda = 2\) and \(\alpha = 0.5\), so that the multiplier \(\lambda\alpha = 1\).
Optional list for implementing warm starts. These values are used as initial values in the fitting algorithm. Both "x" and "u" need to be supplied, in the form "list(warm_x, warm_u)". Not recommended for use with a path or CV fit, as these start from the null model by design.
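To make the interaction between "path_length" and "min_frac" concrete, here is a sketch of one plausible path construction. Only the endpoints are documented above, so the log-linear spacing is an assumption for illustration, and the first path value is computed internally from the data:
lambda_max = 1 # illustrative first path value
path_length = 20
min_frac = 0.05
# log-linear grid from lambda_max down to min_frac * lambda_max
lambda_path = exp(seq(log(lambda_max), log(min_frac * lambda_max), length.out = path_length))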
fit_sgo() fits an SGO model (Feser and Evangelou (2024)) using adaptive three operator splitting (ATOS). SGO uses the same model set-up as SGS, but with different weights (see Bao et al. (2020) and Feser and Evangelou (2024)).
The penalties are given by (for a group \(g\) and variable \(i\), with \(p\) variables and \(m\) groups):
$$
v_i = \sigma_1 + \sigma_2(p-i), \; w_g = \sigma_1 + \sigma_3(m-g),
$$
where
$$
\sigma_1 = d_i\|X^\intercal y\|_\infty, \; \sigma_2 = \sigma_1/p, \; \sigma_3 = \sigma_1/m, \; d_i = i \times \exp(-2).
$$
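A minimal sketch of these weights in R (illustrative only; fit_sgo() computes them internally). It implements the display literally; applying the index in \(d_i\) to the group index for \(w_g\) is an assumption:
# toy data matching the dimensions of the example below
set.seed(3)
X = matrix(rnorm(50), nrow = 5)
y = rnorm(5)
groups = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4)
xty_inf = max(abs(crossprod(X, y))) # ||X^T y||_inf
p = ncol(X)
m = length(unique(groups))
i = seq_len(p)
g = seq_len(m)
# variable weights: v_i = sigma_1 + sigma_2 * (p - i), with sigma_2 = sigma_1 / p
sigma_1_v = (i * exp(-2)) * xty_inf
v = sigma_1_v + (sigma_1_v / p) * (p - i)
# group weights: w_g = sigma_1 + sigma_3 * (m - g), with sigma_3 = sigma_1 / m
# (assumes d indexes the group here)
sigma_1_w = (g * exp(-2)) * xty_inf
w = sigma_1_w + (sigma_1_w / m) * (m - g)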
Bao, R., Gu, B., Huang, H. (2020). Fast OSCAR and OWL Regression via Safe Screening Rules, https://proceedings.mlr.press/v119/bao20b
Feser, F., Evangelou, M. (2023). Sparse-group SLOPE: adaptive bi-level selection with FDR-control, https://arxiv.org/abs/2305.09467
Feser, F., Evangelou, M. (2024). Strong screening rules for group-based SLOPE models, https://arxiv.org/abs/2405.15357
Pedregosa, F., Gidel, G. (2018). Adaptive Three Operator Splitting, https://proceedings.mlr.press/v80/pedregosa18a.html
Other SGS-methods: as_sgs(), coef.sgs(), fit_sgo_cv(), fit_sgs(), fit_sgs_cv(), plot.sgs(), predict.sgs(), print.sgs(), scaled_sgs()
# specify a grouping structure
groups = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4)
# generate data
data = gen_toy_data(p = 10, n = 5, groups = groups, seed_id = 3, group_sparsity = 1)
# run SGO
model = fit_sgo(X = data$X, y = data$y, groups = groups, type = "linear", path_length = 5,
                alpha = 0.95, standardise = "l2", intercept = TRUE, verbose = FALSE)
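A short follow-up for inspecting the fit. The element names used here (selected_var, group_effects) are assumptions based on the Value section above and may differ; print.sgs() is documented in the See Also list:
# print a summary of the fit via print.sgs()
print(model)
# assumed element names, per the Value section above:
model$selected_var # indices of active variables per "lambda" value
model$group_effects # l2 norm of beta within each group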