Fits a pathwise solution of sparse-group SLOPE (SGS) models using k-fold cross-validation. Supports linear and logistic regression, each with dense and sparse matrix implementations.
fit_sgs_cv(
  X,
  y,
  groups,
  type = "linear",
  lambda = "path",
  path_length = 20,
  min_frac = 0.05,
  alpha = 0.95,
  vFDR = 0.1,
  gFDR = 0.1,
  pen_method = 1,
  nfolds = 10,
  backtracking = 0.7,
  max_iter = 5000,
  max_iter_backtracking = 100,
  tol = 1e-05,
  standardise = "l2",
  intercept = TRUE,
  error_criteria = "mse",
  screen = TRUE,
  verbose = FALSE,
  v_weights = NULL,
  w_weights = NULL,
  warm_start = NULL
)
A list containing:
A list of all the models fitted along the path.
The model chosen by the 1se (one-standard-error) rule, which is an "sgs" object.
The value of \(\lambda\) which generated the chosen model.
The path index for the chosen model.
A table containing fitting information about the models on the path.
Indicates which type of regression was performed.
X: Input matrix of dimensions \(n \times p\). Can be a sparse matrix (using class "sparseMatrix" from the Matrix package).
y: Output vector of dimension \(n\). For type="linear" it should be continuous and for type="logistic" it should be a binary variable.
groups: A grouping structure for the input data. Should take the form of a vector of group indices.
type: The type of regression to perform. Supported values are: "linear" and "logistic".
lambda: The regularisation parameter. Defines the level of sparsity in the model. A higher value leads to sparser models:
"path" computes a path of regularisation parameters of length "path_length". The path begins just above the value at which the first predictor enters the model and terminates at the value determined by "min_frac".
A user-specified single value or sequence. Internal scaling is applied based on the type of standardisation. The returned "lambda" value will be the original unscaled value(s).
path_length: The number of \(\lambda\) values to fit the model for. If "lambda" is user-specified, this is ignored.
min_frac: Smallest value of \(\lambda\) as a fraction of the maximum value. That is, the final \(\lambda\) will be "min_frac" times the first \(\lambda\) value.
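As a rough illustration of how "path_length" and "min_frac" interact, the base R sketch below builds a decreasing sequence whose last value is "min_frac" times the first. The log-linear spacing and the starting value lambda_max = 1 are assumptions made for illustration only; the package's internal path construction may differ.

```r
# Hypothetical starting value; in practice it is derived from the data.
lambda_max <- 1
path_length <- 20
min_frac <- 0.05

# Assumed log-linear spacing from lambda_max down to min_frac * lambda_max.
lambda_path <- exp(seq(log(lambda_max), log(min_frac * lambda_max),
                       length.out = path_length))

# Ratio of the last to the first value equals min_frac.
lambda_path[path_length] / lambda_path[1]
```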
alpha: The value of \(\alpha\), which defines the convex balance between SLOPE and gSLOPE. Must be between 0 and 1. Recommended value is 0.95.
vFDR: Defines the desired variable false discovery rate (FDR) level, which determines the shape of the variable penalties. Must be between 0 and 1.
gFDR: Defines the desired group false discovery rate (FDR) level, which determines the shape of the group penalties. Must be between 0 and 1.
pen_method: The type of penalty sequences to use (see Feser and Evangelou (2023)):
"1" uses the vMean SGS and gMean gSLOPE sequences.
"2" uses the vMax SGS and gMean gSLOPE sequences.
"3" uses the BH SLOPE and gMean gSLOPE sequences, also known as SGS Original.
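To give a feel for how an FDR level shapes a penalty sequence, the sketch below computes the BH-type SLOPE sequence in its commonly used form, \(\lambda_i = \Phi^{-1}(1 - iq/(2p))\), with \(q\) set to the variable FDR level. That option "3" uses exactly this form is an assumption here, and the vMean, vMax and gMean sequences are not reproduced; see Feser and Evangelou (2023) for their definitions.

```r
p <- 10      # number of variables (illustrative)
vFDR <- 0.1  # target variable FDR level

# Standard BH-type SLOPE sequence: lambda_i = qnorm(1 - i * q / (2 * p)).
bh_seq <- qnorm(1 - seq_len(p) * vFDR / (2 * p))

# The sequence is positive and decreasing, so the largest
# coefficients receive the largest penalties.
bh_seq
```

A smaller FDR level pushes every quantile further into the tail, which gives larger penalties and sparser fits.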
nfolds: The number of folds to use in cross-validation.
backtracking: The backtracking parameter, \(\tau\), as defined in Pedregosa and Gidel (2018).
max_iter: Maximum number of ATOS iterations to perform.
max_iter_backtracking: Maximum number of backtracking line search iterations to perform per global iteration.
tol: Convergence tolerance for the stopping criteria.
standardise: Type of standardisation to perform on X:
"l2" standardises the input data to have \(\ell_2\) norms of one.
"l1" standardises the input data to have \(\ell_1\) norms of one.
"sd" standardises the input data to have standard deviation of one.
"none" applies no standardisation.
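The "l2" option can be pictured with the base R sketch below, which rescales each column of a small random matrix to unit \(\ell_2\) norm. Centring the columns first is an assumption made for this illustration, and the code is not the package's internal routine.

```r
set.seed(1)
X <- matrix(rnorm(20), nrow = 5, ncol = 4)

# Centre each column (an assumption), then scale to unit l2 norm.
X_centred <- sweep(X, 2, colMeans(X))
l2_norms <- sqrt(colSums(X_centred^2))
X_l2 <- sweep(X_centred, 2, l2_norms, "/")

# Each column of X_l2 now has l2 norm equal to one.
sqrt(colSums(X_l2^2))
```

Under the same assumption, the "l1" variant would divide by colSums(abs(X_centred)) instead.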
intercept: Logical flag for whether to fit an intercept.
error_criteria: The criteria used to discriminate between models along the path. Supported values are: "mse" (mean squared error) and "mae" (mean absolute error).
screen: Logical flag for whether to apply screening rules (see Feser and Evangelou (2024)). Screening discards irrelevant groups before fitting, which can greatly improve speed.
verbose: Logical flag for whether to print fitting information.
v_weights: Optional vector for the variable penalty weights. Overrides the penalties from pen_method if specified. Custom weights are multiplied internally by \(\lambda\) and \(\alpha\). To cancel this scaling, set \(\lambda = 2\) and \(\alpha = 0.5\), so that \(\lambda\alpha = 1\).
w_weights: Optional vector for the group penalty weights. Overrides the penalties from pen_method if specified. Custom weights are multiplied internally by \(\lambda\) and \(1-\alpha\). To cancel this scaling, set \(\lambda = 2\) and \(\alpha = 0.5\), so that \(\lambda(1-\alpha) = 1\).
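The arithmetic behind the \(\lambda = 2\), \(\alpha = 0.5\) choice for custom weights can be checked directly. The weight vectors below are hypothetical, and the multiplication only mirrors the scaling described for v_weights and w_weights rather than calling the package.

```r
lambda <- 2
alpha <- 0.5
v_weights <- c(3, 2, 1)   # hypothetical custom variable weights
w_weights <- c(1.5, 0.5)  # hypothetical custom group weights

# Scaling as described: lambda * alpha for variables,
# lambda * (1 - alpha) for groups. Both factors equal one here.
scaled_v <- lambda * alpha * v_weights
scaled_w <- lambda * (1 - alpha) * w_weights

identical(scaled_v, v_weights)
identical(scaled_w, w_weights)
```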
warm_start: Optional list for implementing warm starts. These values are used as initial values in the fitting algorithm. "x" and "u" must be supplied in the form "list(warm_x, warm_u)". Not recommended for a path or CV fit, as these start from the null model by design.
Fits SGS models along a solution path using adaptive three operator splitting (ATOS), selecting the 1se model as the optimum. Warm starts are implemented.
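The 1se choice can be sketched in base R as follows. The error matrix is made up, and the rule shown is the usual one-standard-error rule (the most regularised model whose mean CV error lies within one standard error of the minimum); whether the package computes it in exactly this way is an assumption.

```r
# Hypothetical CV errors: rows are path points (decreasing lambda),
# columns are folds.
cv_errors <- rbind(
  c(5.0, 5.2, 4.8),
  c(3.0, 3.3, 2.7),
  c(2.1, 2.4, 1.8),
  c(2.0, 2.2, 1.8),
  c(2.05, 2.3, 1.8)
)

mean_err <- rowMeans(cv_errors)
se_err <- apply(cv_errors, 1, sd) / sqrt(ncol(cv_errors))

# Threshold: minimum mean error plus one standard error at that minimum.
best <- which.min(mean_err)
threshold <- mean_err[best] + se_err[best]

# 1se choice: the most regularised model (smallest path index)
# whose mean error is within the threshold.
one_se_id <- min(which(mean_err <= threshold))
one_se_id
```

Choosing the smallest qualifying index favours a sparser model over the one with the strictly lowest error.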
Feser, F., Evangelou, M. (2023). Sparse-group SLOPE: adaptive bi-level selection with FDR-control, https://arxiv.org/abs/2305.09467
Feser, F., Evangelou, M. (2024). Strong screening rules for group-based SLOPE models, https://arxiv.org/abs/2405.15357
fit_sgs()
Other model-selection: as_sgs(), fit_goscar_cv(), fit_gslope_cv(), fit_sgo_cv(), scaled_sgs()
Other SGS-methods: as_sgs(), coef.sgs(), fit_sgo(), fit_sgo_cv(), fit_sgs(), plot.sgs(), predict.sgs(), print.sgs(), scaled_sgs()
# load the package
library(sgs)
# specify a grouping structure
groups = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4)
# generate data
data = gen_toy_data(p = 10, n = 5, groups = groups, seed_id = 3, group_sparsity = 1)
# run SGS with cross-validation
cv_model = fit_sgs_cv(X = data$X, y = data$y, groups = groups, type = "linear",
  path_length = 5, nfolds = 5, alpha = 0.95, vFDR = 0.1, gFDR = 0.1, min_frac = 0.05,
  standardise = "l2", intercept = TRUE, verbose = TRUE)