xestars: Efficient stability selection of the best `coglasso` network

Description

xestars() is a faster and more memory-efficient alternative to xstars() for selecting the combination of hyperparameters given to coglasso() that yields the most stable, yet sparse, network. Stability is assessed by estimating networks on multiple subsamples of the multi-omics data set, allowing repetition. A fixed number of subsamples (rep_num) is drawn, each containing a fixed proportion of the total number of samples (stars_subsample_ratio).

Usage

xestars(
  coglasso_obj,
  stars_thresh = 0.1,
  stars_subsample_ratio = NULL,
  rep_num = 20,
  max_iter = 10,
  old_sampling = FALSE,
  verbose = TRUE
)

Value

xestars() returns an object of S3 class select_coglasso containing the results of the selection procedure, built upon the object of S3 class coglasso returned by coglasso().

... are the same elements returned by coglasso().
merge is the "merged" adjacency matrix, the average of all the adjacency matrices estimated across all the different subsamples for the selected combination of \(\lambda_w\), \(\lambda_b\), and \(c\) values in the last path explored before convergence. Each entry is a measure of how recurrent the corresponding edge is across the subsamples.
variability_lw, variability_lb and variability_c are numeric vectors with as many items as the number of \(\lambda_w\), \(\lambda_b\), and \(c\) values explored, respectively. Each item is the variability of the network estimated for the corresponding hyperparameter value, with the other two hyperparameters fixed at their selected values.
sel_index_c, sel_index_lw and sel_index_lb are the indexes of the final selected parameters \(c\), \(\lambda_w\) and \(\lambda_b\) leading to the most stable sparse network.
sel_c, sel_lambda_w and sel_lambda_b are the final selected parameters \(c\), \(\lambda_w\) and \(\lambda_b\) leading to the most stable sparse network.
sel_adj is the adjacency matrix of the final selected network.
sel_variability is the variability of the final selected network.
sel_density is the density of the final selected network.
sel_icov is the inverse covariance matrix of the final selected network.
sel_cov is optional, returned only when coglasso() was called with cov_output = TRUE. It is the covariance matrix associated with the final selected network.
call is the matched call.
method is the chosen model selection method. Here, it is "xestars".
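
To make merge, the variability values and sel_density more concrete, the following base-R toy sketch computes analogous quantities from a few hand-written adjacency matrices. It assumes the instability measure of Liu et al. (2010), the average of \(2\theta(1-\theta)\) over node pairs, where \(\theta\) is the edge frequency stored in merge, and defines density as the share of possible edges present. The exact normalization used inside coglasso is not stated on this page and may differ. The objects adj_list, merged, edge_instability() and network_density() are hypothetical and not part of the package.

```r
# Toy illustration, not coglasso's internal code.
# Hypothetical input: symmetric 0/1 adjacency matrices with a zero diagonal,
# one per subsample, all estimated with the same hyperparameters.
adj_list <- list(
  matrix(c(0, 1, 0,
           1, 0, 1,
           0, 1, 0), nrow = 3, byrow = TRUE),
  matrix(c(0, 1, 0,
           1, 0, 0,
           0, 0, 0), nrow = 3, byrow = TRUE),
  matrix(c(0, 1, 1,
           1, 0, 1,
           1, 1, 0), nrow = 3, byrow = TRUE),
  matrix(c(0, 1, 0,
           1, 0, 1,
           0, 1, 0), nrow = 3, byrow = TRUE)
)

# "Merged" matrix: entry-wise average, i.e. the fraction of subsamples
# in which each edge was estimated
merged <- Reduce(`+`, adj_list) / length(adj_list)

# Instability in the sense of Liu et al. (2010): mean of 2 * theta * (1 - theta)
# over all distinct node pairs (upper triangle, diagonal excluded)
edge_instability <- function(merged) {
  theta <- merged[upper.tri(merged)]
  mean(2 * theta * (1 - theta))
}

# Density: share of the d * (d - 1) / 2 possible edges present in one network
network_density <- function(adj) {
  mean(adj[upper.tri(adj)] != 0)
}

merged
edge_instability(merged)         # 0.25
network_density(adj_list[[1]])   # 2/3
```

Here the edge between nodes 1 and 2 appears in every subsample (frequency 1) and adds no instability, while the edges seen in one and in three of the four subsamples contribute 0.375 each.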

Arguments

coglasso_obj: The object of S3 class coglasso returned by coglasso().
stars_thresh: The variability threshold applied to the explored networks at each iteration of the algorithm. The \(\lambda_w\) or \(\lambda_b\) associated with the most stable network before the threshold is exceeded is selected. Defaults to 0.1.
stars_subsample_ratio: The proportion of samples in the multi-omics data set to be randomly subsampled to estimate the variability of the network under the given hyperparameter setting. Defaults to 80% when the number of samples \(n\) is smaller than 144; otherwise it defaults to \(\frac{10}{n}\sqrt{n}\).
rep_num: The number of subsamples of the multi-omics data set used to estimate the variability of the network under the given hyperparameter setting. Defaults to 20.
max_iter: The maximum number of times the algorithm is allowed to choose a new best \(\lambda_w\). Defaults to 10.
old_sampling: If TRUE, perform the same subsampling that xstars() would. This makes a difference with bigger data sets, where computing a correlation matrix could take significantly longer. Defaults to FALSE.
verbose: Print information regarding the progress of the selection procedure on the console. Defaults to TRUE.
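
The default of stars_subsample_ratio and the subsampling controlled by rep_num can be sketched in base R as follows. default_subsample_ratio() and draw_subsamples() are hypothetical helpers, not coglasso functions. They assume, as in the original StARS, that each subsample is drawn without replacement while different subsamples may overlap.

```r
# Hypothetical helper mirroring the documented default of stars_subsample_ratio:
# 0.8 when n < 144, otherwise 10 * sqrt(n) / n (equivalently 10 / sqrt(n)).
default_subsample_ratio <- function(n) {
  if (n < 144) 0.8 else 10 * sqrt(n) / n
}

# Hypothetical helper: draw rep_num subsamples of sample (row) indices.
# Assumption: no repeated index within a subsample; subsamples may overlap.
draw_subsamples <- function(n, rep_num = 20, ratio = default_subsample_ratio(n)) {
  size <- floor(n * ratio)
  replicate(rep_num, sort(sample.int(n, size)), simplify = FALSE)
}

set.seed(1)
default_subsample_ratio(100)   # 0.8
default_subsample_ratio(400)   # 0.5
subs <- draw_subsamples(n = 100, rep_num = 20)
length(subs)                   # 20
length(subs[[1]])              # 80
```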

Details

eXtended Efficient StARS (XEStARS) is a more efficient and memory-light version of XStARS, the adaptation for collaborative graphical lasso of the Stability Approach to Regularization Selection (StARS) published by Liu, H. et al. (2010). StARS was developed for network estimation regulated by a single penalty parameter, while collaborative graphical lasso needs to explore three different hyperparameters. All three influence network sparsity, to different degrees, and hence stability. At every iteration, xestars(), like xstars(), explores one of the three hyperparameters (\(\lambda_w\), \(\lambda_b\), or \(c\)) with the usual one-dimensional StARS approach, keeping the other two fixed at their previously selected values, until it finds the best combination of the three. What makes xestars() more efficient than xstars() is how the stability check is implemented. In xstars() (as in the original StARS), the stability check is performed for every value of the hyperparameter being selected, for example every \(\lambda_w\) value, until all values are explored; only then does the algorithm select the value yielding the most stable, yet sparse, network and switch to the next hyperparameter. In xestars(), the stability check becomes a stopping criterion: as soon as the stability threshold is passed, the value of the hyperparameter currently being selected is fixed, and the algorithm immediately switches to the next hyperparameter without exploring the rest of the landscape. This considerably reduces the number of iterations needed to converge to a final network.
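
The difference between the two stability checks can be illustrated with a toy variability path. In this hypothetical sketch, which is not coglasso's code, variability_at(k) stands in for the expensive step of estimating networks on all subsamples for the k-th value of one hyperparameter, with values assumed to be ordered from the sparsest to the densest network. A counter records how many such evaluations each strategy needs.

```r
# Hypothetical stand-in for the costly stability estimate, with a call counter
make_counter <- function(variabilities) {
  calls <- 0
  list(
    variability_at = function(k) {
      calls <<- calls + 1
      variabilities[k]
    },
    n_calls = function() calls
  )
}

# StARS-style sweep: evaluate every value first, then keep the last one whose
# running maximum of variability stays below the threshold.
select_full_sweep <- function(variability_at, n_values, thresh) {
  v <- vapply(seq_len(n_values), variability_at, numeric(1))
  ok <- which(cummax(v) <= thresh)
  if (length(ok) == 0) 1L else max(ok)
}

# Early-stop rule: walk along the path, stop as soon as the threshold is
# passed and keep the previous value (falls back to the first value).
select_early_stop <- function(variability_at, n_values, thresh) {
  for (k in seq_len(n_values)) {
    if (variability_at(k) > thresh) return(max(k - 1L, 1L))
  }
  n_values
}

v <- c(0.01, 0.03, 0.06, 0.12, 0.20, 0.25, 0.30, 0.33)
full <- make_counter(v)
early <- make_counter(v)
select_full_sweep(full$variability_at, length(v), thresh = 0.1)   # 3
select_early_stop(early$variability_at, length(v), thresh = 0.1)  # 3
full$n_calls()    # 8
early$n_calls()   # 4
```

In this sketch the two rules always agree, because the running maximum makes the full sweep stop at the first exceedance too; the saving comes entirely from skipping the remaining evaluations.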
The original XStARS draws new subsamples every time the algorithm switches between optimizing \(\lambda_w\), \(\lambda_b\), and \(c\). This prevents the hyperparameters from being compared on equal ground, and can slow the selection down with bigger data sets or a larger hyperparameter space. To reproduce the subsampling of xstars(), the old_sampling argument is provided: if set to TRUE, the subsampling is similar to the one xstars() would perform. Otherwise, the subsampling is performed once at the beginning of the algorithm and reused for all its iterations.
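
The two subsampling schedules can be contrasted with another hypothetical base-R sketch. plan_subsamples() is not a coglasso function, and only a single pass over \(\lambda_w\), \(\lambda_b\), and \(c\) is shown, whereas the actual algorithm may revisit them across iterations.

```r
# Hypothetical sketch of the two subsampling schedules; not coglasso's code.
# old_sampling = FALSE: one set of subsamples drawn up front and reused.
# old_sampling = TRUE: a fresh set drawn each time the hyperparameter changes.
plan_subsamples <- function(n, rep_num, ratio, old_sampling = FALSE,
                            hyperparams = c("lambda_w", "lambda_b", "c")) {
  draw <- function() {
    replicate(rep_num, sample.int(n, floor(n * ratio)), simplify = FALSE)
  }
  shared <- if (!old_sampling) draw() else NULL
  plan <- list()
  for (h in hyperparams) {
    plan[[h]] <- if (old_sampling) draw() else shared
  }
  plan
}

set.seed(42)
plan_shared <- plan_subsamples(n = 50, rep_num = 5, ratio = 0.8)
plan_fresh  <- plan_subsamples(n = 50, rep_num = 5, ratio = 0.8,
                               old_sampling = TRUE)
identical(plan_shared$lambda_w, plan_shared$c)  # same subsamples throughout
identical(plan_fresh$lambda_w, plan_fresh$c)    # different subsamples
```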

Examples

library(coglasso)
cg <- coglasso(multi_omics_sd_micro, p = c(4, 2), nlambda_w = 3, 
               nlambda_b = 3, nc = 3, verbose = FALSE)
# \donttest{
# Takes less than five seconds
sel_cg <- xestars(cg, verbose = FALSE)
# }