bs: Build multiple networks and select the best one from a multi-omics data set

Description

bs() wraps the package's two main functions in a single call: coglasso(), which builds multiple multi-omics networks, and select_coglasso(), which selects the best one according to the chosen criterion.

Usage

bs(
  data,
  p = NULL,
  pX = lifecycle::deprecated(),
  lambda_w = NULL,
  lambda_b = NULL,
  c = NULL,
  nlambda_w = NULL,
  nlambda_b = NULL,
  nc = NULL,
  lambda_w_max = NULL,
  lambda_b_max = NULL,
  c_max = NULL,
  lambda_w_min_ratio = NULL,
  lambda_b_min_ratio = NULL,
  c_min = NULL,
  icov_guess = NULL,
  cov_output = FALSE,
  lock_lambdas = FALSE,
  method = "xestars",
  stars_thresh = 0.1,
  stars_subsample_ratio = NULL,
  rep_num = 20,
  max_iter = 10,
  old_sampling = FALSE,
  ebic_gamma = 0.5,
  verbose = TRUE
)

Value

bs() returns an object of S3 class select_coglasso containing several elements. The most important one is sel_adj, the adjacency matrix of the selected network. Some output elements depend on the chosen model selection method.

These elements are always returned, and they are the result of network estimation with coglasso():

loglik is a numerical vector containing the log-likelihoods of all the estimated networks.
density is a numerical vector containing a measure of the density of all the estimated networks.
df is an integer vector containing the degrees of freedom of all the estimated networks.
convergence is a binary vector indicating, for each combination of hyperparameters, whether a network was successfully estimated.
path is a list containing the adjacency matrices of all the estimated networks.
icov is a list containing the inverse covariance matrices of all the estimated networks.
nexploded is the number of combinations of hyperparameters for which coglasso() failed to converge.
data is the input multi-omics data set.
hpars is the ordered table of all the combinations of hyperparameters given as input to bs(), with \(\alpha(\lambda_w+\lambda_b)\) being the key to sort rows.
lambda_w, lambda_b, and c are numerical vectors with, respectively, all the \(\lambda_w\), \(\lambda_b\), and \(c\) values bs() used.
p is the vector with the number of variables for each omic layer of the data set.
D is the number of omics layers in the data set.
cov (optional, returned only when cov_output is TRUE) is a list containing the variance-covariance matrices of all the estimated networks.

These elements are returned by every available selection method:

sel_index_c, sel_index_lw and sel_index_lb are the indices of the final selected parameters \(c\), \(\lambda_w\) and \(\lambda_b\) that yield the final selected network.
sel_c, sel_lambda_w and sel_lambda_b are the final selected parameters \(c\), \(\lambda_w\) and \(\lambda_b\) that yield the final selected network.
sel_adj is the adjacency matrix of the final selected network.
sel_density is the density of the final selected network.
sel_icov is the inverse covariance matrix of the final selected network.
sel_cov (optional, returned only when cov_output is TRUE) is the covariance matrix associated with the final selected network.
call is the matched call.
method is the chosen model selection method.
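
Because the rows and columns of sel_adj follow the variable order of data, and p gives the size of each layer, the edges of the selected network can be split into "within" and "between" interactions. The helper edges_by_layer() below is hypothetical and not part of coglasso; it is a minimal sketch assuming that nonzero entries of sel_adj mark edges.

```r
# Hypothetical helper (not part of coglasso): list the edges of an adjacency
# matrix such as sel_adj and label each one as "within" or "between" layers.
# Assumes variables are grouped by layer in the order given by p.
edges_by_layer <- function(adj, p) {
  adj <- as.matrix(adj)  # also converts sparse matrices if Matrix is loaded
  stopifnot(nrow(adj) == ncol(adj), sum(p) == ncol(adj))
  # Layer of each variable, e.g. p = c(4, 2) gives 1 1 1 1 2 2
  layer <- rep(seq_along(p), times = p)
  # Upper triangle only, so each undirected edge is counted once
  idx <- which(upper.tri(adj) & adj != 0, arr.ind = TRUE)
  data.frame(
    from = idx[, "row"],
    to = idx[, "col"],
    type = ifelse(layer[idx[, "row"]] == layer[idx[, "col"]],
                  "within", "between"),
    stringsAsFactors = FALSE
  )
}

# Toy 6 x 6 network with p = c(4, 2)
adj <- matrix(0, 6, 6)
adj[1, 2] <- adj[2, 1] <- 1  # within layer 1
adj[3, 5] <- adj[5, 3] <- 1  # between layers 1 and 2
adj[5, 6] <- adj[6, 5] <- 1  # within layer 2
edges_by_layer(adj, p = c(4, 2))
```

On a real result, the call would look like edges_by_layer(sel_mo_net$sel_adj, p = sel_mo_net$p), using the p element returned by bs().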

These are the additional elements returned when choosing "xestars" or "xstars":

merge is the "merged" adjacency matrix, the average of all the adjacency matrices estimated across all the different subsamples for the selected combination of \(\lambda_w\), \(\lambda_b\), and \(c\) values in the last path explored before convergence. Each entry is a measure of how recurrent the corresponding edge is across the subsamples.
variability_lw, variability_lb and variability_c are numeric vectors with one element per explored \(\lambda_w\), \(\lambda_b\), and \(c\) value, respectively. Each element is the variability of the network estimated with the corresponding hyperparameter value, with the other two hyperparameters fixed at their selected values.
sel_variability is the variability of the final selected network.
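
This page does not spell out how the variability is computed. Since "xstars" and "xestars" build on StARS, a plausible reading, assumed here rather than confirmed, is the StARS instability of Liu, Roeder and Wasserman (2010): each entry \(\theta\) of merge is the fraction of subsamples containing an edge, and the variability is the average of \(2\theta(1-\theta)\) over all distinct pairs of variables. The function name stars_variability() is hypothetical.

```r
# Hypothetical sketch of the StARS instability measure, assuming
# xstars/xestars derive variability from `merge` in the same way.
stars_variability <- function(merge) {
  merge <- as.matrix(merge)
  # Edge-wise instability: 2 * theta * (1 - theta) is the probability that
  # two independent subsamples disagree on whether the edge is present
  xi <- 2 * merge * (1 - merge)
  # Average over the distinct variable pairs (upper triangle)
  mean(xi[upper.tri(xi)])
}

# Toy merged matrix for 3 variables: one edge always present (1),
# one present in half of the subsamples (0.5), one never present (0)
merge <- matrix(c(0,   1,   0.5,
                  1,   0,   0,
                  0.5, 0,   0), nrow = 3, byrow = TRUE)
stars_variability(merge)  # (0 + 0.5 + 0) / 3 = 0.1666667
```

A value like this would then be compared with stars_thresh (0.1 by default) to decide whether a hyperparameter setting is still stable enough.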

These are the additional elements returned when choosing "ebic":

ebic_scores is a numerical vector containing the eBIC scores for all the hyperparameter combinations.
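
For reference, the extended BIC of Foygel and Drton (2010) for a Gaussian graphical model with edge set \(E\), fitted on \(n\) samples and \(P\) variables in total, is commonly written as follows, and lower scores are preferred. It is an assumption here that ebic_scores follows this standard form, with df playing the role of \(|E|\) and loglik that of \(\ell_n(\hat{\Theta})\); the package may scale the log-likelihood differently.

\[
\mathrm{eBIC}_\gamma = -2\,\ell_n(\hat{\Theta}) + |E|\,\log n + 4\,\gamma\,|E|\,\log P
\]

With \(\gamma = 0\) the last term vanishes and the criterion reduces to the standard BIC, which matches the description of ebic_gamma.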

Arguments

data: The input multi-omics data set. Rows should be samples, columns should be variables. Variables should be grouped by their assay (e.g. transcripts first, then metabolites). data is a required parameter.
p: A vector with the number of variables in each omics layer of the data set (e.g. the number of transcripts, metabolites, etc.), in the same order as the layers appear in the data set. If p is a single number, coglasso() assumes that the data set has two layers and that the number given is the number of variables in the first one.
pX: pX is no longer supported. Please use p.
lambda_w: A vector of values for the parameter \(\lambda_w\), the penalization parameter for the "within" interactions. Overrides nlambda_w.
lambda_b: A vector of values for the parameter \(\lambda_b\), the penalization parameter for the "between" interactions. Overrides nlambda_b.
c: A vector of values for the parameter \(c\), the weight given to collaboration. Overrides nc.
nlambda_w: The number of requested \(\lambda_w\) parameters to explore. A sequence of size nlambda_w of \(\lambda_w\) parameters will be generated. Defaults to 8. Ignored when lambda_w is set by the user.
nlambda_b: The number of requested \(\lambda_b\) parameters to explore. A sequence of size nlambda_b of \(\lambda_b\) parameters will be generated. Defaults to 8. Ignored when lambda_b is set by the user.
nc: The number of requested \(c\) parameters to explore. A sequence of size nc of \(c\) parameters will be generated. Defaults to 5. Ignored when c is set by the user.
lambda_w_max: The greatest generated \(\lambda_w\). By default it is computed with a data-driven approach. Ignored when lambda_w is set by the user.
lambda_b_max: The greatest generated \(\lambda_b\). By default it is computed with a data-driven approach. Ignored when lambda_b is set by the user.
c_max: The greatest \(c\) explored. Defaults to 100. Ignored when c is set by the user.
lambda_w_min_ratio: The ratio of the smallest generated \(\lambda_w\) over the greatest generated \(\lambda_w\). Defaults to 0.1. Ignored when lambda_w is set by the user.
lambda_b_min_ratio: The ratio of the smallest generated \(\lambda_b\) over the greatest generated \(\lambda_b\). Defaults to 0.1. Ignored when lambda_b is set by the user.
c_min: The smallest \(c\) explored. Defaults to \(\frac{1}{c_{max}}\), i.e. 0.01 if c_max is not set by the user. Ignored when c is set by the user.
icov_guess: Use a predetermined inverse covariance matrix as an initial guess for the network estimation.
cov_output: If TRUE, add the estimated variance-covariance matrices to the output (elements cov and sel_cov). Defaults to FALSE.
lock_lambdas: If TRUE, set \(\lambda_w = \lambda_b\), forcing a single lambda parameter for both "within" and "between" interactions. Defaults to FALSE.
method: The model selection method used to select the best combination of hyperparameters. The available options are "xstars", "xestars" and "ebic". Defaults to "xestars".
stars_thresh: The threshold on the variability of the explored networks at each iteration of the algorithm. The \(\lambda_w\) or \(\lambda_b\) associated with the most stable network before the threshold is exceeded is selected. Defaults to 0.1.
stars_subsample_ratio: The proportion of samples of the multi-omics data set randomly drawn in each subsample used to estimate the variability of the network under a given hyperparameter setting. Defaults to 80% when the number of samples \(n\) is smaller than 144, and to \(\frac{10}{n}\sqrt{n}\) otherwise.
rep_num: The number of subsamples of the multi-omics data set used to estimate the variability of the network under a given hyperparameter setting. Defaults to 20.
max_iter: The maximum number of times the algorithm is allowed to choose a new best \(\lambda_w\). Defaults to 10.
old_sampling: If TRUE, perform the same subsampling as xstars(). This makes a difference with bigger data sets, where computing a correlation matrix could take significantly longer. Defaults to FALSE.
ebic_gamma: The \(\gamma\) tuning parameter for eBIC selection, which must lie between 0 and 1. Setting it to 0 yields the standard BIC. Defaults to 0.5.
verbose: If TRUE, print information about the network building and network selection processes. Defaults to TRUE.
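
The default of stars_subsample_ratio switches rule at 144 samples. The hypothetical helper below (not exported by coglasso) simply restates that documented rule, to show how the proportion of samples per subsample shrinks as the data set grows.

```r
# Hypothetical helper restating the documented default of
# stars_subsample_ratio for a data set with n samples.
default_subsample_ratio <- function(n) {
  if (n < 144) 0.8 else 10 * sqrt(n) / n
}

default_subsample_ratio(100)    # 0.8
default_subsample_ratio(400)    # 10 * 20 / 400 = 0.5
default_subsample_ratio(10000)  # 10 * 100 / 10000 = 0.1
```

Each hyperparameter setting is then evaluated on rep_num such subsamples (20 by default), so a smaller ratio keeps each refit cheaper on large data sets.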

Details

When using bs(), coglasso() first estimates multiple multi-omics networks with the collaborative graphical lasso algorithm, one for each combination of input values for the hyperparameters \(\lambda_w\), \(\lambda_b\) and \(c\). Then, select_coglasso() selects the best combination of hyperparameters according to the chosen model selection method. The three available options for the argument method are "xstars", "xestars" and "ebic". For more information on these selection methods, see the help page of select_coglasso().
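
To make "one network for each combination" concrete, the sketch below builds a small hyperparameter grid in base R. It rests on assumptions not confirmed by this page: the sequences are taken to be log-spaced between their minimum and maximum, and the maxima for \(\lambda_w\) and \(\lambda_b\) are made-up numbers, since bs() derives them from the data by default. The helper log_seq() is hypothetical.

```r
# ASSUMPTIONS: log-spaced sequences; lambda maxima are made-up values.
lambda_w_max <- 0.9          # hypothetical (data-driven by default)
lambda_b_max <- 0.7          # hypothetical (data-driven by default)
lambda_w_min_ratio <- 0.1    # documented default
lambda_b_min_ratio <- 0.1    # documented default
c_max <- 100                 # documented default
c_min <- 1 / c_max           # documented default, i.e. 0.01

# Hypothetical helper: n values evenly spaced on the log scale
log_seq <- function(from, to, n) exp(seq(log(from), log(to), length.out = n))

lambda_w <- log_seq(lambda_w_max, lambda_w_min_ratio * lambda_w_max, 3)
lambda_b <- log_seq(lambda_b_max, lambda_b_min_ratio * lambda_b_max, 3)
c_vals   <- log_seq(c_min, c_max, 3)  # named c_vals to avoid masking base::c()

# One network is estimated for every row of this grid
grid <- expand.grid(lambda_w = lambda_w, lambda_b = lambda_b, c = c_vals)
nrow(grid)  # 3 * 3 * 3 = 27
```

With nlambda_w = 3, nlambda_b = 3 and nc = 3 this gives 27 combinations; with the documented defaults (8, 8 and 5 values) the grid would have 8 * 8 * 5 = 320 combinations, which is why reducing these numbers is the main lever on run time. Setting lock_lambdas = TRUE ties \(\lambda_w\) to \(\lambda_b\) and should shrink the grid accordingly.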

Examples


# Suggested usage: give the input data set, set the values for `p` and the 
# number of hyperparameters to explore (to choose how extensively to explore 
# the possible hyperparameters). Then, let the default behavior do the rest:

sel_mo_net <- bs(multi_omics_sd_micro, p = c(4, 2), nlambda_w = 3, 
                 nlambda_b = 3, nc = 3, verbose = FALSE)
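
As a further sketch using only arguments documented on this page, the same grid can be scored with eBIC instead of the default "xestars". Since eBIC needs no subsampling it should run faster, although it may select a different network. The accessed elements are those listed under Value.

```r
library(coglasso)

# Same data and grid as above, selected by eBIC instead of "xestars"
sel_ebic <- bs(multi_omics_sd_micro, p = c(4, 2), nlambda_w = 3,
               nlambda_b = 3, nc = 3, method = "ebic", ebic_gamma = 0.5,
               verbose = FALSE)

sel_ebic$sel_lambda_w  # selected lambda_w
sel_ebic$sel_lambda_b  # selected lambda_b
sel_ebic$sel_c         # selected c
sel_ebic$sel_adj       # adjacency matrix of the selected network
sel_ebic$ebic_scores   # eBIC score of every hyperparameter combination
```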
