SerialGraphical: Stability selection graphical model (internal)

Description

Runs stability selection graphical models with different combinations of parameters controlling the sparsity of the underlying selection algorithm (e.g. penalty parameter for regularised models) and thresholds in selection proportions. These two parameters are jointly calibrated by maximising the stability score of the model (possibly under a constraint on the expected number of falsely stably selected features). This function uses a serial implementation and requires the grid of parameters controlling the underlying algorithm as input (for internal use only).
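The joint calibration over the two parameters works like a grid search. The R sketch below illustrates it under stated assumptions. It is not the internal code of SerialGraphical. The names calibrate_grid, pfer_mb and toy_score are hypothetical. toy_score is a made-up placeholder, not the stability score used by the package. Selection proportions are passed as a matrix with one row per penalty value and one column per candidate feature; for a graph with p nodes there are N = p(p-1)/2 candidate edges. The optional constraint uses the Meinshausen and Bühlmann (2010) upper bound on the PFER, q^2 / ((2 pi - 1) N), which only applies to thresholds pi > 0.5.

```r
# Minimal sketch of joint calibration over (lambda, pi); all function names
# are hypothetical and this is NOT the package's internal implementation.

# Meinshausen and Buhlmann (2010) upper bound on the PFER, for q features
# selected on average out of N candidates and a threshold pi > 0.5.
pfer_mb <- function(q, pi, N) {
  if (pi > 0.5) q^2 / ((2 * pi - 1) * N) else Inf
}

# HYPOTHETICAL placeholder score: proportion of features whose selection
# proportion is clearly low (<= 1 - pi) or clearly high (>= pi).
toy_score <- function(p, pi) mean(p <= 1 - pi | p >= pi)

# selprop: matrix with one row per lambda and one column per candidate feature.
calibrate_grid <- function(selprop, pi_list, score_fun = toy_score,
                           PFER_thr = Inf) {
  N <- ncol(selprop)
  q <- rowSums(selprop)  # average number of selected features per lambda
  S_2d <- matrix(NA_real_, nrow(selprop), length(pi_list))
  for (k in seq_len(nrow(selprop))) {
    for (j in seq_along(pi_list)) {
      # Only (lambda, pi) pairs satisfying the PFER constraint are scored
      if (pfer_mb(q[k], pi_list[j], N) <= PFER_thr) {
        S_2d[k, j] <- score_fun(selprop[k, ], pi_list[j])
      }
    }
  }
  if (all(is.na(S_2d))) stop("No (lambda, pi) pair satisfies the constraint.")
  # First (row, col) reaching the maximum score, scanning column by column
  best <- which(S_2d == max(S_2d, na.rm = TRUE), arr.ind = TRUE)[1, ]
  list(S_2d = S_2d, lambda_id = unname(best["row"]), pi = pi_list[best["col"]])
}

# Made-up selection proportions for 2 penalty values and 4 candidate features
selprop <- rbind(c(0.95, 0.90, 0.05, 0.10),  # clearly in or clearly out
                 c(0.50, 0.55, 0.45, 0.50))  # ambiguous
res <- calibrate_grid(selprop, pi_list = c(0.6, 0.8))
res$lambda_id  # 1
res$pi         # 0.6
```

With PFER_thr = 2, the pairs using pi = 0.6 violate the bound, so calibration moves to pi = 0.8. This mirrors how a finite PFER_thr restricts the grid before the score is maximised.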

Usage

SerialGraphical(
  xdata,
  pk = NULL,
  Lambda,
  lambda_other_blocks = 0.1,
  pi_list = seq(0.6, 0.9, by = 0.01),
  K = 100,
  tau = 0.5,
  seed = 1,
  n_cat = n_cat,
  implementation = PenalisedGraphical,
  start = "cold",
  scale = TRUE,
  resampling = "subsampling",
  cpss = FALSE,
  PFER_method = "MB",
  PFER_thr = Inf,
  FDP_thr = Inf,
  output_data = FALSE,
  verbose = TRUE,
  ...
)

Value

A list with:

S: a matrix of the best stability scores for different (sets of) parameters controlling the level of sparsity in the underlying algorithm.
Lambda: a matrix of parameters controlling the level of sparsity in the underlying algorithm.
Q: a matrix of the average number of selected features by the underlying algorithm with different parameters controlling the level of sparsity.
Q_s: a matrix of the calibrated number of stably selected features with different parameters controlling the level of sparsity.
P: a matrix of calibrated thresholds in selection proportions for different parameters controlling the level of sparsity in the underlying algorithm.
PFER: a matrix of upper-bounds in PFER of calibrated stability selection models with different parameters controlling the level of sparsity.
FDP: a matrix of upper-bounds in FDP of calibrated stability selection models with different parameters controlling the level of sparsity.
S_2d: a matrix of stability scores obtained with different combinations of parameters. Columns correspond to different thresholds in selection proportions.
PFER_2d: a matrix of upper-bounds in PFER obtained with different combinations of parameters. Columns correspond to different thresholds in selection proportions. Only returned if length(pk)=1.
FDP_2d: a matrix of upper-bounds in FDP obtained with different combinations of parameters. Columns correspond to different thresholds in selection proportions. Only returned if length(pk)=1.
selprop: an array of selection proportions. Rows and columns correspond to nodes in the graph. Indices along the third dimension correspond to different parameters controlling the level of sparsity in the underlying algorithm.
sign: a matrix of signs of Pearson's correlations estimated from xdata.
method: a list with type="graphical_model" and values used for arguments implementation, start, resampling, cpss and PFER_method.
params: a list with values used for arguments K, pi_list, tau, n_cat, pk, n (number of observations in xdata), PFER_thr, FDP_thr, seed, lambda_other_blocks, and Sequential_template.

The rows of S, Lambda, Q, Q_s, P, PFER, FDP, S_2d, PFER_2d and FDP_2d, and the indices along the third dimension of selprop, are ordered in the same way and correspond to the parameter values stored in Lambda. For multi-block inference, the columns of S, Lambda, Q, Q_s, P, PFER and FDP, and the indices along the third dimension of S_2d, correspond to the different blocks.
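The rows of S, P and Lambda share their ordering with the third dimension of selprop. So the calibrated graph of a single-block model can be read off by indexing all of them with the same row. The sketch below runs on a mock output list with made-up numbers. calibrated_graph is a hypothetical helper. Keeping the edges whose selection proportion is at least the calibrated threshold is an assumption about how thresholds are applied.

```r
# Hypothetical sketch: recover the calibrated graph from a single-block output
# list `out` with elements S, Lambda, P and selprop (structure as described above).
calibrated_graph <- function(out) {
  id <- which.max(out$S[, 1])            # row with the highest stability score
  thr <- out$P[id, 1]                    # calibrated threshold for that row
  # ASSUMPTION: an edge is kept if its selection proportion is >= threshold
  A <- (out$selprop[, , id] >= thr) * 1
  diag(A) <- 0                           # no self-loops
  list(id = id, lambda = out$Lambda[id, 1], adjacency = A)
}

# Mock output with 3 nodes and 2 penalty values (made-up numbers)
selprop <- array(0.5, dim = c(3, 3, 2))
selprop[, , 1] <- matrix(c(0,   0.9, 0.3,
                           0.9, 0,   0.2,
                           0.3, 0.2, 0), nrow = 3)
out <- list(
  S = matrix(c(10, 2), ncol = 1),
  Lambda = matrix(c(0.5, 0.1), ncol = 1),
  P = matrix(c(0.8, 0.6), ncol = 1),
  selprop = selprop
)
g <- calibrated_graph(out)
g$lambda     # 0.5
g$adjacency  # a single edge, between nodes 1 and 2
```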

Arguments

xdata: data matrix with observations as rows and variables as columns. For multi-block stability selection, the variables in xdata have to be ordered by group.
pk: optional vector encoding the grouping structure. Only used for multi-block stability selection where pk indicates the number of variables in each group. If pk=NULL, single-block stability selection is performed.
Lambda: matrix of parameters controlling the level of sparsity in the underlying feature selection algorithm specified in implementation. If implementation=PenalisedGraphical, Lambda contains penalty parameters.
lambda_other_blocks: optional vector of parameters controlling the level of sparsity in neighbour blocks for the multi-block procedure. To jointly use a specific set of parameters for each block, lambda_other_blocks must be set to NULL (not recommended). Only used for multi-block stability selection, i.e. if length(pk)>1.
pi_list: vector of thresholds in selection proportions. If n_cat=NULL or n_cat=2, these values must be >0 and <1. If n_cat=3, these values must be >0.5 and <1.
K: number of resampling iterations.
tau: subsample size. Only used if resampling="subsampling" and cpss=FALSE.
seed: value of the seed to initialise the random number generator and ensure reproducibility of the results (see set.seed).
n_cat: computation options for the stability score. Default is NULL to use the score based on a z test. Other possible values are 2 or 3 to use the score based on the negative log-likelihood.
implementation: function to use for graphical modelling. If implementation=PenalisedGraphical, the algorithm implemented in glassoFast is used for regularised estimation of a conditional independence graph. Alternatively, a user-defined function can be provided.
start: character string indicating if the algorithm should be initialised at the estimated (inverse) covariance with previous penalty parameters (start="warm") or not (start="cold"). Using start="warm" can speed up the computations, but could lead to convergence issues (in particular with a coarse grid, i.e. few penalty parameters in Lambda). Only used for implementation=PenalisedGraphical (see argument "start" in glassoFast).
scale: logical indicating if the correlation (scale=TRUE) or covariance (scale=FALSE) matrix should be used as input of glassoFast if implementation=PenalisedGraphical. Otherwise, this argument must be handled by the function provided in implementation.
resampling: resampling approach. Possible values are: "subsampling" for sampling without replacement of a proportion tau of the observations, or "bootstrap" for sampling with replacement generating a resampled dataset with as many observations as in the full sample. Alternatively, this argument can be a function to use for resampling. This function must use arguments named data and tau and return the IDs of observations to be included in the resampled dataset.
cpss: logical indicating if complementary pairs stability selection should be performed. For this, the algorithm is applied to two non-overlapping subsets of half of the observations. A feature is considered as selected if it is selected for both subsamples. With this method, the data is split K/2 times (K models are fitted). Only used if PFER_method="MB".
PFER_method: method used to compute the upper-bound of the expected number of False Positives (or Per Family Error Rate, PFER). If PFER_method="MB", the method proposed by Meinshausen and Bühlmann (2010) is used. If PFER_method="SS", the method proposed by Shah and Samworth (2013) under the assumption of unimodality is used.
PFER_thr: threshold in PFER for constrained calibration by error control. If PFER_thr=Inf and FDP_thr=Inf, unconstrained calibration is used (the default).
FDP_thr: threshold in the expected proportion of falsely selected features (or False Discovery Proportion) for constrained calibration by error control. If PFER_thr=Inf and FDP_thr=Inf, unconstrained calibration is used (the default).
output_data: logical indicating if the input dataset xdata should be included in the output.
verbose: logical indicating if a loading bar and messages should be printed.
...: additional parameters passed to the functions provided in implementation or resampling.
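A user-defined resampling function only has to accept arguments named data and tau and return the IDs of the observations to include. As an illustration, the hypothetical factory make_stratified_subsampling builds a function that subsamples a proportion tau of the observations within each stratum. The stratified design and all names are assumptions for this sketch, not options provided by the package.

```r
# Hypothetical factory: returns a resampling function with the required
# (data, tau) arguments that subsamples a proportion tau within each stratum.
make_stratified_subsampling <- function(strata) {
  function(data, tau, ...) {
    stopifnot(length(strata) == NROW(data))
    by_stratum <- split(seq_len(NROW(data)), strata)
    ids <- lapply(by_stratum, function(idx) {
      # index via sample.int to avoid sample()'s behaviour on length-1 vectors
      idx[sample.int(length(idx), size = round(tau * length(idx)))]
    })
    sort(unlist(ids, use.names = FALSE))  # IDs of rows kept in the subsample
  }
}

set.seed(1)
xdata <- matrix(rnorm(40), nrow = 10)
strata <- rep(c("a", "b"), each = 5)
resample_fun <- make_stratified_subsampling(strata)
ids <- resample_fun(data = xdata, tau = 0.6)
ids  # 6 distinct row indices: 3 from rows 1-5 and 3 from rows 6-10
```

The returned function could then be supplied as resampling = resample_fun. The trailing ... lets it tolerate any additional arguments forwarded to it, while the strata are captured in the closure rather than passed at call time.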