Computes the stability score from the selection proportions of models fitted with a given sparsity-controlling parameter, for different thresholds in selection proportions. The score measures how unlikely it is that the selection procedure is uniform (i.e. uninformative) for a given combination of parameters.
StabilityScore(
selprop,
pi_list = seq(0.6, 0.9, by = 0.01),
K,
n_cat = 3,
group = NULL
)
Value: A vector of stability scores obtained with the different thresholds in selection proportions.

Arguments:

selprop: array of selection proportions.

pi_list: vector of thresholds in selection proportions. If n_cat=NULL or n_cat=2, these values must be >0 and <1. If n_cat=3, these values must be >0.5 and <1.

K: number of resampling iterations.

n_cat: computation options for the stability score. Default is NULL to use the score based on a z test. Other possible values are 2 or 3 to use the score based on the negative log-likelihood.

group: vector encoding the grouping structure among predictors. This argument indicates the number of variables in each group and only needs to be provided for group (but not sparse group) penalisation.
The stability score is derived from the likelihood under the assumption of uniform (uninformative) selection.
We classify the features into three categories: the stably selected ones (that have selection proportions \(\ge \pi\)), the stably excluded ones (selection proportion \(\le 1-\pi\)), and the unstable ones (selection proportions between \(1-\pi\) and \(\pi\)).
Under the hypothesis of equiprobability of selection (instability), the likelihood of observing stably selected, stably excluded and unstable features can be expressed as:
\(L_{\lambda, \pi} = \prod_{j=1}^N [ ( 1 - F( K \pi - 1 ) )^{1_{H_{\lambda} (j) \ge K \pi}} \times ( F( K \pi - 1 ) - F( K ( 1 - \pi ) ) )^{1_{ (1-\pi) K < H_{\lambda} (j) < K \pi }} \times F( K ( 1 - \pi ) )^{1_{ H_{\lambda} (j) \le K (1-\pi) }} ]\)
where \(H_{\lambda} (j)\) is the selection count of feature \(j\) and \(F(x)\) is the cumulative probability function of the binomial distribution with parameters \(K\) and the average proportion of selected features over resampling iterations.
The stability score is computed as the minus log-transformed likelihood under the assumption of equiprobability of selection:
\(S_{\lambda, \pi} = - \log(L_{\lambda, \pi})\)
The stability score increases with stability.
Alternatively, the stability score can be computed by considering only two sets of features: stably selected (selection proportions \(\ge \pi\)) or not (selection proportions \(< \pi\)). This can be done using n_cat=2.
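As an illustrative sketch only (not the sharp package implementation), the three-category score can be computed directly from the likelihood formula above; the function name ManualStabilityScore and its arguments are hypothetical:

```r
# Illustrative sketch of the three-category score (n_cat = 3), following
# the likelihood formula above. Not the sharp package implementation.
ManualStabilityScore <- function(selprop, pi, K) {
  q <- mean(selprop) # average proportion of selected features over iterations
  counts <- round(selprop * K) # selection counts H_lambda(j)
  # F(x): binomial CDF with parameters K and q
  F_hi <- pbinom(K * pi - 1, size = K, prob = q)
  F_lo <- pbinom(K * (1 - pi), size = K, prob = q)
  # Log-likelihood under uniform (uninformative) selection:
  # stably selected, unstable and stably excluded features contribute
  # log(1 - F_hi), log(F_hi - F_lo) and log(F_lo), respectively
  loglik <- sum(
    ifelse(counts >= K * pi, log(1 - F_hi),
      ifelse(counts <= K * (1 - pi), log(F_lo),
        log(F_hi - F_lo)
      )
    )
  )
  -loglik # stability score S = -log(L)
}

set.seed(1)
selprop <- round(runif(n = 20), digits = 2)
ManualStabilityScore(selprop, pi = 0.7, K = 100)
```

Since the log-likelihood is non-positive, the score is non-negative and increases as the observed selection proportions become less compatible with uniform selection.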
Other stability metric functions: ConsensusScore(), FDP(), PFER(), StabilityMetrics()
# Simulating set of selection proportions
set.seed(1)
selprop <- round(runif(n = 20), digits = 2)
# Computing stability scores for different thresholds
score <- StabilityScore(selprop, pi_list = c(0.6, 0.7, 0.8), K = 100)