tdc_sb: Standardized Band

Description

This function computes an upper prediction bound, derived from the standardized band, on the false discovery proportion (FDP) in the list of discoveries reported by target-decoy competition (TDC).

Usage

tdc_sb(
  thresholds,
  labels,
  alpha,
  gamma,
  c = 0.5,
  lambda = 0.5,
  n = length(labels),
  interpolate = TRUE
)
stband(
  thresholds,
  labels,
  alpha,
  gamma,
  c = 0.5,
  lambda = 0.5,
  n = length(labels),
  interpolate = TRUE
)

Value

An upper prediction bound on the FDP in TDC's list of discoveries. If thresholds is a vector, returns an upper prediction bound for each element of thresholds.

Arguments

thresholds: The rejection threshold of TDC. If given as a vector, an upper prediction bound is returned for each element.
labels: A vector of (ordered) labels. See details below.
alpha: The FDR threshold.
gamma: The confidence parameter of the bound. Typical values include gamma = 0.05 or gamma = 0.01.
c: Determines the ranks of the target score that are considered winning. Defaults to c = 0.5 for (single-decoy) TDC.
lambda: Determines the ranks of the target score that are considered losing. Defaults to lambda = 0.5 for (single-decoy) TDC.
n: The number of hypotheses. Defaults to the length of labels.
interpolate: A boolean indicating whether the bands should be interpolated. Interpolation generally yields a slightly tighter bound at the cost of additional computation. Defaults to TRUE.

Details

In (single-decoy) TDC, each hypothesis is associated with a winning score and a label (1 for a target win, -1 for a decoy win). This function assumes that the hypotheses are sorted in decreasing order of winning score (with ties broken at random). The labels argument must therefore follow this order.
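The ordering requirement can be illustrated with a small sketch that builds the labels vector from paired target and decoy scores. The helper tdc_labels() is hypothetical and not part of this package; it also assumes that a tie between a target score and its decoy score is resolved by a fair coin flip, which the text above does not specify.

```r
# Hypothetical helper (not part of this package): build the ordered label
# vector from paired target/decoy scores in single-decoy TDC.
tdc_labels <- function(target, decoy) {
  stopifnot(length(target) == length(decoy))
  n <- length(target)
  target_wins <- target > decoy
  # Assumption: a tie between a target and its decoy is settled by a coin flip.
  tie <- target == decoy
  target_wins[tie] <- sample(c(TRUE, FALSE), sum(tie), replace = TRUE)
  labels <- ifelse(target_wins, 1, -1)
  # Winning score of each hypothesis is the larger of its two scores.
  winning <- pmax(target, decoy)
  # Sort by decreasing winning score; the runif() key breaks ties at random.
  ord <- order(-winning, runif(n))
  labels[ord]
}
```

For example, tdc_labels(c(5, 1, 3, 2), c(4, 6, 0, 1)) places the decoy win with score 6 first, followed by the three target wins, giving c(-1, 1, 1, 1).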

This function also supports the extension of TDC that uses multiple decoys. In that setup, the target score competes against multiple decoy scores, and the rank of the target score after this competition determines whether the hypothesis is a target win (label = 1), a decoy win (-1) or uncounted (0). The top c proportion of ranks count as winning, the bottom 1 - lambda proportion count as losing, and the remaining ranks are uncounted.
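A minimal sketch of the multi-decoy labelling rule follows. The helper multi_decoy_labels() is hypothetical. It assumes the target's rank r among its d + 1 scores (rank 1 = highest) is scaled to r / (d + 1), that ties with a decoy are counted in the target's favor, and that the resulting labels still have to be sorted by winning score as described above.

```r
# Hypothetical helper (not part of this package): multi-decoy TDC labels.
# target: numeric vector of length n; decoys: n x d numeric matrix.
multi_decoy_labels <- function(target, decoys, c = 0.5, lambda = 0.5) {
  d <- ncol(decoys)
  # Rank of each target among its d + 1 scores (1 = highest). The comparison
  # recycles target down each column, so row i is compared with target[i].
  # Assumption: ties with a decoy are counted in the target's favor.
  r <- 1 + rowSums(decoys > target)
  q <- r / (d + 1)
  # Top c proportion: target win (1); bottom 1 - lambda: decoy win (-1);
  # everything in between: uncounted (0).
  ifelse(q <= c, 1, ifelse(q > lambda, -1, 0))
}
```

With one decoy and the defaults c = lambda = 0.5, this reduces to the single-decoy rule: rank 1 gives a target win and rank 2 a decoy win.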

The threshold of TDC is given by the formula: $$\max\{k : \frac{D_k + 1}{T_k \vee 1} \cdot \frac{c}{1-\lambda} \leq \alpha\}$$ where $T_k$ is the number of target wins among the top $k$ hypotheses, and $D_k$ is the number of decoy wins among the top $k$ hypotheses.
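The formula above translates directly into a few lines of R. The helper tdc_threshold() is a hypothetical sketch, not this package's implementation; it returns 0 when no k satisfies the condition, which is an assumption about how the empty case is handled.

```r
# Hypothetical helper (not part of this package): TDC threshold from
# labels sorted by decreasing winning score.
tdc_threshold <- function(labels, alpha, c = 0.5, lambda = 0.5) {
  T_k <- cumsum(labels == 1)   # target wins among the top k hypotheses
  D_k <- cumsum(labels == -1)  # decoy wins among the top k hypotheses
  ratio <- (D_k + 1) / pmax(T_k, 1) * c / (1 - lambda)
  ok <- which(ratio <= alpha)
  # Assumption: report 0 (no discoveries) when no k qualifies.
  if (length(ok) == 0) 0 else max(ok)
}
```

For labels c(rep(1, 10), -1, -1) and alpha = 0.25, k = 11 gives (1 + 1) / 10 = 0.2 while k = 12 gives 3 / 10 = 0.3, so the threshold is 11. The resulting value can then be passed as the thresholds argument of tdc_sb().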

The argument gamma sets a confidence level of 1-gamma. Since the standardized band requires pre-computed Monte Carlo quantiles, only certain values of gamma are available to use. Commonly used confidence levels, like 0.95 and 0.99, are available. We refer the reader to the README of this package for more details.

The argument alpha, used to compute the threshold of TDC, is also used by this function to choose an appropriate d_max that yields a non-trivial bound. In particular, when thresholds is a vector, the bound for every element is computed with the same d_max. For more details, see: https://arxiv.org/abs/2302.11837.

We recommend using interpolate = TRUE (the default), as it generally results in a tighter bound. This comes at a computational cost: the bound for each threshold is computed in O(n) time with interpolation and in O(1) time without.

References

Ebadi et al. (2022), Bounding the FDP in competition-based control of the FDR https://arxiv.org/abs/2302.11837.

Examples


if (requireNamespace("fdpbandsdata", quietly = TRUE)) {
  set.seed(123)
  thresholds <- c(250, 500, 750, 1000)
  labels <- c(
    rep(1, 250),
    sample(c(1, -1), size = 250, replace = TRUE, prob = c(0.9, 0.1)),
    sample(c(1, -1), size = 250, replace = TRUE, prob = c(0.5, 0.5)),
    sample(c(1, -1), size = 250, replace = TRUE, prob = c(0.1, 0.9))
  )
  alpha <- 0.05
  gamma <- 0.05
  tdc_sb(thresholds, labels, alpha, gamma)
}
