tailRankPower: Power of the tail-rank test

Description

Compute the significance level and the power of a tail-rank test.

Usage

tailRankPower(G, N1, N2, psi, phi, conf = 0.95,
              model=c("bb", "betabinom", "binomial"))
tailRankCutoff(G, N1, N2, psi, conf,
               model=c("bb", "betabinom", "binomial"),
               method=c('approx', 'exact'))

Value

tailRankCutoff returns an integer that is the maximum expected value of the tail rank statistic under the null hypothesis.

tailRankPower returns a real number between 0 and 1 that is the power of the tail-rank test to detect a marker with true sensitivity equal to $\phi$.

Arguments

G: An integer; the number of genes being assessed as potential biomarkers. Statistically, the number of hypotheses being tested.
N1: An integer; the number of "train" or "healthy" samples used.
N2: An integer; the number of "test" or "cancer" samples used.
psi: A real number between 0 and 1; the desired specificity of the test.
phi: A real number between 0 and 1; the sensitivity that one would like to be able to detect, conditional on the specificity.
conf: A real number between 0 and 1; the confidence level of the results. Can be obtained by subtracting the family-wise Type I error from 1.
model: A character string, one of "bb", "betabinom", or "binomial"; determines whether significance and power are computed based on a binomial or a beta-binomial ("bb" or "betabinom") model.
method: A character string; either "exact" or "approx". The default is to use a Bonferroni approximation.

Author

Kevin R. Coombes <krc@silicovore.com>

Details

A power estimate for the tail-rank test can be obtained as follows. First, let $X \sim Binom(N, p)$ denote a binomial random variable. Under the null hypothesis that cancer is not different from normal, we let $p = 1 - \psi$ be the expected proportion of successes in a test of whether the value exceeds the $\psi$-th quantile. Now let $$\alpha = P(X > x | N, p)$$ be the tail probability for one such binomial measurement.

When we make $G$ independent binomial measurements, we take $$conf = P(all\ G\ of\ the\ X's \le x | N, p).$$ (In our paper on the tail-rank statistic, we write everything in terms of $\gamma = 1 - conf$.) Then we have $$conf = P(X \le x | N, p)^G = (1 - \alpha)^G.$$ Using a Bonferroni-like approximation, we can take $$conf \approx 1 - \alpha G.$$ Solving for $\alpha$, we find that $$\alpha \approx (1 - conf)/G.$$

So, the cutoff that ensures that in multiple experiments, each looking at $G$ genes in $N$ samples, we have confidence level $conf$ (or significance level $\gamma = 1 - conf$) of no false positives is computed by the function tailRankCutoff.
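The relationship between $conf$ and the per-gene $\alpha$ can be checked numerically with base R. The sketch below is only an illustration of the derivation above for the plain binomial case with a single sample size $N$; the helper name sketch_cutoff is hypothetical and not part of the package, and it does not attempt to reproduce the beta-binomial model, the role of N1, or the exact return value of tailRankCutoff.

```r
# Hypothetical helper (not part of the package): per-gene alpha and a binomial
# cutoff under the plain binomial null, for a single sample size N.
sketch_cutoff <- function(G, N, psi, conf) {
  p <- 1 - psi                           # null probability of exceeding the quantile
  alpha_exact  <- -expm1(log(conf) / G)  # solves conf = (1 - alpha)^G for alpha
  alpha_approx <- (1 - conf) / G         # Bonferroni-like approximation
  list(
    alpha_exact   = alpha_exact,
    alpha_approx  = alpha_approx,
    # qbinom returns the smallest x with P(X <= x) >= 1 - alpha
    cutoff_exact  = qbinom(1 - alpha_exact,  N, p),
    cutoff_approx = qbinom(1 - alpha_approx, N, p)
  )
}

res <- sketch_cutoff(G = 10000, N = 50, psi = 0.99, conf = 0.95)
str(res)
```

Because $(1 - \alpha)^G \ge 1 - \alpha G$, the approximate $\alpha$ is never larger than the exact one, so the approximate cutoff is at least as large as the exact cutoff.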

The final point to note is that the quantiles are also defined in terms of $q = 1 - \alpha$, so there are lots of disfiguring "1's" in the implementation.

Now we set $M$ to be the significance cutoff using the procedure detailed above. A gene with sensitivity $\phi$ gets detected if the observed number of cases above the threshold is greater than or equal to $M$. The tailRankPower function implements formula (1.3) of our paper on the tail-rank test.
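As a rough illustration of the detection rule just described, the following base R sketch computes $P(Y \ge M)$ for $Y \sim Binom(N, \phi)$. It assumes a plain binomial model and takes $M$ as given; the helper name sketch_power is hypothetical, and the sketch is not claimed to match formula (1.3) or the value returned by tailRankPower.

```r
# Hypothetical helper (not part of the package): probability that at least M
# of N samples exceed the threshold when the true sensitivity is phi.
sketch_power <- function(M, N, phi) {
  # P(Y >= M) = P(Y > M - 1) for Y ~ Binom(N, phi)
  pbinom(M - 1, size = N, prob = phi, lower.tail = FALSE)
}

phi <- seq(0.1, 0.7, by = 0.1)
pw <- sketch_power(M = 5, N = 50, phi = phi)  # M = 5 is an arbitrary example cutoff
round(pw, 3)
```

For a fixed cutoff, the detection probability increases with the sensitivity $\phi$.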

Examples

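library(TailRank)

# Cutoffs for a range of gene counts, sample sizes, and confidence levels
psi.0 <- 0.99                          # desired specificity
confide <- rev(c(0.8, 0.95, 0.99))     # confidence levels
nh <- 20                               # number of healthy samples
ng <- c(100, 1000, 10000, 100000)      # number of genes
ns <- c(10, 20, 50, 100, 250, 500)     # number of cancer samples
formal.cut <- array(0, c(length(ns), length(ng), length(confide)))
for (i in seq_along(ng)) {
  for (j in seq_along(ns)) {
    formal.cut[j, i, ] <- tailRankCutoff(ng[i], nh, ns[j], psi.0, confide)
  }
}
dimnames(formal.cut) <- list(ns, ng, confide)
formal.cut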

# Power as a function of sensitivity (columns) and cancer sample size (rows)
phi <- seq(0.1, 0.7, by=0.1)
N <- c(10, 20, 50, 100, 250, 500)
pows <- matrix(0, ncol=length(phi), nrow=length(N))
for (ph in seq_along(phi)) {
  pows[, ph] <- tailRankPower(10000, nh, N, 0.95, phi[ph], 0.9)
}
pows <- data.frame(pows)
# Rows are labelled by sample size, columns by sensitivity in percent
dimnames(pows) <- list(as.character(N), as.character(round(100*phi)))
pows
