Computes the complementary CDF for the false discovery proportion, V_m/R_m, via asymptotic approximation. Included here mainly for pedagogic purposes.
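As a rough illustration of the quantity involved, the following base-R sketch simulates the false discovery proportion V_m/R_m under the Benjamini-Hochberg procedure and tabulates its empirical complementary CDF. It does not call cCDF.VoR. The two-sided z-test, the non-centrality effect size times sqrt(n/2), and all variable names are assumptions made for this illustration only.

```r
set.seed(1)
m     <- 1000                 # number of simultaneous tests (role of N.tests)
r1    <- 0.05                 # proportion of tests with non-zero mean (role of r.1)
alpha <- 0.15                 # nominal FDR for the BH procedure
theta <- 0.9 * sqrt(70 / 2)   # ASSUMED non-centrality: effect size * sqrt(n / 2)

# One simulated experiment: returns the realized FDP, V/R (0 when R = 0)
sim_fdp <- function() {
  is_alt <- runif(m) < r1                          # which tests are non-null
  z      <- rnorm(m, mean = ifelse(is_alt, theta, 0))
  p      <- 2 * pnorm(-abs(z))                     # two-sided p-values
  reject <- p.adjust(p, method = "BH") <= alpha    # BH rejections
  R <- sum(reject)                                 # number rejected
  V <- sum(reject & !is_alt)                       # number of false discoveries
  if (R > 0) V / R else 0
}

fdp  <- replicate(500, sim_fdp())
u    <- seq(0, 1, length.out = 101)
ccdf <- sapply(u, function(x) mean(fdp > x))       # empirical P(FDP > u)

mean(fdp)   # simulated FDR, expected to be near (1 - r1) * alpha
```

The simulated curve is only a Monte Carlo counterpart of the asymptotic approximation that the function computes, so the two should be expected to agree approximately rather than exactly.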
cCDF.VoR(u, effect.size, n.sample, r.1, alpha, delta, groups = 2, N.tests,
type = c("paired", "balanced", "unbalanced"), grpj.per.grp1 = NULL,
FDP.control.method = "BHFDR", distopt,
control=list(tol=1e-08,max.iter=c(1000,20),sim.level=2,low.power.stop=TRUE,
FDP.meth.thresh=FDP.cntl.mth.thrsh.def,verb=FALSE))
An object of class cdf which contains components:
The call which produced the result.
A data frame with columns u and cCDF.VoR.
A sorted vector of values on the interval [0, 1] for which the cCDF of V_m/R_m should be computed.
The effect size (mean over standard deviation) for test statistics having non-zero means. Assumed to be a constant (in magnitude) over non-zero mean test statistics.
The number of experimental replicates. Required for calculation of power.
The proportion of simultaneous tests that are non-centrally located.
The false discovery rate (in the BH case) or the upper bound on the probability that the FDP exceeds delta (Romano case).
If "FDP.control.method" is set to 'Romano' or 'BHFDX', then the user can set the exceedance threshold for the FDP tail probability control \(P\{ FDP > \delta \} < \alpha\). The default value is \(\alpha\).
The number of experimental groups to compare. Must be integral and >=1. The default value is 2.
The number of simultaneous hypothesis tests.
A character string specifying, in the groups=2 case, whether the test is 'paired', 'balanced', or 'unbalanced' and in the case when groups >=3, whether the test is 'balanced' or 'unbalanced'. The default in all cases is 'balanced'. Left unspecified in the one sample (groups=1) case.
Required when type="unbalanced". Specifies the group 0 to
group 1 ratio in the two group case, and in the case of 3 or more
groups, the group j to group 1 ratio, where group 1 is the group
with the largest effect under the alternative hypothesis.
A character string specifying how the false discovery proportion (FDP) is to be
controlled. You may specify the whole word or any shortened uniquely
identifying truncation.
"BHFDR": the usual BH-FDR
"BHFDX": use asymptotic approximation to the distribution of the FDP
to find a smaller FDR which guarantees probability less
than alpha that the FDP exceeds delta (alpha by default).
"Romano": use Romano's method which guarantees probability less than
alpha that the FDP exceeds delta (alpha by default).
"Auto": in 'FixedPoint' mode, the program will use its own
wisdom to determine which choice above to make. The
order of conservatism is Romano > BHFDX > BHFDR, but
BHFDR offers only expected control while the other two
guarantee bounds on the exceedance probability. If the
distribution of the FDP is nearly degenerate, then BHFDR
is the best option. Otherwise, if it can be reliably used,
BHFDX would be the best choice. The 'effective' denominator,
gamma*N.tests, in the CLT determines when the approximation
is good enough and the asymptotic standard error of the FDP
determines when the distribution is dispersed enough to matter.
Use "Auto" to run through these checks and determine the best.
A return argument, 'Auto', displays the choice made. See
output components and details.
"both": in 'simulation' mode, compute statistics R and T under BHFDX
and Romano (in addition to BHFDR). Corresponding
statistics are denoted R.st, T.st corresponding to BHFDX
control of the FDP, and R.R and T.R corresponding to
Romano control of the FDP. If sim.level is set to 2,
(default) the statistics R.st.ht and T.st.ht, which are
the number rejected and number true positives under BHFDX
where r_0 = 1-r_1 and gamma have been estimated
from the P-value data and alpha.star then computed from
these.
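The remark that any uniquely identifying truncation of the method name is accepted can be illustrated with base R's match.arg. This is a sketch of partial matching in general; whether the package resolves the argument in exactly this way is an assumption, and resolve_method is a hypothetical helper.

```r
# Method names as listed above
methods <- c("BHFDR", "BHFDX", "Romano", "Auto", "both")

# Hypothetical helper: resolve a (possibly truncated) name by partial matching
resolve_method <- function(x) match.arg(x, methods)

resolve_method("Rom")   # "Romano"
resolve_method("A")     # "Auto"

# "BHFD" matches both "BHFDR" and "BHFDX", so it is not uniquely identifying
ambiguous <- tryCatch(resolve_method("BHFD"),
                      error = function(e) "ambiguous")
ambiguous               # "ambiguous"
```

Matching is case sensitive, so "b" identifies "both" while "B" alone does not identify a single method.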
Test statistic distribution among the null and alternatively distributed sub-populations. distopt=0 gives normal (2 groups), distopt=1 gives t- (2 groups) and distopt=2 gives F- (2+ groups).
Optionally, a list with the following components:
'tol' is a convergence criterion used in iterative
methods which is set to 1e-8 by default.
'max.iter' is an iteration limit, set to 20 for the iterated
function limit and 1000 for all others by default.
'sim.level' sim level 2 (default) stipulates that, when FDP.control.method
is set to "BHFDX" or "both", R.st.ht and T.st.ht are
computed in addition to R.st and T.st (see above).
'low.power.stop' in the simulation option, results in an error message
if the power computed via the FixedPoint method is too low, which
results in no solution for the BHFDX option. Default setting is TRUE.
Set to FALSE to override this behavior.
'FDP.meth.thresh' fine-tunes the 'Auto' voodoo (see above). Leave
this alone.
'verb' verbosity level.
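One way to change a single control setting while keeping the others at the values shown in the usage is to start from a full list and override entries with utils::modifyList. This is a sketch under assumptions: the names control_defaults and user_control are illustrative, the 'FDP.meth.thresh' entry is deliberately left out because its default is a package object, and it is not stated here whether the function itself accepts a partial list.

```r
# Default values copied from the usage above (FDP.meth.thresh omitted)
control_defaults <- list(tol = 1e-08,
                         max.iter = c(1000, 20),
                         sim.level = 2,
                         low.power.stop = TRUE,
                         verb = FALSE)

# Settings the user wants to change (illustrative values)
user_control <- list(tol = 1e-10, low.power.stop = FALSE)

# Override only the named entries; everything else keeps its default
control <- utils::modifyList(control_defaults, user_control)
str(control)
```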
Grant Izmirlian <izmirlian at nih dot gov>
Izmirlian G. (2020) Strong consistency and asymptotic normality for quantities related to the Benjamini-Hochberg false discovery rate procedure. Statistics and Probability Letters; 108713, <doi:10.1016/j.spl.2020.108713>.
Izmirlian G. (2017) Average Power and \(\lambda\)-power in Multiple Testing Scenarios when the Benjamini-Hochberg False Discovery Rate Procedure is Used. <arXiv:1801.03989>
Jung S-H. (2005) Sample size for FDR-control in microarray data analysis. Bioinformatics; 21:3097-3104.
Kluger D. M., Owen A. B. (2023) A central limit theorem for the Benjamini-Hochberg false discovery proportion under a factor model. Bernoulli; xx:xxx-xxx.
Liu P. and Hwang J-T. G. (2007) Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics; 23:739-746.
Lehmann E. L., Romano J. P. (2005) Generalizations of the familywise error rate. Ann. Stat.; 33(3):1138-1154.
Romano J. P., Shaikh A. M. (2006) Stepup procedures for control of generalizations of the familywise error rate. Ann. Stat.; 34(4):1850-1873.
cCDF.Rom
cCDF.ToM
pwrFDR
library(pwrFDR)
u <- seq(from=0,to=1,length.out=100000)
rslt <- cCDF.VoR(u=u, effect.size=0.9, n.sample=70, r.1=0.05, alpha=0.15, N.tests=1000,
FDP.control.method="Auto")
## plot the result
with(rslt$cCDF.VoR, plot(u, cCDF.VoR, type="s"))
## compute the mean and median as a check
DX <- function(x)c(x[1], diff(x))
.mean. <- with(rslt$cCDF.VoR, sum(cCDF.VoR*DX(u)))
.median. <- with(rslt$cCDF.VoR, u[max(which(cCDF.VoR>0.5))])
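The mean check above relies on the identity E[X] = integral over [0, 1] of P(X > u) du for a variable X supported on [0, 1], approximated by a Riemann sum. The following base-R sketch applies the same two formulas to a distribution with a known answer. The Beta(2, 5) distribution is an arbitrary stand-in chosen for illustration and has no connection to the FDP.

```r
DX <- function(x) c(x[1], diff(x))            # grid increments, as above
u  <- seq(from = 0, to = 1, length.out = 100000)

# Complementary CDF of a Beta(2, 5) variable on the same grid
ccdf <- pbeta(u, shape1 = 2, shape2 = 5, lower.tail = FALSE)

# Same mean and median formulas as in the example above
mean_est   <- sum(ccdf * DX(u))
median_est <- u[max(which(ccdf > 0.5))]

# Compare with the exact values
c(mean_est   = mean_est,   exact = 2 / 7)
c(median_est = median_est, exact = qbeta(0.5, 2, 5))
```

The agreement is limited by the grid spacing, so a finer grid for u gives a closer match.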