estim.pi0: Estimation of the proportion of true null hypotheses

Description

From a proteomics viewpoint, this function estimates the global proportion of proteins (resp. peptides) in the tested protein list (resp. peptide list) that are not differentially abundant. This proportion is later used as a correcting factor to compute the adjusted p-values, which are in turn used to tune a threshold according to a desired false discovery rate. From a statistical viewpoint, this function estimates the proportion of true null hypotheses (pi0) from a vector of raw p-values, using any of eight estimation methods from the literature.
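
As an illustration of how pi0 can act as a correcting factor, the sketch below uses only base R. It relies on simulated p-values and a hypothetical estimate `pi0_hat`, and it rescales Benjamini-Hochberg adjusted p-values by that estimate (the usual "adaptive" variant). This is an assumption about the general idea, not a description of how the package computes its adjusted p-values.

```r
# Sketch only: base R, simulated p-values, and an assumed pi0 value.
set.seed(42)
m0 <- 800                                # simulated true null hypotheses
m1 <- 200                                # simulated false null hypotheses
p <- c(runif(m0), rbeta(m1, 0.5, 20))    # nulls are uniform, alternatives pile up near 0

pi0_hat <- 0.8                           # hypothetical pi0 estimate (assumption)

# Classical Benjamini-Hochberg adjustment implicitly takes pi0 = 1.
bh <- p.adjust(p, method = "BH")

# Adaptive variant: multiply by the estimated proportion of true nulls.
adaptive_bh <- pmin(1, pi0_hat * bh)

# Number of discoveries at a 5% false discovery rate with each adjustment.
n_bh <- sum(bh <= 0.05)
n_adaptive <- sum(adaptive_bh <= 0.05)
c(BH = n_bh, adaptive = n_adaptive)
```

Because `pi0_hat` is at most 1, the adaptive adjusted p-values are never larger than the classical ones, so the adaptive procedure rejects at least as many hypotheses.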

Usage

estim.pi0(p, pi0.method = "ALL", nbins = 20, pz = 0.05)

Arguments

p

Numeric vector of raw p-values. They are assumed to contain no missing values and to lie between 0 and 1.

pi0.method

Name of an estimation method for the proportion of true null hypotheses: one of "st.boot", "st.spline", "langaas", "jiang", "histo", "pounds", "abh" or "slim", or "ALL" (the default) to apply all of them.

nbins

Number of bins. Parameter used for the "jiang" and "histo" methods. Default is 20.

pz

P-value threshold such that p-values below it are associated with false null hypotheses. Used for the "slim" method. Wang, Tuominen and Tsai (2011) suggest taking a value between 0.01 and 0.1. Default is 0.05.
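
Since the raw p-values are assumed to be free of missing values and to lie in [0, 1], it can help to check the input before estimating pi0. The helper `check_pvalues` below is hypothetical (it is not part of the documented function), and dropping missing values is only one possible way to prepare the data.

```r
# Hypothetical helper: verifies the assumptions stated for the argument p.
check_pvalues <- function(p) {
  if (!is.numeric(p)) stop("p must be a numeric vector")
  if (anyNA(p)) stop("p must not contain missing values")
  if (any(p < 0 | p > 1)) stop("p-values must lie between 0 and 1")
  invisible(TRUE)
}

# Illustrative input with one missing value.
raw <- c(0.001, 0.20, NA, 0.75)

# One possible preparation step: drop missing values, then validate.
p_clean <- raw[!is.na(raw)]
check_pvalues(p_clean)
```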

Value

pi0

Numeric value of the estimated proportion of true null hypotheses from the selected method; a numeric vector if pi0.method = "ALL".

Details

This function estimates the proportion of true null hypotheses using one of the following estimation methods:
"abh"	the least slope method proposed in Benjamini and Hochberg (2000).

"st.spline"	the smoother method described in Storey and Tibshirani (2003). The qvalue function of R package qvalue with default tuning is used (Storey (2015)).

"st.boot"	the bootstrap method described in Storey et al. (2004). The qvalue function of R package qvalue with default tuning is used (Storey (2015)).

"langaas"	the method described in Langaas, Lindqvist and Ferkingstad (2005) using a convex decreasing density estimate for p-values. The convest function of R package limma with default tuning is used (Ritchie et al. (2015)).

"histo"	the histogram method described in Nettleton, Hwang, Caldo and Wise (2006).

"pounds"	the conservative estimate described in Pounds and Cheng (2006).

"jiang"	the average estimate method described in Jiang and Doerge (2008).

"slim"	the method of Wang, Tuominen and Tsai (2011) using a sliding linear model. The default tuning suggested by Wang, Tuominen and Tsai (2011) is used. Using their notations, lambda1 is fixed to 0.1, n to 10 and B to 100.
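
To give a feel for what such an estimator looks like, here is a minimal base R sketch of the conservative estimate usually attributed to Pounds and Cheng (2006) for two-sided tests: under the null hypothesis p-values are uniform with mean 1/2, so twice the average p-value bounds pi0 from above in expectation. The function name `pounds_pi0` and the simulated data are assumptions made for illustration; the code is not taken from this package and may differ from what "pounds" does internally.

```r
# Sketch of a Pounds-and-Cheng-style estimate: min(1, 2 * mean(p)).
# Hypothetical helper name; not the package's own implementation.
pounds_pi0 <- function(p) {
  min(1, 2 * mean(p))
}

# Simulated mixture: 80% true nulls (uniform) and 20% alternatives (near 0).
set.seed(1)
p_mixed <- c(runif(800), rbeta(200, 0.5, 20))

# The estimate should land somewhat above the simulated proportion 0.8,
# since the alternatives also contribute a little to the mean.
pi0_sketch <- pounds_pi0(p_mixed)
pi0_sketch
```

The estimate is capped at 1 because a proportion cannot exceed 1, which matters when most p-values are large.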

References

Y. Benjamini and Y. Hochberg. On the adaptive control of the false discovery rate in multiple testing with independent statistics. Journal of Educational and Behavioral Statistics, 25(1):60-83, 2000.

H. Jiang and R.W. Doerge. Estimating the proportion of true null hypotheses for multiple comparisons. Cancer Informatics, 6:25, 2008.

M. Langaas, B.H. Lindqvist, and E. Ferkingstad. Estimating the proportion of true null hypotheses, with application to DNA microarray data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(4):555-572, 2005.

D. Nettleton, J.T.G. Hwang, R.A. Caldo, and R.P. Wise. Estimating the number of true null hypotheses from a histogram of p values. Journal of Agricultural, Biological, and Environmental Statistics, 11(3):337-356, 2006.

S. Pounds and C. Cheng. Robust estimation of the false discovery rate. Bioinformatics, 22(16):1979-1987, 2006.

M.E. Ritchie, B. Phipson, D. Wu, Y. Hu, C.W. Law, W. Shi, and G.K. Smyth. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7):e47, 2015.

J.D. Storey, J.E. Taylor, and D. Siegmund. Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66(1):187-205, 2004.

J.D. Storey and R. Tibshirani. Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences, 100(16):9440-9445, 2003.

J.D. Storey. qvalue: Q-value estimation for false discovery rate control. R package version 2.0.0, http://qvalue.princeton.edu/, http://github.com/jdstorey/qvalue. 2015.

H.-Q. Wang, L.K. Tuominen, and C.-J. Tsai. SLIM: a sliding linear model for estimating the proportion of true null hypotheses in datasets with dependence structures. Bioinformatics, 27(2):225-231, 2011.

See Also

calibration.plot, adjust.p

Examples

#load the package providing estim.pi0 and LFQRatio2 (assumed here to be cp4p)
library(cp4p)

#get p-values
data(LFQRatio2)
p=LFQRatio2[,7]

#estimate the proportion of true null hypotheses with different methods
r=estim.pi0(p)
r$pi0

#estimate the proportion of true null hypotheses with the "abh" method
r=estim.pi0(p, pi0.method="abh")
r$pi0

#compare with one minus the proportion of human proteins 
prop_human=sum(LFQRatio2$Organism=="human")/length(LFQRatio2$Organism)
pi0_true=1-prop_human
pi0_true