squeezeVar: Squeeze Sample Variances

Description

Squeeze a set of sample variances together by computing empirical Bayes posterior means.

Usage

squeezeVar(var, df, covariate=NULL, robust=FALSE, winsor.tail.p=c(0.05,0.1))

Arguments

var

numeric vector of independent sample variances.

numeric vector of degrees of freedom for the sample variances.

covariate

if non-NULL, var.prior will depend on this numeric covariate. Otherwise, var.prior is constant.

robust

logical, should the estimation of df.prior and var.prior be robustified against outlier sample variances?

winsor.tail.p

numeric vector of length 1 or 2, giving left and right tail proportions of x to Winsorize. Used only when robust=TRUE.

Value

var.post: numeric vector of posterior variances.
var.prior: location of prior distribution. A vector if covariate is non-NULL, otherwise a scalar.
df.prior: degrees of freedom of prior distribution. A vector if robust=TRUE, otherwise a scalar.

Details

This function implements an empirical Bayes algorithm proposed by Smyth (2004).

A conjugate Bayesian hierarchical model is assumed for a set of sample variances. The hyperparameters are estimated by fitting a scaled F-distribution to the sample variances. The function returns the posterior variances and the estimated hyperparameters.

Specifically, the sample variances var are assumed to follow scaled chi-squared distributions, conditional on the true variances, and an scaled inverse chi-squared prior is assumed for the true variances. The scale and degrees of freedom of this prior distribution are estimated from the values of var.

The effect of this function is to squeeze the variances towards a common value, or to a global trend if a covariate is provided. The squeezed variances have a smaller expected mean square error to the true variances than do the sample variances themselves.

If covariate is non-null, then the scale parameter of the prior distribution is assumed to depend on the covariate. If the covariate is average log-expression, then the effect is an intensity-dependent trend similar to that in Sartor et al (2006).

robust=TRUE implements the robust empirical Bayes procedure of Phipson et al (2013) which allows some of the var values to be outliers.

References

Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2013). Empirical Bayes in the presence of exceptional cases, with application to microarray data. Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia. http://www.statsci.org/smyth/pubs/RobustEBayesPreprint.pdf

Sartor MA, Tomlinson CR, Wesselkamper SC, Sivaganesan S, Leikauf GD, Medvedovic M (2006). Intensity-based hierarchical Bayes method improves testing for differentially expressed genes in microarray experiments. BMC bioinformatics 7, 538.

Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3, No. 1, Article 3. http://www.statsci.org/smyth/pubs/ebayes.pdf

Examples

Run this code

s2 <- rchisq(20,df=5)/5
squeezeVar(s2, df=5)

Run the code above in your browser using DataLab