initializeHP: Initializing Hyperparameters for EM

Description

A function for initializing the EM hyperparameters. While users are free to do this in any manner they see fit, we use the excellent Mclust approach from the R package mclust.

Usage

initializeHP(D, conditions, seed = NULL, plottingOn = FALSE, colx = "red",
applyTransform = TRUE, verbose = 0, subsize = NULL, ...)

Arguments

D

The correlation matrix output of makeMyD()

conditions

The conditions array

seed

A seed for making this procedure deterministic

plottingOn

Should the weighted vs. unweighted comparison plots be shown to the user? Default is FALSE (no plotting)

colx

The color of the fitted empirical density curve, if plottingOn is TRUE

applyTransform

Should Fisher's Z-transformation be applied to the correlations?

verbose

An option to control comments as initialization proceeds. Set to 1 to see comments; the default is 0

subsize

A value less than the 1st dimension of D, if it is desired that only some of the pairs be used in the estimation process (for computational reasons)

...

Other options to be passed to plot()
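
For readers unfamiliar with the transformation that applyTransform refers to, the following is a minimal base-R sketch of Fisher's Z-transformation. The correlation values are made up for illustration and are not taken from the package; the sketch does not show how initializeHP() applies the transformation internally.

```r
# Fisher's Z-transformation is the inverse hyperbolic tangent of a correlation.
r <- c(-0.9, -0.5, 0, 0.5, 0.9)   # illustrative correlations, not package data

z <- atanh(r)                     # equivalent to 0.5 * log((1 + r) / (1 - r))
r_back <- tanh(z)                 # the inverse transform recovers the correlations

print(round(z, 3))
print(all.equal(r_back, r))
```

The transformation stretches correlations near -1 and 1, which makes a Normal mixture a more plausible model for the transformed values than for the raw correlations.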

Value

G: The number of mixture components (1, 2 or 3)
MUS: Means across mixture components
TAUS: Standard deviations across mixture components
WEIGHTS: Weights across mixture components (these sum to 1)

Details

initializeHP() initializes the hyperparameters by asking Mclust to find the 1-, 2- or 3-component Normal mixture model that best fits the (transformed) correlations as a whole. Mclust directly returns estimates for G and the MUS; TAUS are estimated using Mclust's estimates and sample sizes, per our model.

WEIGHTS are estimated using the mixture component classifications Mclust provides; however, it is unclear whether those classifications should be weighted by how confident Mclust is in their accuracy. We have tried both approaches and neither is superior to the other in all cases, so we compute both, compare the model fits and return the WEIGHTS that best empirically fit the data. This process is shown visually if plottingOn is TRUE, and the comments (if enabled) will describe the comparison.

In the event that the TAUS are estimated to be less than 0, half of the (TAUS+SS_VARIANCE) estimate provided by Mclust is used for TAUS; we especially do not recommend the use of ebCoexpressZeroStep when this occurs (as opposed to generally favoring the one-step version over the zero-step).
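
The difference between the unweighted and weighted WEIGHTS estimates can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the membership probabilities, data values, means and standard deviations are invented, the helper mix_loglik() is hypothetical, and the mixture log-likelihood is used only as a stand-in criterion, since the text above does not say which measure of fit initializeHP() compares.

```r
# Hypothetical membership probabilities for 6 pairs and G = 2 components
# (each row sums to 1); in practice these would come from a fitted mixture.
post <- matrix(c(0.90, 0.10,
                 0.80, 0.20,
                 0.60, 0.40,
                 0.30, 0.70,
                 0.20, 0.80,
                 0.55, 0.45), ncol = 2, byrow = TRUE)

# Unweighted: assign each pair to its most probable component, then count.
hard_class   <- max.col(post, ties.method = "first")
w_unweighted <- tabulate(hard_class, nbins = ncol(post)) / nrow(post)

# Weighted: average the membership probabilities themselves.
w_weighted <- colMeans(post)

# Stand-in fit criterion (an assumption): Normal mixture log-likelihood.
mix_loglik <- function(x, w, mu, sd) {
  dens <- sapply(seq_along(w), function(k) w[k] * dnorm(x, mu[k], sd[k]))
  sum(log(rowSums(dens)))
}

x  <- c(-0.2, 0.0, 0.1, 0.9, 1.1, 0.3)  # invented transformed correlations
mu <- c(0, 1)                           # invented component means
s  <- c(0.3, 0.3)                       # invented component standard deviations

ll_unweighted <- mix_loglik(x, w_unweighted, mu, s)
ll_weighted   <- mix_loglik(x, w_weighted, mu, s)

# Keep whichever set of weights fits better under the stand-in criterion.
w_best <- if (ll_weighted > ll_unweighted) w_weighted else w_unweighted
print(w_best)
```

Both estimates sum to 1 by construction; they differ most when many pairs have membership probabilities far from 0 and 1, which is when the choice between them matters.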

References

Dawson JA and Kendziorski C. An empirical Bayesian approach for identifying differential co-expression in high-throughput experiments. (2011) Biometrics. E-publication before print: http://onlinelibrary.wiley.com/doi/10.1111/j.1541-0420.2011.01688.x/abstract

Examples

data(fiftyGenes)
tinyCond <- c(rep(1,100),rep(2,25))
tinyPat <- ebPatterns(c("1,1","1,2"))
D <- makeMyD(fiftyGenes, tinyCond, useBWMC=TRUE)
set.seed(3)
initHP <- initializeHP(D, tinyCond)
