EBcoexpress (version 1.16.0)

initializeHP: Initializing Hyperparameters for EM

Description

A function for initializing the EM hyperparameters. While users are free to do this in any manner they see fit, we use the excellent Mclust approach from the R package mclust.

Usage

initializeHP(D, conditions, seed = NULL, plottingOn = FALSE, colx = "red", applyTransform = TRUE, verbose = 0, subsize = NULL, ...)

Arguments

D
The correlation matrix output of makeMyD()
conditions
The conditions array
seed
A seed for making this procedure deterministic
plottingOn
Should the weighted vs. unweighted comparison plots be shown to the user? Default is FALSE (no plotting)
colx
The color of the fitted empirical density curve, if plottingOn is TRUE
applyTransform
Should Fisher's Z-transformation be applied to the correlations?
verbose
Controls whether progress comments are printed as initialization proceeds. Set to 1 to see comments; the default is 0
subsize
A value less than the first dimension of D (the number of pairs), for use when only a subset of the pairs should enter the estimation process (for computational reasons)
...
Other options to be passed to plot()
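
As a rough illustration of what applyTransform and subsize refer to, the sketch below applies Fisher's Z-transformation to a simulated correlation matrix and then keeps a random subset of its rows. The matrix fakeD, its layout (pairs in rows, one column per condition) and the use of sample() are assumptions made for this example; this is not the package's internal code.

```r
set.seed(1)

# Hypothetical stand-in for D: 200 gene pairs (rows) by 2 conditions (columns)
fakeD <- matrix(runif(400, min = -0.9, max = 0.9), nrow = 200, ncol = 2)

# Fisher's Z-transformation: z = atanh(r) = 0.5 * log((1 + r) / (1 - r))
fakeZ <- atanh(fakeD)

# Keep only `subsize` randomly chosen pairs (rows) to reduce computation
subsize <- 50
keep <- sample(nrow(fakeZ), subsize)
subZ <- fakeZ[keep, , drop = FALSE]

dim(subZ)
```

The transformation is invertible with tanh(), so correlations can be recovered from the transformed values if needed.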

Value

A list with four components, describing the hyperparameters:
G
The number of mixture components (1, 2 or 3)
MUS
Means across mixture components
TAUS
Standard deviations across mixture components
WEIGHTS
Weights across mixture components (these sum to 1)
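
The shape of the returned list can be checked with a few assertions. The sketch below builds a hypothetical list with made-up numbers (not output from initializeHP()) and verifies the properties described above: G is 1, 2 or 3, each vector has G entries, and the weights sum to 1.

```r
# Hypothetical values for illustration only; not produced by initializeHP()
hp <- list(G = 2,
           MUS = c(0.05, 0.60),
           TAUS = c(0.20, 0.15),
           WEIGHTS = c(0.7, 0.3))

# Sanity checks matching the documented structure
hp_ok <- hp$G %in% 1:3 &&
  length(hp$MUS) == hp$G &&
  length(hp$TAUS) == hp$G &&
  length(hp$WEIGHTS) == hp$G &&
  all(hp$TAUS > 0) &&
  isTRUE(all.equal(sum(hp$WEIGHTS), 1))
hp_ok
```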

Details

initializeHP() initializes the hyperparameters by asking Mclust to find the 1-, 2- or 3-component Normal mixture model that best fits the (transformed) correlations as a whole. Mclust directly returns estimates for G and the MUS; TAUS are estimated using Mclust's estimates and sample sizes, per our model.

WEIGHTS are estimated using the mixture component classifications Mclust provides; however, it is unclear whether those classifications should be weighted by how confident Mclust is in their accuracy. We have tried both approaches and neither is superior to the other in all cases, so we compute both, compare the model fits and return the WEIGHTS that best fit the data empirically. This process is shown visually if plottingOn is TRUE, and the comments (if enabled) describe the comparison.

In the event that the TAUS are estimated to be less than 0, half of the (TAUS+SS_VARIANCE) estimate provided by Mclust is used for TAUS. When this occurs we especially recommend against using ebCoexpressZeroStep (beyond our general preference for the one-step version over the zero-step).
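
The two weighting schemes and the fallback for TAUS can be sketched in base R. Everything here is an assumption made for illustration: the posterior matrix post is made up, "weighted" is read as averaging membership probabilities (the package may weight differently), and tau_from_fit() is a hypothetical helper that assumes the fitted variance equals tau^2 plus the sampling variance 1/(n - 3) of a Fisher-transformed correlation.

```r
# Hypothetical posterior membership probabilities: 6 pairs, G = 2 components
# (in practice these would come from a fitted mixture model; rows sum to 1)
post <- matrix(c(0.90, 0.10,
                 0.80, 0.20,
                 0.60, 0.40,
                 0.40, 0.60,
                 0.20, 0.80,
                 0.55, 0.45), ncol = 2, byrow = TRUE)

# Unweighted: each pair counts fully toward its most probable component
cls <- max.col(post, ties.method = "first")
w_hard <- tabulate(cls, nbins = ncol(post)) / nrow(post)

# Weighted (one possible reading): each pair contributes its probabilities
w_soft <- colMeans(post)

# Hypothetical helper: recover tau from a fitted total variance and sample size n,
# assuming total_var = tau^2 + 1 / (n - 3)
tau_from_fit <- function(total_var, n) {
  tau2 <- total_var - 1 / (n - 3)
  # Fallback: if the estimate is not positive, use half of the fitted variance
  tau2 <- ifelse(tau2 > 0, tau2, total_var / 2)
  sqrt(tau2)
}

w_hard
w_soft
tau_from_fit(0.05, n = 103)   # ordinary case
tau_from_fit(0.005, n = 103)  # fallback case
```

Both weight vectors sum to 1 but generally differ, which is why a comparison of fits is needed to choose between them.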

References

Dawson JA and Kendziorski C. An empirical Bayesian approach for identifying differential co-expression in high-throughput experiments. (2011) Biometrics. E-publication before print: http://onlinelibrary.wiley.com/doi/10.1111/j.1541-0420.2011.01688.x/abstract

Examples

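library(EBcoexpress)
data(fiftyGenes)
# Condition labels: 100 arrays in condition 1, 25 in condition 2
tinyCond <- c(rep(1,100),rep(2,25))
# Patterns: equivalent co-expression ("1,1") vs. differential co-expression ("1,2")
tinyPat <- ebPatterns(c("1,1","1,2"))
# Pairwise correlations within each condition, using biweight midcorrelation
D <- makeMyD(fiftyGenes, tinyCond, useBWMC=TRUE)
# Fix the RNG state so the initialization is reproducible
set.seed(3)
initHP <- initializeHP(D, tinyCond)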