Description

fitGG fits the GaGa and MiGaGa hierarchical models, either via fully Bayesian posterior sampling (Metropolis-Hastings or Gibbs), Simulated Annealing, or maximum likelihood (EM and quickEM). fitNN fits a normal-normal hierarchical model (wrapper to call emfit in package EBarrays with the LNNMV model). fitNNSingleHyp is the same as fitNN but only considers the pattern that all groups are different from each other. adjustfitNN corrects a small sample-size bias in the fitNN estimation procedure.

Usage
fitGG(x, groups, patterns, equalcv = TRUE, nclust = 1, method = "quickEM", B, priorpar, parini, trace = TRUE)
fitNN(x, groups, patterns, B=20, trace=TRUE)
fitNNSingleHyp(x, groups, B=10, trace=TRUE)
adjustfitNN(fit, pitrue, B=5, nsim=3, mc.cores=1)
Arguments

x: ExpressionSet, exprSet, data frame or matrix containing the gene expression measurements used to fit the model. For fitNN, x should be in log-scale (in contrast to the input to emfit). fitGG accepts both raw and log-scale data (as long as all log-scale measurements remain positive). By default we recommend using log-scale to reduce the influence of outliers.

groups: If x is of type ExpressionSet or exprSet, groups should be the name of the column in pData(x) with the groups that one wishes to compare. If x is a matrix or a data frame, groups should be a vector indicating to which group each column in x corresponds.

patterns: Matrix indicating the expression patterns (hypotheses) to be considered; colnames(patterns) must match the group levels specified in groups. Defaults to two hypotheses: the null hypothesis of all groups being equal and the full alternative of all groups being different. The function buildPatterns can be used to construct a matrix with all possible patterns (see the sketch below).
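For illustration, a pattern matrix for a hypothetical three-group design could be built as follows (the group labels and hypotheses are made up for this sketch; groups sharing the same integer within a row are treated as equal, as in the Examples section below):

groups3 <- rep(c('A','B','C'), each=4)      # hypothetical design: 3 groups, 4 samples each
patterns3 <- rbind(c(0,0,0),                # hypothesis 1: all groups equal
                   c(0,0,1),                # hypothesis 2: A and B equal, C different
                   c(0,1,2))                # hypothesis 3: all groups different
colnames(patterns3) <- c('A','B','C')       # must match the levels in groups3
patterns.all <- buildPatterns(groups3)      # alternatively, enumerate all possible patterns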
equalcv: equalcv==TRUE fits the model assuming a constant CV (coefficient of variation) across groups. equalcv==FALSE compares CV as well as mean expression levels between groups.

nclust: Number of clusters in the MiGaGa model; nclust==1 corresponds to the GaGa model.

method: method=='MH' fits a fully Bayesian model via Metropolis-Hastings posterior sampling. method=='Gibbs' does the same using Gibbs sampling. method=='SA' uses Simulated Annealing to find the posterior mode. method=='EM' finds maximum-likelihood estimates via the expectation-maximization algorithm, but this is currently only implemented for nclust>1. method=='quickEM' is a quicker implementation that only performs 2 optimization steps (see details).

B: Number of iterations. For method=='MH' and method=='Gibbs', B is the number of MCMC iterations (defaults to 1000). For method=='SA', B is the number of iterations in the Simulated Annealing scheme (defaults to 200). For method=='EM', B is the maximum number of iterations (defaults to 20). See the sketch below.
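To make the role of method and B concrete, a minimal sketch (x, groups and patterns are placeholder objects of the kind described above, not taken from this page):

# Empirical Bayes fit via the quick EM implementation (2 optimization steps)
gg.em <- fitGG(x, groups, patterns=patterns, method='quickEM', trace=FALSE)
gg.em <- parest(gg.em, x=x, groups)      # hyper-parameter estimates and posterior probabilities

# Fully Bayesian fit via Gibbs sampling; B is the number of MCMC iterations
gg.bayes <- fitGG(x, groups, patterns=patterns, method='Gibbs', B=1000, trace=FALSE)
# (call parest afterwards to obtain hyper-parameter estimates; see the Value section)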
priorpar: List with the prior parameter values a.alpha0, b.alpha0, a.nu, b.nu, a.balpha, b.balpha, a.nualpha, b.nualpha, p.probclus and p.probpat. If missing they are set to non-informative values that are usually reasonable for RMA and GCRMA normalized data.

parini: List with components a0, nu, balpha, nualpha, probclus and probpat indicating the starting values for the hyper-parameters. If not specified, a method of moments estimate is used (see the sketch below).
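As an illustration, starting values could be supplied explicitly as below; the numbers are simply the simulation truth used in the Examples section and are not recommended defaults. priorpar takes an analogous list with the components listed above.

myini <- list(a0=25.5, nu=0.109, balpha=1.183, nualpha=1683,
              probclus=1, probpat=c(0.95,0.05))    # probclus has length 1 since nclust=1
gg.ini <- fitGG(x, groups, patterns=patterns, method='quickEM',
                parini=myini, trace=FALSE)         # x, groups, patterns: placeholder objects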
trace: For trace==TRUE the progress of the model fitting routine is printed.

fit: nnfit object, as returned by fitNN.

pitrue: Grid of pi values for which to evaluate the MoM estimation bias. See details.

nsim: Number of datasets to simulate from the prior predictive for each pitrue value.

mc.cores: If multicore is available, mc.cores specifies the number of cores to be used for parallel computing.
Value

fitGG returns an object of class gagafit, with components:

parest: Hyper-parameter estimates. Only returned if method=='EBayes'; for method=='Bayes' one must call the function parest after fitGG.

mcmc: Object of class mcmc with posterior draws for hyper-parameters. Only returned if method=='Bayes'.

lhood: For method=='Bayes' it is the log-likelihood evaluated at each MCMC iteration. For method=='EBayes' it is the log-likelihood evaluated at the maximum.

patterns: Object of class gagahyp.

fitNN returns an analogous object of class nnfit. The component nn.fit is the object returned by emfit.
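The structure of a fitted object can be inspected with standard R tools, e.g. with the gg1 object created in the Examples section below:

class(gg1)               # "gagafit"
str(gg1, max.level=1)    # shows the components described above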
Details

Approximations used to sample from the posterior of the gamma shape parameters and to compute the normalization constants are implemented in rcgamma and mcgamma. The cooling scheme in method=='SA' uses a temperature equal to 1/log(1+i), where i is the iteration number.
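For reference, the first few temperatures under this cooling schedule are easy to compute:

i <- 1:5
round(1/log(1+i), 3)     # 1.443 0.910 0.721 0.621 0.558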
The EM implementation in method=='quickEM'
is a quick EM
algorithm that usually delivers hyper-parameter estimates very similar
to those obtained via the slower method=='EM'
. Additionally,
the GaGa model inference has been seen to be robust to moderate
changes in the hyper-parameter estimates in most datasets.
fitNN
is a wrapper to emfit
in package EBarrays with the LNNMV model.
This procedure estimates hyper-parameters using the method of moments
(MoM), which typically results in over-estimating the proportion of
differentially expressed genes, denoted pi1.
adjustfitNN
corrects this bias by repeatedly simulating from
the prior predictive of a normal-normal model. Simulations are
performed for a grid of pi1 values, so that the expected bias can be
evaluated for each of them. The bias is then modeled as a smooth
function of pi1 using function gam
from package mgcv
.
Finally, the value of pi1 is bias-adjusted and the posterior
probabilities are recomputed using the updated pi1 value.
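A minimal sketch of this workflow, assuming xlog is a log-scale expression matrix with a corresponding groups vector and patterns matrix (placeholder objects for illustration):

# Fit the normal-normal model, then bias-adjust the estimated pi1.
# pitrue is an illustrative grid of pi1 values at which the MoM bias is evaluated;
# nsim datasets are simulated from the prior predictive at each grid point.
nn1 <- fitNN(xlog, groups, patterns=patterns, B=20)
nn1.adj <- adjustfitNN(nn1, pitrue=seq(0.01, 0.2, length.out=10), B=5, nsim=3, mc.cores=1)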
References

Yuan, M. and Kendziorski, C. (2006). A unified approach for simultaneous gene clustering and differential expression identification. Biometrics 62(4): 1089-1098.
See Also

parest to estimate hyper-parameters and compute posterior probabilities after a GaGa or MiGaGa fit. findgenes to find differentially expressed genes. classpred to predict the group that a new sample belongs to.

Examples

library(gaga)
set.seed(10)
n <- 100; m <- c(6,6)
a0 <- 25.5; nu <- 0.109
balpha <- 1.183; nualpha <- 1683
probpat <- c(.95,.05)
xsim <- simGG(n,m,p.de=probpat[2],a0,nu,balpha,nualpha,equalcv=TRUE)
x <- exprs(xsim)
#Frequentist fit: EM algorithm to obtain MLE
groups <- pData(xsim)$group[c(-6,-12)]
patterns <- matrix(c(0,0,0,1),2,2)
colnames(patterns) <- c('group 1','group 2')
gg1 <- fitGG(x[,c(-6,-12)],groups,patterns=patterns,method='EM',trace=FALSE)
gg1 <- parest(gg1,x=x[,c(-6,-12)],groups)
gg1
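# A possible follow-up (see the See Also section): flag differentially expressed genes.
# fdrmax, parametric and the component d are assumed names; check the findgenes help
# page before relying on this sketch.
d1 <- findgenes(gg1, x[,c(-6,-12)], groups, fdrmax=0.05, parametric=TRUE)
table(d1$d)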