emInit) described in Rau et al. (2011), a K-means initialization strategy (kmeanInit) that is itself used to initialize the small-EM strategy, the splitting small-EM initialization strategy (splitEMInit) described in Papastamoulis et al. (2012), and a function to initialize a small-EM strategy using the posterior probabilities (probaPostInit) obtained from a previous run with one fewer cluster, following the splitting strategy.

Usage

emInit(y, g, conds, lib.size, lib.type, alg.type = "EM",
init.runs, init.iter, fixed.lambda, equal.proportions, verbose,
s=NA)
kmeanInit(y, g, conds, lib.size, lib.type, fixed.lambda,
equal.proportions, s=NA)
splitEMInit(y, g, conds, lib.size, lib.type, alg.type, fixed.lambda,
equal.proportions, prev.labels, prev.probaPost, init.runs,
init.iter, verbose, s=NA)
probaPostInit(y, g, conds, lib.size, lib.type, alg.type = "EM",
fixed.lambda, equal.proportions, probaPost.init, init.iter,
verbose, s=NA)

Arguments

y: Matrix of observed counts.
g: Number of clusters (see fixed.lambda).
conds: Vector of condition labels for the columns of y.
lib.size: If FALSE, the library size parameter is not included in the model (i.e., the PMM-I model). If TRUE, the library size parameter is included in the Poisson mixture model (i.e., the PMM-II model).
lib.type: If lib.size = TRUE, the estimator to be used for the library size parameter ("TC", "UQ", or "Med").
alg.type: Algorithm to be used: "EM" or "CEM".
init.runs: Number of independent small-EM runs.
init.iter: Number of iterations to run within each small-EM run.
fixed.lambda: Contains a list of lambda values to be fixed; g then corresponds to the number of clusters in addition to those that are fixed.
equal.proportions: If TRUE, the cluster proportions are set to be equal for all clusters. Default is FALSE (unequal cluster proportions).
prev.labels, prev.probaPost: Cluster labels and posterior probabilities from the previous run (with one fewer cluster), used by the splitEMInit function.
probaPost.init: Posterior probabilities obtained from a previous run, used to initialize the small-EM run.
verbose: If TRUE, include verbose output.

Value

pi.init: Vector of length g containing the estimate for $\hat{\boldsymbol{\pi}}$ corresponding to the highest log likelihood (or completed log likelihood) from the chosen initialization strategy.
lambda.init: (d x g) matrix containing the estimate of $\hat{\boldsymbol{\lambda}}$ corresponding to the highest log likelihood (or completed log likelihood) from the chosen initialization strategy, where d is the number of conditions and g is the number of clusters.

For the splitting strategy, a (d x g) matrix containing the estimate of $\hat{\boldsymbol{\lambda}}$ and a vector of length g containing the estimate for $\hat{\boldsymbol{\pi}}$ arising from the splitting initialization and small-EM run for a single split are also returned, where d is the number of conditions and g is the number of clusters.

Details

These initialization functions are called by the PoisMixClus function (via init.type) or by the
PoisMixClusWrapper function for a sequence of cluster numbers (via gmin.init.type and split.init).
To initialize parameter values for the EM and CEM algorithms using the small-EM strategy (Biernacki et al., 2003), we use the emInit function as follows. For a given number of independent runs (init.runs), the following procedure is used to obtain parameter values: first, a K-means algorithm (MacQueen, 1967) is run to partition the data into g clusters ($\hat{\boldsymbol{z}}^{(0)}$). Second, initial parameter values $\boldsymbol{\pi}^{(0)}$ and $\boldsymbol{\lambda}^{(0)}$ are calculated from this partition (see Rau et al. (2011) for details). Third, a given number of iterations of an EM algorithm (init.iter) are run, using $\boldsymbol{\pi}^{(0)}$ and $\boldsymbol{\lambda}^{(0)}$ as initial values. Finally, among the init.runs sets of parameter values, we retain $\hat{\boldsymbol{\lambda}}$ and $\hat{\boldsymbol{\pi}}$ corresponding to the highest log likelihood or completed log likelihood to initialize the subsequent full EM or CEM algorithm, respectively.
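The four-step procedure above can be sketched in base R. This is a simplified illustration on a univariate Poisson mixture (no library sizes or conditions), not the HTSCluster implementation; small.em.sketch and its arguments are hypothetical names.

```r
## Sketch of the small-EM strategy: several short EM chains from K-means
## starts; keep the run with the highest log likelihood.
small.em.sketch <- function(y, g, init.runs = 5, init.iter = 10) {
  best <- list(loglik = -Inf)
  loglik.fn <- function(pi.hat, lambda.hat) {
    dens <- sapply(seq_len(g), function(k) pi.hat[k] * dpois(y, lambda.hat[k]))
    sum(log(pmax(rowSums(dens), 1e-300)))
  }
  for (run in seq_len(init.runs)) {
    ## Step 1: K-means partition (jittered to avoid duplicate integer centers)
    km <- kmeans(jitter(as.numeric(y)), centers = g)
    ## Step 2: initial proportions and means from the partition
    pi.hat <- tabulate(km$cluster, g) / length(y)
    lambda.hat <- pmax(as.numeric(tapply(y, km$cluster, mean)), 1e-8)
    ## Step 3: a few EM iterations from these starting values
    for (iter in seq_len(init.iter)) {
      dens <- sapply(seq_len(g), function(k) pi.hat[k] * dpois(y, lambda.hat[k]))
      tau <- dens / pmax(rowSums(dens), 1e-300)   # E-step: posterior probabilities
      pi.hat <- colMeans(tau)                     # M-step: proportions
      lambda.hat <- colSums(tau * y) / pmax(colSums(tau), 1e-300)  # M-step: means
    }
    ## Step 4: retain the best run by log likelihood
    ll <- loglik.fn(pi.hat, lambda.hat)
    if (ll > best$loglik)
      best <- list(loglik = ll, pi.init = pi.hat, lambda.init = lambda.hat)
  }
  best
}
```

For the CEM variant, the selection criterion in step 4 would be the completed log likelihood rather than the observed-data log likelihood.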
For the splitting small-EM initialization strategy, we implement an approach similar to that described in Papastamoulis et al. (2012), where the cluster from the previous run (with g-1 clusters) having the largest entropy is chosen to be split into two new clusters, followed by a small-EM run as described above.

See Also

PoisMixClus for Poisson mixture model estimation for a given number of clusters,
PoisMixClusWrapper for Poisson mixture model estimation and model selection for a sequence of cluster numbers.

Examples

set.seed(12345)
## Simulate data as shown in Rau et al. (2011)
## Library size setting "A", high cluster separation
## n = 500 observations
simulate <- PoisMixSim(n = 500, libsize = "A", separation = "high")
y <- simulate$y
conds <- simulate$conditions
## Calculate initial values for lambda and pi using the Small-EM
## initialization (4 classes, PMM-II model with "TC" library size)
##
## init.values <- emInit(y, g = 4, conds, lib.size = TRUE,
## lib.type = "TC", alg.type = "EM",
## init.runs = 50, init.iter = 10, fixed.lambda = NA,
## equal.proportions = FALSE, verbose = FALSE)
## pi.init <- init.values$pi.init
## lambda.init <- init.values$lambda.init
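The entropy criterion used by the splitting strategy in Details can be sketched as follows. This is an assumed form based on the description above; choose.split is a hypothetical helper, not part of the package.

```r
## Hypothetical helper: given the matrix of posterior probabilities from the
## previous run (observations in rows, the g-1 clusters in columns), return
## the index of the cluster with the largest entropy, i.e. the one to split.
choose.split <- function(probaPost) {
  p <- pmax(probaPost, 1e-12)               # guard against log(0)
  entropy <- -colSums(probaPost * log(p))   # per-cluster entropy contribution
  which.max(entropy)
}
```

The selected cluster would then be split in two and refined with a small-EM run, as described in Details.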