PoisMixClus) or a sequence of cluster numbers (PoisMixClusWrapper). Parameters are initialized using a Small-EM strategy as described in Rau et al. (2011) or the splitting small-EM strategy described in Papastamoulis et al. (2012), and model selection is performed using the ICL criteria. Note that these functions implement the PMM-I and PMM-II models described in Rau et al. (2011).PoisMixClus(y, g, conds, lib.size = TRUE, lib.type = "TMM",
init.type = "small-em", init.runs = 1, init.iter = 10,
alg.type = "EM", cutoff = 10e-6, iter = 1000, fixed.lambda = NA,
equal.proportions = FALSE, prev.labels = NA,
prev.probaPost = NA, verbose = FALSE, interpretation = "sum",
EM.verbose = FALSE, wrapper = FALSE, s = NA)
PoisMixClusWrapper(y, gmin = 1, gmax, conds, lib.size = TRUE,
lib.type = "TMM", gmin.init.type = "small-em",
init.runs = 1, init.iter = 10, split.init = TRUE, alg.type = "EM",
cutoff = 10e-6, iter = 1000, fixed.lambda = NA,
equal.proportions = FALSE, verbose = FALSE, interpretation = "sum",
EM.verbose = FALSE)fixed.lambda contains a list of lambda values to be fixed,
g corresponds to the number of clusters in addition to those fixed.gmin corresponds to the
minimum number of clusters in addition to those fixed.gmax corresponds to the
maximum number of clusters in addition to those fixed.yFALSE, the library size parameter is not included in the model (i.e., the PMM-I model). If TRUE, the library size parameter is included in the Poisson mixture model (i.e., the PMM-II model)lib.size = TRUE, the estimator to be used for the library size parameter: (TCUQMedsmall-emkmeansgmin):
(small-emkmeansTRUE, the splitting initialization strategy of Papastamoulis et al. (2012) will be used for cluster sizes
(gmin+1, ..., gmax). If FALSE, the initialization strategy specified in gmin.init.type<EMCEMTRUE, the cluster proportions are set to be equal for all clusters. Default is FALSE (unequal cluster proportions).TRUE, include verbose output"sum", cluster behavior is interpreted with respect to overall gene expression level (sums per gene),
otherwise for "mean", cluster behavior is interpreted with respect to mean gene expression (means per gene).TRUE, more informative output is printed about the EM algorithm, including the number of iterations run and
the difference between log-likelihoods at the last and penultimate iterations.TRUE if the PoisMixClus function is run from within the PoisMixClusWrapper main function, and FALSE
otherwise. This mainly helps to avoid recalculating parameters several times that are used throughout PoisMixClus function. When called
as part of the PoisMixClusWrapper main function, these are estimated a single time and passed to the alg.type above)alg.type above)lib.size = TRUE; matches the argument lib.type above)lib.size = TRUE)gmin, ..., gmax"Capushe"gmin, ..., gmaxHTSCluster giving the results from the model chosen via the ICL criterionHTSCluster giving the results from the model chosen via the BICHTSCluster giving the results from the model chosen via the DDSE slope heuristics criterionHTSCluster giving the results from the model chosen via the Djump slope heuristics criterionHTSCluster giving the results for all models for cluster sizes gmin, ..., gmaxNA for direct calls to PoisMixClus or
"DDSE", "Djump", "BIC", or "ICL" for the respective selected models for calls to PoisMixClusWrapperPoisMixClus is an S3 object of class HTSCluster, and output of PoisMixClusWrapper is an S3 object
of class HTSClusterWrapper.
In a Poisson mixture model, the data $\mathbf{y}$ are assumed to come from g distinct subpopulations (clusters), each of which is modeled separately; the overall population is thus a mixture of these subpopulations. In the case of a Poisson mixture model with g components, the model may be written as
$$f(\mathbf{y};g,\ensuremath\boldsymbol{\Psi}_g) = \prod_{i=1}^n \sum_{k=1}^g \pi_k \prod_{j=1}^{d}\prod_{l=1}^{r_j} P(y_{ijl} ; \ensuremath\boldsymbol{\theta}_k)$$
for $i = 1, \ldots, n$ observations in $l = 1, \ldots, r_j$ replicates of $j = 1, \ldots, d$ conditions (treatment groups), where $P(\cdot)$ is the standard Poisson density, $\ensuremath\boldsymbol{\Psi}_g = (\pi_1,\ldots,\pi_{g-1}, \ensuremath\boldsymbol{\theta}^\prime)$, $\ensuremath\boldsymbol{\theta}^\prime$ contains all of the parameters in $\ensuremath\boldsymbol{\theta}_1,\ldots,\ensuremath\boldsymbol{\theta}_g$ assumed to be distinct, and $\ensuremath\boldsymbol{\pi} = (\pi_1,\ldots,\pi_g)^\prime$ are the mixing proportions such that $\pi_k$ is in (0,1) for all k and $\sum_k \pi_k = 1$.
We consider two possible parameterizations for the mean $\ensuremath\boldsymbol{\theta}_k = (\mu_{ijlk})$. In the first, called the PMM-I, we consider
$$\mu_{ijlk} = w_i \lambda_{jk}$$
where $w_i$ corresponds to the expression level of observation i and $\ensuremath\boldsymbol{\lambda}_k = (\lambda_{1k},\ldots,\lambda_{dk})$ corresponds to the clustering parameters that define the profiles of the genes in cluster k across all variables. In the second parameterization, called the PMM-II, we consider
$$\mu_{ijlk} = w_i s_{jl} \lambda_{jk}$$
where $w_i$ and $\ensuremath\boldsymbol{\lambda}_k$ are as before and $s_{jl}$ is the normalized library size (a fixed constant) for replicate l of condition j. See Rau et al. (2011) for more details on the PMM-I and PMM-II.
There are two approaches to estimating the parameters of a finite mixture model and obtaining a clustering of the data: the estimation approach (via the EM algorithm) and the clustering approach (via the CEM algorithm). Parameter initialization is done using a Small-EM strategy as described in Rau et al. (2011) via the emInit function. Model selection may be performed using the BIC or ICL criteria.probaPost for the calculation of the conditional probability of belonging to a cluster;
PoisMixMean for the calculation of the per-cluster conditional mean of each observation;
logLikePoisMixDiff for the calculation of the log likelihood of a Poisson mixture model;
emInit and kmeanInit for the Small-EM parameter initialization strategyset.seed(12345)
## Simulate data as shown in Rau et al. (2011)
## Library size setting "A", high cluster separation
## n = 200 observations
simulate <- PoisMixSim(n = 200, libsize = "A", separation = "high")
y <- simulate$y
conds <- simulate$conditions
## Run the PMM model for g = 3
## "TC" library size estimate, EM algorithm
run <- PoisMixClus(y, g = 3, conds = conds, lib.type = "TC")
## Estimates of pi and lambda for the selected model
pi.est <- run$pi
lambda.est <- run$lambda
## Not run: PMM for 4 total clusters, with one fixed class
## "TC" library size estimate, EM algorithm
##
## run <- PoisMixClus(y, g = 3, lib.type = "TC", conds = conds,
## fixed.lambda = list(c(1,1,1)))
##
##
## Not run: PMM model for 4 clusters, with equal proportions
## "TC" library size estimate, EM algorithm
##
## run <- PoisMixClus(y, g = 4, lib.type = "TC", conds = conds,
## equal.proportions = TRUE)
##
##
## Not run: PMM model for g = 1, ..., 10 clusters, Split Small-EM init
##
## run1.10 <- PoisMixClusWrapper(y, gmin = 1, gmax = 10, conds = conds,
## lib.type = "TC")
##
##
## Not run: PMM model for g = 1, ..., 10 clusters, Small-EM init
##
## run1.10bis <- <- PoisMixClusWrapper(y, gmin = 1, gmax = 10, conds = conds,
## lib.type = "TC", split.init = FALSE)
##
##
## Not run: previous model equivalent to the following
##
## for(K in 1:10) {
## run <- PoisMixClus(y, g = K, conds = conds, lib.type = "TC")
## }Run the code above in your browser using DataLab