PoisMixClus(y, gmin, gmax, lib.size = TRUE, lib.type = "TC",
conds, init.type = "small-em", alg.type = "EM",
cutoff = 1e-05, iter = 1000, mean.filter = FALSE, verbose = FALSE)

Arguments:

y: Matrix of observed counts, with the n observations (e.g., genes) in rows and the variables (the r_j replicates of each of the d conditions) in columns.
gmin: The smallest number of clusters to be considered (g = gmin, ..., gmax).
gmax: The largest number of clusters to be considered (g = gmin, ..., gmax).
lib.size: If FALSE, the library size parameter is not included in the model (i.e., the PMM-I model). If TRUE, the library size parameter is included in the Poisson mixture model (i.e., the PMM-II model).
lib.type: If lib.size = TRUE, the estimator to be used for the library size parameter ("TC", "Q", "Med", or "MedRatio").
conds: Vector defining the condition (treatment group) of each column of y.
init.type: Type of parameter initialization to be used: "small-em" for the Small-EM strategy or "kmeans" for a K-means initialization.
alg.type: Algorithm to be used for parameter estimation: "EM" or "CEM".
cutoff: Cutoff used to declare convergence of the algorithm.
iter: Maximum number of iterations to be run.
mean.filter: If TRUE, observations are optionally removed prior to clustering via a filter on their mean counts.
verbose: If TRUE, include verbose output.

Value:

lambda: List (of length gmax-gmin+1) containing the estimates $\hat{\ensuremath\boldsymbol{\lambda}}$ for each of the models (g = gmin, ..., gmax). For a given model g, lambda[[g]] is a matrix of dimension (d x g), where d is the number of conditions (treatment groups) and g is the number of clusters.
pi: List (of length gmax-gmin+1) containing the estimates $\hat{\ensuremath\boldsymbol{\pi}}$ for each of the models (g = gmin, ..., gmax). For a given model g, pi[[g]] is a vector of length g, where g is the number of clusters.
labels: List (of length gmax-gmin+1) containing the cluster label assignments for each of the n observations in each of the models (g = gmin, ..., gmax).
probaPost: List (of length gmax-gmin+1) containing the conditional probabilities of belonging to each cluster for all observations for each of the models (g = gmin, ..., gmax). For a given model g, probaPost[[g]] is an (n x g) matrix containing the conditional probabilities of belonging to each of the g clusters for all observations.
The log likelihood, BIC, and ICL values are also returned for each of the models (g = gmin, ..., gmax), along with the number of clusters selected via the BIC (g.BIC) and via the ICL (g.ICL).
lambda.BIC: (d x g.BIC) matrix containing the estimate of $\hat{\ensuremath\boldsymbol{\lambda}}$ for the model selected via the BIC (with number of clusters g.BIC).
pi.BIC: Vector of length g.BIC containing the estimate of $\hat{\ensuremath\boldsymbol{\pi}}$ for the model selected via the BIC (with number of clusters g.BIC).
lambda.ICL: (d x g.ICL) matrix containing the estimate of $\hat{\ensuremath\boldsymbol{\lambda}}$ for the model selected via the ICL (with number of clusters g.ICL).
pi.ICL: Vector of length g.ICL containing the estimate of $\hat{\ensuremath\boldsymbol{\pi}}$ for the model selected via the ICL (with number of clusters g.ICL).
probaPost.BIC: (n x g.BIC) matrix containing the conditional probabilities of belonging to each cluster for all observations for the model selected via the BIC (with number of clusters g.BIC).
probaPost.ICL: (n x g.ICL) matrix containing the conditional probabilities of belonging to each cluster for all observations for the model selected via the ICL (with number of clusters g.ICL).
The returned object also records the estimation algorithm used (matching the argument alg.type above), the library size estimates if lib.size = TRUE, the matrix of observed counts used for clustering (identical to y above, unless observations were removed using the optional mean.filter), and the mean.filter setting (matching the argument mean.filter above), as documented in HTSCluster.
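Before the model details below, a minimal sketch of the expected input layout may help; the toy counts and design here are invented for illustration and are not taken from the package documentation.

## Toy input layout (invented for illustration): n = 200 observations in rows,
## q = 4 columns corresponding to 2 replicates in each of d = 2 conditions
set.seed(1)
y <- matrix(rpois(200 * 4, lambda = 50), nrow = 200, ncol = 4)
conds <- c(1, 1, 2, 2)   # condition (treatment group) of each column of y
## A call would then look like the following (commented out here, since pure
## Poisson noise carries no cluster structure):
## run <- PoisMixClus(y, gmin = 2, gmax = 3, lib.size = TRUE, lib.type = "TC",
##                    conds = conds, init.type = "small-em", alg.type = "EM")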
In a Poisson mixture model, the data $\mathbf{y}$ are assumed to come from g distinct subpopulations (clusters), each of which is modeled separately; the overall population is thus a mixture of these subpopulations. In the case of a Poisson mixture model with g components, the model may be written as
$$f(\mathbf{y};g,\ensuremath\boldsymbol{\Psi}_g) = \prod_{i=1}^n \sum_{k=1}^g \pi_k \prod_{j=1}^{d}\prod_{l=1}^{r_j} P(y_{ijl} ; \ensuremath\boldsymbol{\theta}_k)$$
for $i = 1, \ldots, n$ observations in $l = 1, \ldots, r_j$ replicates of $j = 1, \ldots, d$ conditions (treatment groups), where $P(\cdot)$ is the standard Poisson density, $\ensuremath\boldsymbol{\Psi}_g = (\pi_1,\ldots,\pi_{g-1}, \ensuremath\boldsymbol{\theta}^\prime)$, $\ensuremath\boldsymbol{\theta}^\prime$ contains all of the parameters in $\ensuremath\boldsymbol{\theta}_1,\ldots,\ensuremath\boldsymbol{\theta}_g$ assumed to be distinct, and $\ensuremath\boldsymbol{\pi} = (\pi_1,\ldots,\pi_g)^\prime$ are the mixing proportions such that $\pi_k$ is in (0,1) for all k and $\sum_k \pi_k = 1$.
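To make the formula concrete, the short sketch below (not code from HTSCluster; all parameter values are invented) evaluates the contribution of a single observation to this mixture density and, as a by-product, its conditional probabilities of cluster membership, which are the quantities collected in probaPost above.

## Illustrative sketch (invented values, not package code): one observation
## with q = 4 variables and g = 3 clusters
y_i  <- c(10, 12, 35, 40)                # counts y_ijl for observation i
mu   <- matrix(c( 9, 11, 30, 38,         # per-cluster Poisson means mu_ijlk,
                 25, 25, 25, 25,         # one column per cluster k
                 40, 35, 10, 12), ncol = 3)
pi_k <- c(0.3, 0.4, 0.3)                 # mixing proportions pi_k
## pi_k * prod_{j,l} P(y_ijl; mu_ijlk) for each cluster k
dens_k <- sapply(1:3, function(k) pi_k[k] * prod(dpois(y_i, lambda = mu[, k])))
f_yi   <- sum(dens_k)                    # observation i's factor in f(y; g, Psi_g)
tau_i  <- dens_k / f_yi                  # conditional probabilities of membership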
We consider two possible parameterizations for the mean $\ensuremath\boldsymbol{\theta}_k = (\mu_{ijlk})$. In the first, called the PMM-I, we consider
$$\mu_{ijlk} = w_i \lambda_{jk}$$
where $w_i$ corresponds to the expression level of observation i and $\ensuremath\boldsymbol{\lambda}_k = (\lambda_{1k},\ldots,\lambda_{dk})$ corresponds to the clustering parameters that define the profiles of the genes in cluster k across all variables. In the second parameterization, called the PMM-II, we consider
$$\mu_{ijlk} = w_i s_{jl} \lambda_{jk}$$
where $w_i$ and $\ensuremath\boldsymbol{\lambda}_k$ are as before and $s_{jl}$ is the normalized library size (a fixed constant) for replicate l of condition j. See Rau et al. (2011) for more details on the PMM-I and PMM-II.
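As a small illustration of the difference between the two parameterizations (with invented values for w_i, s_jl, and lambda_jk; this is not code from the package), the per-cluster means for a single observation can be built as follows.

## Invented example: d = 2 conditions, 2 replicates each (q = 4), g = 3 clusters
conds  <- c(1, 1, 2, 2)                  # condition j of each column
w_i    <- 100                            # expression level of observation i
s_jl   <- c(0.20, 0.30, 0.25, 0.25)      # normalized library sizes (fixed constants)
lambda <- matrix(c(0.2, 0.8,             # lambda_jk: rows = conditions j,
                   0.5, 0.5,             #            columns = clusters k
                   0.8, 0.2), nrow = 2)
mu.PMM1 <- w_i * lambda[conds, ]         # PMM-I:  mu_ijlk = w_i * lambda_jk
mu.PMM2 <- w_i * s_jl * lambda[conds, ]  # PMM-II: mu_ijlk = w_i * s_jl * lambda_jk
## Both are (q x g) matrices of Poisson means for observation i, one column per cluster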
There are two approaches to estimating the parameters of a finite mixture model and obtaining a clustering of the data: the estimation approach (via the EM algorithm) and the clustering approach (via the CEM algorithm). Parameter initialization is done using a Small-EM strategy, as described in Rau et al. (2011), via the emInit function. Model selection may be performed using the BIC or ICL criteria.

See also: probaPost for the calculation of the conditional probability of belonging to a cluster;
PoisMixMean for the calculation of the per-cluster conditional mean of each observation;
logLikePoisMix for the calculation of the log likelihood of a Poisson mixture model;
emInit and kmeanInit for the Small-EM parameter initialization strategy.

Examples:

set.seed(12345)
## Simulate data as shown in Rau et al. (2011)
## Library size setting "A", high cluster separation
## n = 200 observations
simulate <- PoisMixSim(n = 200, libsize = "A", separation = "high")
y <- simulate$y
conds <- simulate$conditions
## Run the PMM-II model for g = {3, 4, 5}
## "TC" library size estimate, EM algorithm
## Model selection via the ICL
run <- PoisMixClus(y, gmin = 3, gmax = 5, lib.size = TRUE, lib.type = "TC",
conds = conds, init.type = "small-em")
## Estimates of pi and lambda for the selected model
pi.est <- run$pi.ICL
lambda.est <- run$lambda.ICL
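## As a follow-up, the selected models can be inspected as sketched below; the
## element names used (g.BIC, g.ICL, probaPost.ICL) are those described in the
## Value section above, and this snippet is illustrative rather than part of
## the packaged example.
run$g.BIC                                  # number of clusters selected via the BIC
run$g.ICL                                  # number of clusters selected via the ICL
head(run$probaPost.ICL)                    # conditional probabilities, ICL-selected model
## Hard cluster assignments from the conditional probabilities (MAP rule)
labels.ICL <- apply(run$probaPost.ICL, 1, which.max)
table(labels.ICL)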