miscellaneous: Various Functions for Retrieving Information from Clustering Results

Description

Various functions are available to retrieve the information criteria (criterion), the posterior probabilities of cluster membership $z$ (posterior), the “weights” $u$ (importance), the uncertainty (uncertainty), and the estimates of the cluster proportions, means and variances (getEstimates) resulting from the clustering (filtering) operation.

Usage

criterion(object, ...)
criterion(object) <- value
posterior(object, assign=FALSE)
importance(object, assign=FALSE)
uncertainty(object)
getEstimates(object, data)

Arguments

object

Object returned from flowClust or filter. For the replacement method of criterion, the object must be of class flowClustList or tmixFilterResultList.

...

Further arguments. Currently the only one is type, a character string specifying the desired criterion. It may be "BIC", "ICL" or "logLike".

value

A character string stating the criterion used to choose the best model. It may be either "BIC" or "ICL".

assign

A logical value. If TRUE, only the quantity (z for posterior or u for importance) associated with the cluster to which an observation is assigned will be returned. Default is FALSE, meaning that the quantities associated with all the clusters will be returned.

data

An optional argument. A numeric vector, matrix or data frame of observations, or an object of class flowFrame. This is the object on which flowClust or filter was performed.
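The effect of the assign argument can be illustrated with a small base R sketch. It does not call the package: the matrix z is made-up toy data, and the sketch assumes that each observation is assigned to the cluster with the largest posterior probability, which ignores any outlier or filtering rules the package may apply.

```r
# Toy posterior matrix (made-up values): N = 4 observations, K = 3 clusters.
# Each row sums to 1. This mimics what posterior(object) would return
# with assign = FALSE.
z <- matrix(c(0.70, 0.20, 0.10,
              0.10, 0.80, 0.10,
              0.30, 0.30, 0.40,
              0.50, 0.25, 0.25),
            nrow = 4, byrow = TRUE)

# ASSUMPTION: an observation is assigned to its most probable cluster.
label <- max.col(z, ties.method = "first")

# Analogue of assign = TRUE: keep only the entry of the assigned cluster,
# giving a vector of length N instead of an N x K matrix.
z_assigned <- z[cbind(seq_len(nrow(z)), label)]

print(dim(z))        # N x K
print(label)
print(z_assigned)
```

The same indexing applies to the weights u when importance is used in place of posterior.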

Value

Denote by $K$ the number of clusters, $N$ the number of observations, and $P$ the number of variables. For posterior and importance, a matrix of size $N \times K$ is returned if assign=FALSE (default). Otherwise, a vector of size $N$ is returned. uncertainty always returns a vector of size $N$. getEstimates returns a list with the named elements proportions, locations and, if the data object is provided, dispersion. proportions is a vector of size $K$ and contains the estimates of the $K$ cluster proportions. locations is a matrix of size $K \times P$ and contains the estimates of the $K$ mean vectors transformed back to the original scale (i.e., rbox(object@mu, object@lambda)). dispersion is an array of dimensions $K \times P \times P$, containing the approximate estimates of the $K$ covariance matrices on the original scale.
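To make the shapes concrete, the following base R sketch builds a mock list with the same element names and dimensions as described for getEstimates. All numbers are placeholders, not package output, and the list is constructed by hand rather than by the package.

```r
# Mock of the list layout described for getEstimates (placeholder values).
K <- 2  # number of clusters
P <- 3  # number of variables

est <- list(
  proportions = c(0.6, 0.4),                    # one proportion per cluster
  locations   = matrix(0, nrow = K, ncol = P),  # K x P matrix of means
  dispersion  = array(0, dim = c(K, P, P))      # K x P x P array
)

# Fill each cluster's covariance slice with an identity matrix (placeholder).
for (k in seq_len(K)) est$dispersion[k, , ] <- diag(P)

# Extract the P x P covariance matrix of cluster 2.
Sigma_2 <- est$dispersion[2, , ]

print(length(est$proportions))
print(dim(est$locations))
print(dim(Sigma_2))
```

Indexing the first dimension of dispersion selects one cluster, so est$dispersion[k, , ] is the covariance matrix that pairs with row k of locations.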

Details

These functions retrieve various slots contained in the object returned from the clustering operation. criterion retrieves object@BIC, object@ICL or object@logLike. Its replacement method modifies object@index and object@criterion to select the best model according to the desired criterion. posterior and importance provide a convenient means of retrieving the information stored in object@z and object@u respectively. uncertainty retrieves object@uncertainty. getEstimates retrieves the information stored in object@mu (transformed back to the original scale) and object@w; when the data object is provided, an approximate variance estimate (on the original scale, obtained by performing one M-step of the EM algorithm without taking the Box-Cox transformation) is also computed.
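The idea of transforming a mean back to the original scale can be sketched in base R with the textbook Box-Cox transformation for positive data. The helpers box_cox and inv_box_cox are hypothetical names introduced here for illustration. They are not the package's functions, and the package's rbox may use a different variant, for example in how it treats negative values.

```r
# Textbook Box-Cox transform for positive x (ASSUMED form, not the package's).
box_cox <- function(x, lambda) {
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

# Its inverse: maps values on the transformed scale back to the original scale.
inv_box_cox <- function(y, lambda) {
  if (lambda == 0) exp(y) else (lambda * y + 1)^(1 / lambda)
}

x      <- c(1, 10, 100)  # values on the original scale
lambda <- 0.5            # transformation parameter

y      <- box_cox(x, lambda)      # transformed scale (where means are estimated)
x_back <- inv_box_cox(y, lambda)  # back on the original scale

print(y)
print(x_back)
```

Because the transformation is nonlinear, a covariance matrix cannot be mapped back in the same elementwise way, which is consistent with the variance estimate above being described as approximate.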

References

Lo, K., Brinkman, R. R. and Gottardo, R. (2008) Automated Gating of Flow Cytometry Data via Robust Model-based Clustering. Cytometry A 73, 321-332.