ExtractTopFeatures: Extracting the top genes driving GoM clusters

Description

This function uses the relative gene expression profiles of the GoM clusters and applies a KL-divergence-based method to obtain, for each cluster, a list of the top features that drive it.

Usage

ExtractTopFeatures(theta, top_features = 10, method = c("poisson", "bernoulli"), options = c("min", "max"))

Arguments

theta

The theta matrix (cluster probability distributions) obtained from fitting the GoM model. It is a G x K matrix, where G is the number of features and K is the number of clusters (topics).

top_features

The number of top features to report per cluster, that is, the features that most distinguish that cluster from the others. Default value is 10.

method

The underlying model assumed for the KL divergence measurement. The two supported choices are "poisson" and "bernoulli".

options

If "min", for each cluster k, we select the features that maximize the minimum KL divergence of cluster k against all other clusters. If "max", we select the features that maximize the maximum KL divergence of cluster k against all other clusters. In both cases the divergence is computed separately for each feature.
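
The selection rule described for "min" and "max" can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the package's implementation: `sketch_top_features` is a hypothetical helper, the divergences are the textbook per-feature Poisson and Bernoulli KL formulas, and every entry of the theta matrix is assumed to lie strictly between 0 and 1 so that the logarithms are finite.

```r
# Hypothetical helper for illustration only; not the package's implementation.
# Assumes every entry of theta lies strictly between 0 and 1.
sketch_top_features <- function(theta, top_features = 10,
                                method = c("poisson", "bernoulli"),
                                options = c("min", "max")) {
  method <- match.arg(method)
  options <- match.arg(options)
  K <- ncol(theta)

  # Per-feature divergence of cluster profile a from cluster profile b.
  kl <- function(a, b) {
    if (method == "poisson") {
      a * log(a / b) - a + b
    } else {
      a * log(a / b) + (1 - a) * log((1 - a) / (1 - b))
    }
  }

  # Elementwise min or max across the K - 1 pairwise comparisons.
  combine <- if (options == "min") pmin else pmax

  out <- matrix(NA_integer_, nrow = K, ncol = top_features)
  for (k in seq_len(K)) {
    others <- setdiff(seq_len(K), k)
    per_cluster <- lapply(others, function(l) kl(theta[, k], theta[, l]))
    score <- Reduce(combine, per_cluster)
    # Keep the indices of the features with the largest combined score.
    out[k, ] <- order(score, decreasing = TRUE)[seq_len(top_features)]
  }
  out
}

# Toy theta: 4 features, 3 clusters, each cluster dominated by one feature.
theta_toy <- cbind(c(0.7, 0.1, 0.1, 0.1),
                   c(0.1, 0.7, 0.1, 0.1),
                   c(0.1, 0.1, 0.7, 0.1))
top_min <- sketch_top_features(theta_toy, top_features = 1, options = "min")
top_max <- sketch_top_features(theta_toy, top_features = 1, options = "max")
```

On this toy input both options pick feature k for cluster k. They differ in general: "min" favors features that separate a cluster from every other cluster, while "max" also admits features that separate it from only one.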

Value

A K x top_features matrix whose k-th row lists the indices of the top features driving cluster k.
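
Because the result holds row indices into theta rather than feature names, a lookup step is often useful. The sketch below assumes that the theta matrix carries feature names as row names. The toy matrix, the gene names, and the index matrix `top_idx` are hypothetical stand-ins for real inputs and outputs.

```r
# Toy theta with hypothetical gene names as row names (4 features, 2 clusters).
theta <- matrix(c(0.7, 0.1, 0.1, 0.1,
                  0.1, 0.1, 0.1, 0.7),
                ncol = 2,
                dimnames = list(c("geneA", "geneB", "geneC", "geneD"), NULL))

# Hypothetical K x top_features index matrix, shaped like the documented value.
top_idx <- matrix(c(1L, 4L,
                    4L, 1L), nrow = 2, byrow = TRUE)

# Look up the name for every index, keeping the K x top_features layout.
top_names <- matrix(rownames(theta)[as.vector(top_idx)], nrow = nrow(top_idx))
```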

Examples

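library(CountClust)
# Load the example GoM fits and take the theta matrix of the 6-cluster model.
data("MouseDeng2014.FitGoM")
theta_mat <- MouseDeng2014.FitGoM$clust_6$theta
# Extract the top 100 driving features per cluster.
top_features <- ExtractTopFeatures(theta_mat, top_features = 100,
                                   method = "poisson", options = "min")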