pagoda.gene.clusters: Determine de-novo gene clusters and associated overdispersion info

Description

Determine de-novo gene clusters, their weighted PCA lambda1 values, and random matrix expectation.

Usage

pagoda.gene.clusters(varinfo, trim = 3.1/ncol(varinfo$mat),
  n.clusters = 150, n.samples = 60, cor.method = "p",
  n.internal.shuffles = 0, n.starts = 10, n.cores = detectCores(),
  verbose = 0, plot = FALSE, show.random = FALSE, n.components = 1,
  method = "ward.D", secondary.correlation = FALSE,
  n.cells = ncol(varinfo$mat), old.results = NULL)

Arguments

varinfo

adjusted variance info from pagoda.varnorm() (or pagoda.subtract.aspect())

trim

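additional Winsorization trim value to be used in determining clusters (to remove clusters that group outliers occurring in a given cell). The default is 3.1/ncol(varinfo$mat); use higher values (5-15 in place of 3.1, as in the 7.1/ncol(varinfo$mat) used in the example) if the resulting clusters group outlier patterns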

n.clusters

number of clusters to be determined (recommended range is 100-200)

n.samples

number of randomly generated matrix samples to test the background distribution of lambda1 on

cor.method

correlation method ("pearson", "spearman") to be used as a distance measure for clustering

n.internal.shuffles

number of internal shuffles to perform. Disabled by default; only needed to estimate set coherence, which is quite high for clusters by definition. Set to 10-30 shuffles to estimate it

n.starts

number of wPCA EM algorithm starts at each iteration

n.cores

number of cores to use

verbose

verbosity level

plot

whether a plot showing distribution of random lambda1 values should be shown (along with the extreme value distribution fit)

show.random

whether the empirical random gene set values should be shown in addition to the Tracy-Widom analytical approximation

n.components

number of principal components to calculate (can be increased if the number of clusters is small and some clusters contain strong secondary patterns, which is rarely the case)

method

clustering method to be used in determining gene clusters

secondary.correlation

whether clustering should be performed on the correlation of the correlation matrix instead

n.cells

number of cells to use for the randomly generated cluster lambda1 model

old.results

optionally, pass old results just to plot the model without recalculating the stats
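
The clustering-related arguments above (trim, cor.method, secondary.correlation, method, n.clusters) can be illustrated with a minimal base-R sketch. This is not the scde implementation: the toy matrix, the winsorize() helper, and the way trim is converted to a per-gene count are assumptions made for illustration only.

```r
# Illustrative base-R sketch; NOT the code used inside pagoda.gene.clusters().
set.seed(1)
n.genes <- 60
n.cells <- 40

# Toy matrix with genes in rows and cells in columns
# (layout assumed to mirror varinfo$mat).
mat <- matrix(rnorm(n.genes * n.cells), nrow = n.genes,
              dimnames = list(paste0("gene", seq_len(n.genes)), NULL))

# Plant a co-varying module in the first 10 genes.
signal <- rnorm(n.cells)
mat[1:10, ] <- mat[1:10, ] +
  matrix(3 * signal, nrow = 10, ncol = n.cells, byrow = TRUE)

# Hypothetical helper: clip the n.trim most extreme values on each side of a gene.
winsorize <- function(x, n.trim) {
  if (n.trim < 1) return(x)
  s <- sort(x)
  lo <- s[n.trim + 1]
  hi <- s[length(s) - n.trim]
  pmin(pmax(x, lo), hi)
}

trim <- 3.1 / n.cells              # same form as the documented default
n.trim <- round(trim * n.cells)    # assumption: trim is a fraction of cells
wmat <- t(apply(mat, 1, winsorize, n.trim = n.trim))

# Gene-gene correlation, optionally replaced by the correlation of correlations.
gene.cor <- cor(t(wmat), method = "pearson")
secondary.correlation <- FALSE
if (secondary.correlation) gene.cor <- cor(gene.cor, method = "pearson")

# Hierarchical clustering on 1 - correlation, cut into a fixed number of clusters.
hc <- hclust(as.dist(1 - gene.cor), method = "ward.D")
n.clusters <- 5
cl <- cutree(hc, k = n.clusters)

# One character vector of gene names per cluster.
clusters <- split(names(cl), cl)
```

Winsorizing before computing correlations keeps a single extreme cell from dominating a gene's correlation profile, which is the situation the trim argument is described as guarding against.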

Value

a list containing the following fields:

clusters

a list of the genes in each cluster

xf

extreme value distribution fit for the standardized lambda1 of a randomly generated pattern

tci

index of a top cluster in each random iteration

cl.goc

weighted PCA info for each real gene cluster

varm

standardized lambda1 values for each randomly generated matrix cluster

clvlm

a linear model describing dependency of the cluster lambda1 on a Tracy-Widom lambda1 expectation
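
The idea of comparing a cluster's lambda1 against values obtained from randomly generated matrices can be sketched in base R as follows. This is a simplified illustration under stated assumptions: it uses ordinary (unweighted) PCA via eigen() rather than weighted PCA, Gaussian noise as the random model, and a plain z-score as the standardization. The lambda1() helper and all variable names are hypothetical and are not part of scde.

```r
# Illustrative base-R sketch; NOT the procedure implemented in scde.
set.seed(42)
n.cells <- 50
n.genes <- 20    # size of one hypothetical gene cluster
n.samples <- 60  # number of random matrices, as in the n.samples default

# Hypothetical helper: largest eigenvalue of the gene-gene covariance matrix,
# i.e. the variance explained by the first (unweighted) principal component.
lambda1 <- function(m) {
  max(eigen(cov(t(m)), symmetric = TRUE, only.values = TRUE)$values)
}

# Background distribution: lambda1 of pure-noise matrices of the same size.
random.l1 <- vapply(seq_len(n.samples), function(i) {
  lambda1(matrix(rnorm(n.genes * n.cells), nrow = n.genes))
}, numeric(1))

# A "real" cluster: every gene shares one pattern across cells, plus noise.
signal <- rnorm(n.cells)
real <- matrix(rnorm(n.genes * n.cells), nrow = n.genes) +
  matrix(2 * signal, nrow = n.genes, ncol = n.cells, byrow = TRUE)
real.l1 <- lambda1(real)

# Assumed standardization: distance from the random background in SD units.
z <- (real.l1 - mean(random.l1)) / sd(random.l1)
```

A coherent cluster yields a lambda1 well above the random background, which is the kind of excess that the random-matrix fields are there to quantify.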

Examples

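# Load the example dataset and filter the count matrix
data(pollen)
cd <- clean.counts(pollen)
# Fit k-nearest-neighbor error models
knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
# Normalize gene expression variance
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
# Determine de novo gene clusters with a higher trim than the default
clpca <- pagoda.gene.clusters(varinfo, trim=7.1/ncol(varinfo$mat), n.clusters=150, n.cores=10, plot=FALSE)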