scde (version 2.0.1)

pagoda.gene.clusters: Determine de-novo gene clusters and associated overdispersion info

Description

Determine de-novo gene clusters, their weighted PCA lambda1 values, and random matrix expectation.

Usage

pagoda.gene.clusters(varinfo, trim = 3.1/ncol(varinfo$mat),
  n.clusters = 150, n.samples = 60, cor.method = "p",
  n.internal.shuffles = 0, n.starts = 10, n.cores = detectCores(),
  verbose = 0, plot = FALSE, show.random = FALSE, n.components = 1,
  method = "ward.D", secondary.correlation = FALSE,
  n.cells = ncol(varinfo$mat), old.results = NULL)

Arguments

varinfo
adjusted variance info from pagoda.varnorm() (or pagoda.subtract.aspect())
trim
additional Winsorization trim value to be used in determining clusters (to remove clusters that group outliers occurring in a given cell). Use higher values (5-15, compared with the 3.1 in the default 3.1/ncol(varinfo$mat)) if the resulting clusters group outlier patterns
n.clusters
number of clusters to be determined (recommended range is 100-200)
n.samples
number of randomly generated matrix samples used to estimate the background distribution of lambda1
cor.method
correlation method ("pearson", "spearman") to be used as a distance measure for clustering; the default "p" is an abbreviation that matches "pearson"
n.internal.shuffles
number of internal shuffles to perform. These are only needed to estimate set coherence, which is quite high for clusters by definition. Disabled by default (0); set to 10-30 shuffles to estimate it
n.starts
number of wPCA EM algorithm starts at each iteration
n.cores
number of cores to use
verbose
verbosity level
plot
whether a plot showing distribution of random lambda1 values should be shown (along with the extreme value distribution fit)
show.random
whether the empirical random gene set values should be shown in addition to the Tracy-Widom analytical approximation
n.components
number of principal components to calculate (can be increased if the number of clusters is small and some contain strong secondary patterns; this is rarely the case)
method
clustering method to be used in determining gene clusters
secondary.correlation
whether clustering should be performed on the correlation of the correlation matrix instead
n.cells
number of cells to use for the randomly generated cluster lambda1 model
old.results
optionally, pass old results just to plot the model without recalculating the stats
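
To make the roles of trim, cor.method, method, n.clusters and secondary.correlation more concrete, the following base-R sketch Winsorizes a toy expression matrix, turns gene-gene correlations into distances, and cuts a hierarchical clustering into a fixed number of groups. It is an illustration only: winsorize_rows and cluster_genes are hypothetical helpers that are not part of scde, the 1 - correlation distance is an assumed choice, and the package's actual implementation may differ.

```r
# Hypothetical helper (not part of scde): clamp the k most extreme values on
# each side of every row (gene), where k = floor(trim * number of cells).
winsorize_rows <- function(mat, trim) {
  n <- ncol(mat)
  k <- floor(trim * n)
  if (k < 1) return(mat)
  out <- t(apply(mat, 1, function(x) {
    s <- sort(x)
    pmin(pmax(x, s[k + 1]), s[n - k])
  }))
  dimnames(out) <- dimnames(mat)
  out
}

# Hypothetical helper (not part of scde): cluster genes by correlation.
cluster_genes <- function(mat, n.clusters, cor.method = "pearson",
                          method = "ward.D", secondary.correlation = FALSE) {
  gc <- cor(t(mat), method = cor.method)   # gene-by-gene correlation
  if (secondary.correlation) {
    gc <- cor(gc, method = cor.method)     # correlation of the correlation matrix
  }
  hc <- hclust(as.dist(1 - gc), method = method)
  cl <- cutree(hc, k = n.clusters)         # cluster id for each gene
  split(names(cl), cl)                     # gene names grouped by cluster
}

set.seed(1)
n.cells <- 40
signal <- rnorm(n.cells)
# 10 genes sharing a common pattern plus 20 independent noise genes
mat <- rbind(
  t(sapply(1:10, function(i) signal + rnorm(n.cells, sd = 0.3))),
  matrix(rnorm(20 * n.cells), nrow = 20)
)
rownames(mat) <- paste0("gene", seq_len(nrow(mat)))
mat[15, 3] <- 30                           # outlier occurring in a single cell

wmat <- winsorize_rows(mat, trim = 3.1 / ncol(mat))
clusters <- cluster_genes(wmat, n.clusters = 5)
lengths(clusters)                          # number of genes in each cluster
```

With trim = 3.1/ncol(mat), the sketch clamps the three most extreme values on each side of every gene, so a single-cell outlier such as the one planted in gene15 can no longer dominate the correlations.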

Value

a list containing the following fields:
  • clusters: a list of genes in each cluster
  • xf: extreme value distribution fit for the standardized lambda1 of a randomly generated pattern
  • tci: index of a top cluster in each random iteration
  • cl.goc: weighted PCA info for each real gene cluster
  • varm: standardized lambda1 values for each randomly generated matrix cluster
  • clvlm: a linear model describing dependency of the cluster lambda1 on a Tracy-Widom lambda1 expectation
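
Several of these fields concern how a cluster's lambda1 compares with what random gene sets would produce. As a simplified illustration of that idea, the sketch below uses ordinary unweighted PCA from base R (not the weighted PCA used by the package) and an empirical background obtained by shuffling, with no extreme value or Tracy-Widom fit. lambda1_of is a hypothetical helper, and n.samples is reused here only as a variable name for the number of random matrices.

```r
# Hypothetical helper (not part of scde): variance of the first principal
# component of a genes x cells matrix, using ordinary unweighted PCA.
lambda1_of <- function(mat) {
  pc <- prcomp(t(mat), center = TRUE, scale. = FALSE)
  pc$sdev[1]^2
}

set.seed(1)
n.cells <- 50
n.genes <- 12
signal <- rnorm(n.cells)
# A coherent "cluster": genes sharing a common pattern across cells
cluster <- t(sapply(seq_len(n.genes),
                    function(i) signal + rnorm(n.cells, sd = 0.5)))

# Empirical background: lambda1 of same-sized matrices in which each gene's
# values are shuffled across cells independently, destroying co-variation.
n.samples <- 60
background <- replicate(n.samples, {
  shuffled <- t(apply(cluster, 1, sample))
  lambda1_of(shuffled)
})

observed <- lambda1_of(cluster)
# Standardized excess of the observed lambda1 over the random background
z <- (observed - mean(background)) / sd(background)
c(observed = observed, background.mean = mean(background), z = z)
```

A coherent gene set yields a lambda1 well above the shuffled background, which is the kind of contrast the standardized values are meant to capture.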

Examples

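library(scde)
data(pollen)
# Clean the raw count matrix before model fitting
cd <- clean.counts(pollen)
# Fit k-nearest-neighbour error models (n.cores = 10 assumes a multi-core machine; lower it as needed)
knn <- knn.error.models(cd, k = ncol(cd)/4, n.cores = 10, min.count.threshold = 2, min.nonfailed = 5, max.model.plots = 10)
# Normalize gene expression variance
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
# Determine de-novo gene clusters, using a higher trim than the default
clpca <- pagoda.gene.clusters(varinfo, trim = 7.1/ncol(varinfo$mat), n.clusters = 150, n.cores = 10, plot = FALSE)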
