scde (version 2.0.1)

pagoda.varnorm: Normalize gene expression variance relative to transcriptome-wide expectations

Description

Normalizes gene expression magnitudes so that each gene's variance, expressed as a ratio to the transcriptome-wide expectation, follows chi-squared statistics. The expectation is determined by local regression on expression magnitude (and, optionally, gene length). Also corrects for batch effects.
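The idea in the description can be illustrated with a small base-R toy. This is only a conceptual sketch under stated assumptions, not the scde implementation: the simulated counts, the use of plain log-variance (rather than the error-model-weighted quantities scde works with), and all variable names are hypothetical.

```r
# Toy illustration only: NOT how scde computes its estimates.
set.seed(1)
n.genes <- 500
n.cells <- 50

# Hypothetical per-gene mean expression, spread over three orders of magnitude
mu <- exp(runif(n.genes, log(1), log(1000)))

# Simulated negative-binomial counts: genes in rows, cells in columns
counts <- matrix(rnbinom(n.genes * n.cells, mu = rep(mu, n.cells), size = 5),
                 nrow = n.genes)

# Per-gene expression magnitude and variance on the log scale
lx <- log(counts + 1)
d <- data.frame(m = rowMeans(lx), v = apply(lx, 1, var))
d <- d[d$v > 0, ]

# Transcriptome-wide expectation: local regression of log-variance on magnitude
fit <- loess(log(v) ~ m, data = d)
d$expected <- exp(fitted(fit))

# Ratio of observed to expected variance; under a chi-squared null,
# ratio * (n.cells - 1) is compared to a chi-squared with n.cells - 1 df
d$ratio <- d$v / d$expected
d$p <- pchisq(d$ratio * (n.cells - 1), df = n.cells - 1, lower.tail = FALSE)

# Genes with the strongest excess variance relative to the trend
head(d[order(d$p), ])
```

Genes whose ratio is well above 1 vary more than the trend predicts for their expression level; the function documented here additionally accounts for per-cell error models, effective degrees of freedom and batch.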

Usage

pagoda.varnorm(models, counts, batch = NULL, trim = 0, prior = NULL,
  fit.genes = NULL, plot = TRUE, minimize.underdispersion = FALSE,
  n.cores = detectCores(), n.randomizations = 100, weight.k = 0.9,
  verbose = 0, weight.df.power = 1, smooth.df = -1, max.adj.var = 10,
  theta.range = c(0.01, 100), gene.length = NULL)

Arguments

models
model matrix (select a subset of rows to normalize variance within a subset of cells)
counts
read count matrix
batch
measurement batch (optional)
trim
trim value for Winsorization (optional, can be set to 1-3 to reduce the impact of outliers, can be as large as 5 or 10 for datasets with several thousand cells)
prior
expression magnitude prior
fit.genes
a vector of gene names which should be used to establish the variance fit (default is NULL: use all genes). This can be used to specify, for instance, a set of spike-in control transcripts such as ERCC.
plot
whether to plot the results
minimize.underdispersion
whether underdispersion should be minimized (can increase sensitivity in datasets with high complexity of population, however cannot be effectively used in datasets where multiple batches are present)
n.cores
number of cores to use
n.randomizations
number of bootstrap sampling rounds to use in estimating average expression magnitude for each gene within the given set of cells
weight.k
k value to use in the final weight matrix
verbose
verbosity level
weight.df.power
power factor to use in determining effective number of degrees of freedom (can be increased for datasets exhibiting particularly high levels of noise at low expression magnitudes)
smooth.df
degrees of freedom to be used in calculating smoothed local regression between coefficient of variation and expression magnitude (and gene length, if provided). Leave at -1 for automated guess.
max.adj.var
maximum value allowed for the estimated adjusted variance (capping of adjusted variance is recommended when scoring pathway overdispersion relative to randomly sampled gene sets)
theta.range
valid theta range (should be the same as was set in the knn.error.models() call)
gene.length
optional vector of gene lengths (corresponding to the rows of counts matrix)
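For readers unfamiliar with Winsorization (the trim argument above), the following base-R sketch shows the general technique on a single vector. The helper winsorize_vec is hypothetical and not part of scde; it takes a count k of values to clamp on each side, and how pagoda.varnorm interprets trim internally is not described on this page.

```r
# Hypothetical helper, not part of scde: clamp the k most extreme values
# on each side of a numeric vector to the nearest retained value.
winsorize_vec <- function(x, k) {
  # Nothing to do if k is below 1 or the vector is too short
  if (k < 1 || length(x) <= 2 * k) return(x)
  s <- sort(x)
  lo <- s[k + 1]            # smallest value that is kept as-is
  hi <- s[length(x) - k]    # largest value that is kept as-is
  pmin(pmax(x, lo), hi)
}

# Made-up expression values for one gene across seven cells
x <- c(0, 2, 3, 3, 4, 5, 250)
winsorize_vec(x, 1)
```

With k = 1 the single outlier on each side is pulled in to its neighbor, so one extreme cell no longer dominates the variance estimate for that gene.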

Value

a list containing the following fields:
  • mat: adjusted expression magnitude values
  • matw: weight matrix corresponding to the expression matrix
  • arv: a vector giving adjusted variance values for each gene
  • avmodes: a vector of estimated average expression magnitudes for each gene
  • modes: a list of batch-specific average expression magnitudes for each gene
  • prior: estimated (or supplied) expression magnitude prior
  • edf: estimated effective degrees of freedom
  • fit.genes: the fit.genes parameter
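A brief sketch of how the returned list might be inspected. To keep it runnable without the package, it uses a stand-in list with made-up gene names and numbers that mimics only the mat and arv fields; with a real result you would use the object returned by pagoda.varnorm instead.

```r
# Stand-in object with made-up values, shaped like two of the documented fields
varinfo <- list(
  mat = matrix(c(0.5, -0.2, 1.1, 0.0, 2.3, -1.4), nrow = 3,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               c("cell1", "cell2"))),
  arv = c(geneA = 1.2, geneB = 4.8, geneC = 0.9)
)

# Rank genes by adjusted variance and keep the two most overdispersed
top <- sort(varinfo$arv, decreasing = TRUE)[1:2]
top

# Pull the adjusted expression magnitudes for those genes
varinfo$mat[names(top), , drop = FALSE]
```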

Examples

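library(scde)
data(pollen)
# filter the raw count matrix before model fitting
cd <- clean.counts(pollen)
# fit k-nearest-neighbor error models for the cells
knn <- knn.error.models(cd, k = ncol(cd)/4, n.cores = 10, min.count.threshold = 2, min.nonfailed = 5, max.model.plots = 10)
# normalize variance with Winsorization and capped adjusted variance, without plotting
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)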