Seurat (version 1.4.0)

MeanVarPlot: Identify variable genes

Description

Identifies genes that are outliers on a 'mean variability plot'. First, calculates average expression (fxn.x) and dispersion (fxn.y) for each gene. Next, divides genes into num.bin (default 20) bins based on their average expression and calculates z-scores for dispersion within each bin. This identifies variable genes while controlling for the strong relationship between variability and average expression.
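The binning and z-scoring steps described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the helper name bin_dispersion_zscores, the simulated vectors, and the choice of equal-width bins are all hypothetical.

```r
# Sketch only: bin genes by average expression, then z-score dispersion per bin.
# `avg_expr` and `dispersion` are hypothetical named numeric vectors, one entry per gene.
bin_dispersion_zscores <- function(avg_expr, dispersion, num.bin = 20) {
  # Assumption: equal-width bins spanning the range of average expression.
  bins <- cut(avg_expr, breaks = num.bin)
  # Per-bin mean and standard deviation of dispersion.
  bin_mean <- as.numeric(tapply(dispersion, bins, mean))
  bin_sd <- as.numeric(tapply(dispersion, bins, sd))
  # Look up each gene's bin statistics by integer bin index.
  idx <- as.integer(bins)
  z <- (dispersion - bin_mean[idx]) / bin_sd[idx]
  names(z) <- names(avg_expr)
  z
}

# Simulated data in which dispersion rises with average expression.
set.seed(1)
avg_expr <- runif(2000, min = 0, max = 8)
names(avg_expr) <- paste0("gene", seq_along(avg_expr))
dispersion <- 0.5 * avg_expr + rnorm(2000)

z <- bin_dispersion_zscores(avg_expr, dispersion)
```

Because each gene is compared only with genes of similar average expression, the z-scores are far less correlated with average expression than the raw dispersion values are.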

Usage

MeanVarPlot(object, fxn.x = expMean, fxn.y = logVarDivMean,
  do.plot = TRUE, set.var.genes = TRUE, do.text = TRUE,
  x.low.cutoff = 0.1, x.high.cutoff = 8, y.cutoff = 2,
  y.high.cutoff = Inf, cex.use = 0.5, cex.text.use = 0.5,
  do.spike = FALSE, pch.use = 16, col.use = "black",
  spike.col.use = "red", plot.both = FALSE, do.contour = TRUE,
  contour.lwd = 3, contour.col = "white", contour.lty = 2, num.bin = 20,
  do.recalc = TRUE)

Arguments

object

Seurat object

fxn.x

Function to compute x-axis value (average expression). Default is expMean, the mean expression level calculated in non-log space and reported in log space

fxn.y

Function to compute y-axis value (dispersion). Default is logVarDivMean, the log of the variance divided by the mean

do.plot

Plot the average/dispersion relationship

set.var.genes

Set object@var.genes to the identified variable genes (default is TRUE)

do.text

Add text names of variable genes to plot (default is TRUE)

x.low.cutoff

Bottom cutoff on x-axis for identifying variable genes

x.high.cutoff

Top cutoff on x-axis for identifying variable genes

y.cutoff

Bottom cutoff on y-axis for identifying variable genes

y.high.cutoff

Top cutoff on y-axis for identifying variable genes

cex.use

Point size

cex.text.use

Text size

do.spike

FALSE by default. If TRUE, plots all genes whose names match ^ERCC (spike-ins) in a different color

pch.use

Pch value for points

col.use

Color to use

spike.col.use

If do.spike is TRUE, color for spike-in genes

plot.both

Plot both the scaled and non-scaled graphs.

do.contour

Draw contour lines calculated based on all genes

contour.lwd

Contour line width

contour.col

Contour line color

contour.lty

Contour line type

num.bin

Total number of bins to use in the scaled analysis (default is 20)

do.recalc

TRUE by default. If FALSE, plots and selects variable genes without recalculating statistics for each gene.
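The four cutoff arguments (x.low.cutoff, x.high.cutoff, y.cutoff, y.high.cutoff) together define a rectangular window on the plot. A minimal sketch of that selection follows; the helper select_variable_genes and the example vectors are hypothetical, and the use of strict inequalities is an assumption rather than something this page states.

```r
# Sketch only: apply the four cutoffs to per-gene statistics.
# `avg_expr` (x-axis) and `disp_z` (y-axis z-scores) are hypothetical named vectors.
select_variable_genes <- function(avg_expr, disp_z,
                                  x.low.cutoff = 0.1, x.high.cutoff = 8,
                                  y.cutoff = 2, y.high.cutoff = Inf) {
  # Assumption: all four comparisons are strict.
  keep <- avg_expr > x.low.cutoff & avg_expr < x.high.cutoff &
    disp_z > y.cutoff & disp_z < y.high.cutoff
  names(avg_expr)[which(keep)]
}

avg_expr <- c(geneA = 0.05, geneB = 1.2, geneC = 3.4, geneD = 9.0)
disp_z <- c(geneA = 3.1, geneB = 2.5, geneC = 0.4, geneD = 4.0)

var_genes <- select_variable_genes(avg_expr, disp_z)
```

With the default cutoffs, only geneB passes: geneA falls below x.low.cutoff, geneC falls below y.cutoff, and geneD exceeds x.high.cutoff.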

Value

Returns a Seurat object, placing variable genes in object@var.genes. The results of the analysis are stored in object@mean.var.

Details

Appropriate parameter settings vary from dataset to dataset and should be chosen empirically, guided by visual inspection of the plot. Setting y.cutoff to 2 identifies genes whose dispersion is more than two standard deviations above the average dispersion within their bin. The default x-axis function is the mean expression level, and the default y-axis function is log(variance/mean). Means and variances are calculated in non-log space, but the results are reported in log space; see the relevant functions (expMean, logVarDivMean) for exact details.
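To illustrate what calculating in non-log space and reporting in log space can look like, here is a hedged sketch of the two statistics. It assumes expression values are stored as log(count + 1) with the natural logarithm; the function names exp_mean_sketch and log_var_div_mean_sketch are hypothetical stand-ins, so consult the package's own expMean and logVarDivMean for the exact definitions.

```r
# Sketch only. Assumption: `x` holds one gene's expression values stored as log(count + 1).
exp_mean_sketch <- function(x) {
  # Undo the log transform, average, then return to log space.
  log(mean(expm1(x)) + 1)
}

log_var_div_mean_sketch <- function(x) {
  # Variance-to-mean ratio in non-log space, reported on a log scale.
  raw <- expm1(x)
  log(var(raw) / mean(raw))
}

# Toy gene with counts 0, 1, 3, 7, 9 across five cells.
x <- log(c(0, 1, 3, 7, 9) + 1)

avg <- exp_mean_sketch(x)           # log(4 + 1)
disp <- log_var_div_mean_sketch(x)  # log(15 / 4)
```

For this toy gene the non-log mean is 4 and the sample variance is 15, so the two statistics are log(5) and log(3.75).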