Seurat (version 2.3.0)

FindVariableGenes: Identify variable genes

Description

Identifies genes that are outliers on a 'mean variability plot'. First, uses a function to calculate average expression (mean.function) and dispersion (dispersion.function) for each gene. Next, divides genes into num.bin (default 20) bins based on their average expression, and calculates z-scores for dispersion within each bin. The purpose of this is to identify variable genes while controlling for the strong relationship between variability and average expression.
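
As a rough illustration of the binning and z-scoring steps described above, the following base R sketch works on simulated per-gene values. The variable names (gene_mean, gene_disp, disp_scaled) and the simulated data are assumptions made for the example; this is not Seurat's internal implementation.

```r
# Sketch of within-bin z-scoring of dispersion (base R, simulated data).
set.seed(42)
n_genes   <- 200
gene_mean <- runif(n_genes, min = 0, max = 8)      # simulated average expression
gene_disp <- 0.5 * gene_mean + rnorm(n_genes)      # dispersion that trends with the mean

num_bin <- 20
# Equal-width bins along the average-expression axis
bin <- cut(gene_mean, breaks = num_bin)

# Z-score each gene's dispersion against the other genes in the same bin
disp_scaled <- ave(gene_disp, bin, FUN = function(v) (v - mean(v)) / sd(v))

# Genes with a large scaled dispersion are outliers relative to genes
# of similar average expression
head(order(disp_scaled, decreasing = TRUE))
```

Because each gene is compared only with genes in its own bin, the overall trend between dispersion and average expression is largely removed before any cutoff is applied.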

Usage

FindVariableGenes(object, mean.function = ExpMean,
  dispersion.function = LogVMR, do.plot = TRUE, set.var.genes = TRUE,
  x.low.cutoff = 0.1, x.high.cutoff = 8, y.cutoff = 1,
  y.high.cutoff = Inf, num.bin = 20, binning.method = "equal_width",
  do.recalc = TRUE, sort.results = TRUE, do.cpp = TRUE,
  display.progress = TRUE, ...)

Arguments

object

Seurat object

mean.function

Function to compute x-axis value (average expression). Default is to take the mean of the detected (i.e. non-zero) values

dispersion.function

Function to compute y-axis value (dispersion). Default is LogVMR, the log of the variance-to-mean ratio, log(variance/mean)

do.plot

Plot the average/dispersion relationship

set.var.genes

Set object@var.genes to the identified variable genes (default is TRUE)

x.low.cutoff

Bottom cutoff on x-axis for identifying variable genes

x.high.cutoff

Top cutoff on x-axis for identifying variable genes

y.cutoff

Bottom cutoff on y-axis for identifying variable genes

y.high.cutoff

Top cutoff on y-axis for identifying variable genes

num.bin

Total number of bins to use in the scaled analysis (default is 20)

binning.method

Specifies how the bins should be computed. Available methods are:

  • equal_width: each bin is of equal width along the x-axis [default]

  • equal_frequency: each bin contains an equal number of genes (can increase statistical power to detect overdispersed genes at high expression values, at the cost of reduced resolution along the x-axis)
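
The difference between the two binning methods can be sketched in base R as follows. The simulated gene_mean vector and the use of cut() and quantile() are assumptions for illustration, not a description of how Seurat computes its bins.

```r
# Compare equal-width and equal-frequency binning on skewed simulated data.
set.seed(1)
gene_mean <- rexp(1000, rate = 1)   # most genes have low average expression
num_bin   <- 20

# equal_width: breakpoints evenly spaced along the x-axis
bin_width <- cut(gene_mean, breaks = num_bin)

# equal_frequency: breakpoints at quantiles, so bins hold roughly equal numbers of genes
breaks_freq <- quantile(gene_mean, probs = seq(0, 1, length.out = num_bin + 1))
bin_freq    <- cut(gene_mean, breaks = breaks_freq, include.lowest = TRUE)

range(table(bin_width))   # very uneven counts; high-expression bins are sparse
range(table(bin_freq))    # near-identical counts in every bin
```

With equal-width bins, the few highly expressed genes end up in sparsely populated bins, where a within-bin z-score is unstable; equal-frequency bins avoid this but span a wider range of average expression at the high end.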

do.recalc

TRUE by default. If FALSE, plots and selects variable genes without recalculating statistics for each gene.

sort.results

If TRUE (the default), sort results in object@hvg.info in decreasing order of dispersion

do.cpp

Run the C++ versions of mean.function and dispersion.function if they exist.

display.progress

Show a progress bar for calculations

...

Extra parameters to VariableGenePlot

Value

Returns a Seurat object, placing variable genes in object@var.genes. The results of the analysis are stored in object@hvg.info.

Details

Appropriate parameter settings vary from dataset to dataset and are best chosen empirically, guided by visual inspection of the plot. Setting the y.cutoff parameter to 2 identifies genes whose dispersion is more than two standard deviations above the average dispersion within a bin. The default x-axis function is the mean expression level, and the default y-axis function is log(variance/mean). Mean and variance calculations are performed in non-log space, but the results are reported in log space; see the relevant functions for exact details.
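
To make the log-space remark and the cutoffs concrete, here is a minimal base R sketch. The helper names exp_mean_sketch and log_vmr_sketch are hypothetical, the input is assumed to be log1p-transformed expression for a single gene, and the cutoffs are assumed to be applied as strict inequalities; the exact definitions of ExpMean and LogVMR in Seurat may differ in detail.

```r
# Hypothetical helpers: compute statistics in non-log space, report in log space.
exp_mean_sketch <- function(x) {
  log1p(mean(expm1(x)))            # mean of back-transformed values, then log(1 + mean)
}
log_vmr_sketch <- function(x) {
  raw <- expm1(x)                  # back to non-log space
  log(var(raw) / mean(raw))        # log(variance / mean)
}

x <- log1p(c(0, 0, 3, 5, 12))      # toy gene measured in five cells
exp_mean_sketch(x)                 # equals log(5), since the non-log mean is 4
log_vmr_sketch(x)                  # equals log(24.5 / 4)

# Applying the four cutoffs to toy per-gene statistics (y.cutoff raised to 2)
gene_mean_x   <- c(a = 0.05, b = 1.2, c = 3.0, d = 9.0)   # x-axis values
disp_scaled_y <- c(a = 2.5,  b = 2.1, c = 0.4, d = 3.0)   # within-bin z-scores
pass <- gene_mean_x > 0.1 & gene_mean_x < 8 &
  disp_scaled_y > 2 & disp_scaled_y < Inf
names(pass)[pass]                  # only gene "b" passes every cutoff
```

In this toy example, gene a fails x.low.cutoff, gene c fails y.cutoff, and gene d fails x.high.cutoff.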

See Also

VariableGenePlot

Examples

# NOT RUN {
pbmc_small <- FindVariableGenes(object = pbmc_small, do.plot = FALSE)
pbmc_small@var.genes

# }
