DESeq2 (version 1.6.3)

estimateSizeFactors: Estimate the size factors for a DESeqDataSet

Description

Estimate the size factors for a DESeqDataSet

Usage

estimateSizeFactors(object, locfunc = median, geoMeans, controlGenes, normMatrix)

Arguments

object
a DESeqDataSet
locfunc
a function to compute a location for a sample. By default, the median is used. However, especially for low counts, the shorth function from the genefilter package may give better results.
geoMeans
by default this is not provided and the geometric means of the counts are calculated within the function. A vector of geometric means from another count matrix can be provided for a "frozen" size factor calculation
controlGenes
optional, numeric or logical index vector specifying those genes to use for size factor estimation (e.g. housekeeping or spike-in genes)
normMatrix
optional, a matrix of normalization factors which do not control for library size (e.g. average transcript length of genes for each sample). Providing normMatrix will estimate size factors on the count matrix divided by normMatrix and store the product of the size factors and normMatrix as normalizationFactors.
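The normMatrix arithmetic described above can be illustrated in base R. This is only a sketch of the two steps stated in the argument description (estimate size factors on counts divided by normMatrix, then multiply the size factors back into normMatrix); the toy data and all variable names are hypothetical, and the actual DESeq2 implementation may apply further scaling not shown here.

```r
set.seed(1)
# Toy data (hypothetical): 5 genes x 3 samples, shifted by 1 to avoid zeros
counts <- matrix(rpois(5 * 3, lambda = 100) + 1, nrow = 5)
# Gene- and sample-specific factors, e.g. relative average transcript length
norm_matrix <- matrix(runif(5 * 3, 0.5, 1.5), nrow = 5)

# Step 1: median ratio size factors on the adjusted counts
adjusted <- counts / norm_matrix
log_ratios <- log(adjusted) - rowMeans(log(adjusted))  # subtract log geometric mean per gene
size_factors <- exp(apply(log_ratios, 2, median))

# Step 2: multiply column j of norm_matrix by size factor j
norm_factors <- sweep(norm_matrix, 2, size_factors, "*")

# Counts normalized for both library size and the supplied factors
normalized <- counts / norm_factors
```

The result has one normalization factor per gene and sample, which is why it is stored as normalizationFactors rather than as a vector of size factors.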

Value

The DESeqDataSet passed as a parameter, with the size factors filled in.

Details

This function estimates the size factors using the "median ratio method" described by Equation 5 in Anders and Huber (2010). The estimated size factors can be accessed using sizeFactors. Size factors from an alternative library size estimator can also be supplied using the assignment function sizeFactors<-.
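As a rough illustration of the median ratio method, the following base-R sketch computes, for each sample, the location (by default the median) of the ratios of its counts to the per-gene geometric mean across samples. The helper median_ratio_size_factors and its argument log_geo_means are hypothetical names introduced here, not part of DESeq2, and details such as the handling of zero counts are assumptions.

```r
# Hypothetical helper, not part of DESeq2: a base-R sketch of the median ratio method.
# counts: numeric matrix, genes in rows and samples in columns.
# locfunc: location function applied to the log ratios of each sample.
# log_geo_means: per-gene log geometric means; computed from counts unless supplied.
median_ratio_size_factors <- function(counts, locfunc = stats::median,
                                      log_geo_means = rowMeans(log(counts))) {
  apply(counts, 2, function(cnts) {
    # Assumption: skip genes with a zero count in any sample (log geometric mean is -Inf)
    keep <- is.finite(log_geo_means) & cnts > 0
    exp(locfunc(log(cnts[keep]) - log_geo_means[keep]))
  })
}

# Sample b has exactly twice the counts of sample a, except in the gene with a zero
counts <- cbind(a = c(10, 20, 30, 0, 50),
                b = c(20, 40, 60, 5, 100))
sf <- median_ratio_size_factors(counts)

# "Frozen" variant: reuse log geometric means computed from a reference matrix
ref_log_geo_means <- rowMeans(log(counts))
new_sample <- cbind(c = c(30, 60, 90, 3, 150))
sf_new <- median_ratio_size_factors(new_sample, log_geo_means = ref_log_geo_means)
```

In this toy example the size factor of sample b is twice that of sample a, so dividing each column by its size factor brings the two samples onto a common scale.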

Typically, the function is called with the idiom:

dds <- estimateSizeFactors(dds)

See DESeq for a description of the use of size factors in the GLM. One should call this function after constructing the DESeqDataSet, unless size factors are manually specified with sizeFactors. Alternatively, gene-specific normalization factors for each sample can be provided using normalizationFactors, which will always preempt sizeFactors in calculations.

Internally, the function calls estimateSizeFactorsForMatrix, which provides more details on the calculation.

References

Reference for the median ratio method:

Simon Anders, Wolfgang Huber: Differential expression analysis for sequence count data. Genome Biology 11 (2010) R106, http://dx.doi.org/10.1186/gb-2010-11-10-r106

See Also

estimateSizeFactorsForMatrix

Examples

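library(DESeq2)

# Simulate a dataset with 1000 genes and 12 samples
dds <- makeExampleDESeqDataSet(n=1000, m=12)
# Default: median ratio method using all genes
dds <- estimateSizeFactors(dds)
sizeFactors(dds)

# Use only the first 200 genes (e.g. housekeeping or spike-in genes)
dds <- estimateSizeFactors(dds, controlGenes=1:200)

# Supply a gene-by-sample matrix of normalization factors
m <- matrix(runif(1000 * 12, .5, 1.5), ncol=12)
dds <- estimateSizeFactors(dds, normMatrix=m)
normalizationFactors(dds)[1:3,1:3]

# "Frozen" calculation with precomputed geometric means
geoMeans <- exp(rowMeans(log(counts(dds))))
dds <- estimateSizeFactors(dds, geoMeans=geoMeans)
sizeFactors(dds)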