Estimate the size factors for a DESeqDataSet
This function estimates the size factors using the
"median ratio method" described by Equation 5 in Anders and Huber (2010).
The estimated size factors can be accessed using the accessor function sizeFactors.
Size factors from alternative library size estimators can also be supplied
using the assignment form of the same accessor.
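Written out, the median ratio estimator of Equation 5 in Anders and Huber (2010) takes the following form. Here $k_{ij}$ denotes the count of gene $i$ in sample $j$ and $m$ the number of samples; the denominator is the gene's geometric mean across samples, i.e. its value in the pseudosample:

$$
\hat{s}_j = \operatorname*{median}_{i} \frac{k_{ij}}{\left( \prod_{v=1}^{m} k_{iv} \right)^{1/m}}
$$

A gene with a zero count in any sample has a geometric mean of zero and therefore yields no finite ratio, so the ratio method relies on genes without zeros; the "iterate" type described below avoids that requirement.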
"estimateSizeFactors"(object, type = c("ratio", "iterate"), locfunc = stats::median, geoMeans, controlGenes, normMatrix)
- object: a DESeqDataSet
- type: either "ratio" or "iterate". "ratio" uses the standard median ratio method introduced in DESeq: the size factor of a sample is the median ratio of its counts over a pseudosample, whose value for each gene is the geometric mean across all samples. "iterate" offers an alternative estimator, which can be used even when every gene has a zero count in at least one sample. This estimator alternates between estimating the dispersion with a design of ~1 and finding a size factor vector by numerically optimizing the likelihood of the ~1 model.
- locfunc: a function to compute a location for a sample. By default, the
median is used. However, especially for low counts, the
shorth function from the genefilter package may give better results.
- geoMeans: by default this is not provided, and the geometric means of the counts are calculated within the function. A vector of geometric means from another count matrix can be provided for a "frozen" size factor calculation.
- controlGenes: optional, a numeric or logical index vector specifying the genes to use for size factor estimation (e.g. housekeeping or spike-in genes).
- normMatrix: optional, a matrix of normalization factors which do not yet
control for library size. Note that this argument should not be used (and
will be ignored) if the dds object was created using tximport. In this case,
the information in assays(dds)[["avgTxLength"]] is automatically used to create
appropriate normalization factors. Providing normMatrix will estimate size
factors on the count matrix divided by normMatrix and store the product of the
size factors and normMatrix as normalizationFactors. It is recommended to
divide out the row-wise geometric mean of normMatrix so that the rows are
roughly centered on 1.
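To make the normMatrix behaviour concrete, here is a base-R sketch of the steps described above: centering each row of normMatrix on 1 (the recommended pre-step, applied here explicitly), estimating median ratio size factors on the counts divided by normMatrix, and multiplying the two into gene- and sample-specific normalization factors. The helpers medianRatio and normFactorsFromMatrix are hypothetical names, and this is not the DESeq2 source, which may differ in details.

```r
# Hypothetical helpers illustrating the documented normMatrix steps
# (a sketch, not DESeq2's implementation).

# Median ratio size factors; genes whose geometric mean is zero are skipped
medianRatio <- function(x) {
  logGeoMeans <- rowMeans(log(x))
  apply(x, 2, function(col) {
    ok <- is.finite(logGeoMeans) & col > 0
    exp(stats::median(log(col[ok]) - logGeoMeans[ok]))
  })
}

normFactorsFromMatrix <- function(counts, normMatrix) {
  # Recommended pre-step: divide out each row's geometric mean so rows center on 1
  normMatrix <- normMatrix / exp(rowMeans(log(normMatrix)))
  # Estimate size factors on the counts divided by normMatrix
  sf <- medianRatio(counts / normMatrix)
  # Normalization factors: column j of normMatrix multiplied by size factor j
  sweep(normMatrix, 2, sf, FUN = "*")
}

counts <- matrix(c(10, 20, 30,
                   20, 40, 60), ncol = 2)
# Each row of this normMatrix is constant, so centering turns it into all 1s
normMatrix <- matrix(c(1, 2, 4,
                       1, 2, 4), ncol = 2)
nf <- normFactorsFromMatrix(counts, normMatrix)
nf  # every row is approximately 0.7071068 1.4142136
```

The row-constant normMatrix in this toy example is fully absorbed by the centering step, so the resulting normalization factors reduce to the plain size factors repeated for every gene.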
Typically, the function is called with the idiom:
dds <- estimateSizeFactors(dds)
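With the default type = "ratio", the call above performs the median ratio calculation. The following base-R sketch mirrors that calculation together with the locfunc, geoMeans and controlGenes arguments described above. The helper name medianRatioSF is hypothetical; this is an illustration under those assumptions, not the DESeq2 implementation.

```r
# Hypothetical helper (not part of DESeq2): a base-R sketch of the median
# ratio method. Each column of `counts` is a sample, each row a gene.
medianRatioSF <- function(counts, locfunc = stats::median,
                          geoMeans = NULL, controlGenes = NULL) {
  # Log geometric mean of each gene (the pseudosample), or log of the
  # supplied "frozen" geometric means. Zero geometric means give -Inf.
  logGeoMeans <- if (is.null(geoMeans)) rowMeans(log(counts)) else log(geoMeans)
  use <- is.finite(logGeoMeans)
  # Optionally restrict estimation to control genes (numeric or logical index)
  if (!is.null(controlGenes)) {
    keep <- rep(FALSE, nrow(counts))
    keep[controlGenes] <- TRUE
    use <- use & keep
  }
  # Size factor = location (median by default) of the log ratios to the
  # pseudosample, over usable genes with a nonzero count in this sample
  apply(counts, 2, function(cnts) {
    ok <- use & cnts > 0
    exp(locfunc(log(cnts[ok]) - logGeoMeans[ok]))
  })
}

# Toy data: sample 2 has twice the depth of sample 1; gene 4 has a zero in
# sample 1, so its geometric mean is zero and it is ignored by default.
counts <- matrix(c(10, 20, 30, 0,
                   20, 40, 60, 5), ncol = 2)
sf <- medianRatioSF(counts)
sf        # 0.7071068 1.4142136

# "Frozen" calculation against externally supplied geometric means
sfFrozen <- medianRatioSF(counts, geoMeans = c(10, 20, 30, 1))
sfFrozen  # 1 2
```

Working on the log scale and exponentiating the median keeps the estimate symmetric for up- and down-scaled samples; swapping locfunc for another location function only changes the summary applied to those log ratios.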
See DESeq for a description of the use of size factors in the GLM.
One should call this function after creating the DESeqDataSet,
unless size factors are specified manually.
Alternatively, gene-specific normalization factors for each sample can be provided using
normalizationFactors, which will always preempt the size factors.
Internally, the function calls
which provides more details on the calculation.
The DESeqDataSet passed as a parameter, with the size factors filled in.
Reference for the median ratio method:
Simon Anders, Wolfgang Huber: Differential expression analysis for sequence count data. Genome Biology 2010, 11:106. http://dx.doi.org/10.1186/gb-2010-11-10-r106
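library(DESeq2)
# simulated example data: 1000 genes, 4 samples
dds <- makeExampleDESeqDataSet(n=1000, m=4)
dds <- estimateSizeFactors(dds)
sizeFactors(dds)
# use only a subset of genes (e.g. controls) for the estimate
dds <- estimateSizeFactors(dds, controlGenes=1:200)
# normalization factors that do not yet control for library size
m <- matrix(runif(1000 * 4, .5, 1.5), ncol=4)
dds <- estimateSizeFactors(dds, normMatrix=m)
normalizationFactors(dds)[1:3,]
# "frozen" size factors from precomputed geometric means
geoMeans <- exp(rowMeans(log(counts(dds))))
dds <- estimateSizeFactors(dds,geoMeans=geoMeans)
sizeFactors(dds)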