`estimateSizeFactors(object, type = c("ratio", "iterate"), locfunc = stats::median, geoMeans, controlGenes, normMatrix)`

object

a DESeqDataSet

type

either "ratio" or "iterate". "ratio" uses the standard median-of-ratios method introduced in DESeq: each sample's size factor is the median, over genes, of the ratio of its count to that of a pseudosample, where the pseudosample's value for each gene is the geometric mean of that gene's counts across all samples. "iterate" offers an alternative estimator that can be used even when every gene has a zero count in at least one sample (in which case all geometric means are zero and the ratio method cannot be applied). This estimator alternates between estimating the dispersion with a design of `~1` and finding a size factor vector by numerically optimizing the likelihood of the `~1` model.
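
A minimal base-R sketch of the "ratio" computation described above may make it concrete. It is not the DESeq2 source: the helper `medianRatioSizeFactors` and the toy counts are hypothetical, and genes whose geometric mean is zero (a zero count in any sample) are simply dropped before the location is taken. The `locfunc` parameter is meant to mirror the argument of the same name.

```r
# Hypothetical helper: median-of-ratios size factors (type = "ratio"),
# written in base R for illustration; not DESeq2's own code.
medianRatioSizeFactors <- function(counts, locfunc = stats::median) {
  logCounts <- log(counts)
  # log of the pseudosample: per-gene geometric mean across all samples
  logGeoMeans <- rowMeans(logCounts)
  # a zero count anywhere in a row gives a log geometric mean of -Inf; drop such genes
  usable <- is.finite(logGeoMeans)
  if (!any(usable)) {
    stop("every gene has a zero in some sample; the ratio method cannot be used")
  }
  # per sample: location (median by default) of the log ratios, back-transformed
  apply(logCounts[usable, , drop = FALSE], 2, function(logCol) {
    exp(locfunc(logCol - logGeoMeans[usable]))
  })
}

# Toy data: sample B is sequenced twice as deeply; the last gene contains a zero
counts <- cbind(sampleA = c(10, 20, 30, 40, 0),
                sampleB = c(20, 40, 60, 80, 5))
sf <- medianRatioSizeFactors(counts)
sf  # sampleA = 1/sqrt(2) (about 0.707), sampleB = sqrt(2) (about 1.414)
```

Because every usable toy gene differs between the samples by the same factor, all ratios within a sample agree and the median returns that factor; the zero-containing gene is ignored.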

locfunc

a function to compute a location for a sample. By default, the median is used. However, especially for low counts, the `shorth` function from the genefilter package may give better results.

geoMeans

by default this is not provided, and the geometric means of the counts are calculated within the function. A vector of geometric means from another count matrix can be provided for a "frozen" size factor calculation.
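
A "frozen" calculation reuses a pseudosample computed elsewhere, so new samples are scaled against fixed reference geometric means rather than against each other. The sketch below uses base R only; `frozenSizeFactors`, `reference` and `newSamples` are hypothetical names, the per-sample exclusion of zero counts is an assumption of this sketch, and any further rescaling the package may apply is not shown.

```r
# Hypothetical helper: size factors against externally supplied geometric means
frozenSizeFactors <- function(counts, geoMeans, locfunc = stats::median) {
  stopifnot(length(geoMeans) == nrow(counts))
  logGeoMeans <- log(geoMeans)
  apply(counts, 2, function(col) {
    # skip genes with a zero reference geometric mean or a zero count here
    keep <- is.finite(logGeoMeans) & col > 0
    exp(locfunc(log(col[keep]) - logGeoMeans[keep]))
  })
}

# geometric means computed once from a reference count matrix
reference <- cbind(r1 = c(10, 20, 30, 40), r2 = c(20, 40, 60, 80))
geoMeans <- exp(rowMeans(log(reference)))

# later samples are scaled against the frozen pseudosample
newSamples <- cbind(n1 = c(5, 10, 15, 20), n2 = c(40, 80, 120, 160))
sfNew <- frozenSizeFactors(newSamples, geoMeans)
sfNew  # n1 = 1/(2*sqrt(2)), n2 = 2*sqrt(2)
```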

controlGenes

optional; a numeric or logical index vector specifying the genes to use for size factor estimation (e.g. housekeeping or spike-in genes)
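
Restricting estimation to control genes means the ratios are computed on those rows only, so genes that genuinely change between samples cannot pull the estimate. In the base-R sketch below, `controlGeneSizeFactors` and the toy counts are hypothetical; genes 1 to 3 play the role of spike-ins present at twice the depth in sample B, while genes 4 to 6 are strongly up in B.

```r
# Hypothetical helper: median-of-ratios size factors using only controlGenes,
# a numeric or logical row index (base R sketch, not DESeq2's code)
controlGeneSizeFactors <- function(counts, controlGenes, locfunc = stats::median) {
  ctrl <- counts[controlGenes, , drop = FALSE]
  # geometric means are per gene, so computing them on the subset is equivalent
  logGeoMeans <- rowMeans(log(ctrl))
  apply(ctrl, 2, function(col) {
    keep <- is.finite(logGeoMeans) & col > 0
    exp(locfunc(log(col[keep]) - logGeoMeans[keep]))
  })
}

counts <- cbind(A = c(100, 200, 300, 10, 10, 10),
                B = c(200, 400, 600, 90, 90, 90))
sfAll  <- controlGeneSizeFactors(counts, seq_len(nrow(counts)))  # pulled by genes 4-6
sfCtrl <- controlGeneSizeFactors(counts, 1:3)                     # spike-ins only
sfCtrl  # A = 1/sqrt(2), B = sqrt(2)
```

A logical index such as `c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)` selects the same rows and gives the same result.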

normMatrix

optional, a matrix of normalization factors which do not yet control for library size. Note that this argument should not be used (and will be ignored) if the `dds` object was created using `tximport`. In this case, the information in `assays(dds)[["avgTxLength"]]` is automatically used to create appropriate normalization factors. Providing `normMatrix` will estimate size factors on the count matrix divided by `normMatrix` and store the product of the size factors and `normMatrix` as `normalizationFactors`. It is recommended to divide out the row-wise geometric mean of `normMatrix` so that the rows are roughly centered on 1.
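
The `normMatrix` path can be sketched in base R as follows, assuming the caller has already applied the recommended row centering. `medianRatio` and `normFactorsFromMatrix` are hypothetical helpers, not functions of the package; the returned matrix corresponds to what the text above says is stored as `normalizationFactors`.

```r
# Hypothetical helper: plain median-of-ratios on a (possibly non-integer) matrix
medianRatio <- function(x) {
  logGeoMeans <- rowMeans(log(x))
  apply(x, 2, function(col) {
    keep <- is.finite(logGeoMeans) & col > 0
    exp(stats::median(log(col[keep]) - logGeoMeans[keep]))
  })
}

# Hypothetical helper: size factors on counts / normMatrix, then their product
normFactorsFromMatrix <- function(counts, normMatrix) {
  sf <- medianRatio(counts / normMatrix)
  # multiply column j of normMatrix by the size factor of sample j
  sweep(normMatrix, 2, sf, `*`)
}

counts <- cbind(A = c(10, 20, 30, 40), B = c(20, 40, 60, 80))
normMatrix <- cbind(rep(1, 4), rep(2, 4))
# recommended step: divide out each row's geometric mean so rows centre on 1
normMatrix <- normMatrix / exp(rowMeans(log(normMatrix)))
nf <- normFactorsFromMatrix(counts, normMatrix)
nf  # column A = 1/sqrt(2), column B = sqrt(2) for every gene
```

Here the toy `normMatrix` already encodes B's doubled depth, so the size factors estimated on the adjusted counts are 1 and the stored factors equal the centred matrix.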

The DESeqDataSet passed as `object`, with the size factors filled in.

Typically, the function is called with the idiom `dds <- estimateSizeFactors(dds)`.

See `DESeq` for a description of the use of size factors in the GLM. One should call this function after `DESeqDataSet` unless size factors are manually specified with `sizeFactors`. Alternatively, gene-specific normalization factors for each sample can be provided using `normalizationFactors`, which will always preempt `sizeFactors` in calculations.
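
As a sketch of where these factors enter the GLM, assuming the standard DESeq2 negative binomial model (the symbols $K_{ij}$ for the count of gene $i$ in sample $j$, $q_{ij}$, $s_j$, $\alpha_i$, $x_{jr}$, $\beta_{ir}$ and $NF_{ij}$ are introduced here and are not defined on this page), size factors scale the mean by one constant per sample:

$$
K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i), \qquad
\mu_{ij} = s_j \, q_{ij}, \qquad
\log_2 q_{ij} = \sum_r x_{jr} \beta_{ir}
$$

When normalization factors are present, the per-sample constant is replaced by a gene- and sample-specific one, so $s_j$ no longer appears:

$$
\mu_{ij} = NF_{ij} \, q_{ij}
$$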

Internally, the function calls `estimateSizeFactorsForMatrix`, which provides more details on the calculation.

Simon Anders, Wolfgang Huber: Differential expression analysis for sequence count data. Genome Biology 2010, 11:106. http://dx.doi.org/10.1186/gb-2010-11-10-r106

`estimateSizeFactorsForMatrix`

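```r
library(DESeq2)

dds <- makeExampleDESeqDataSet(n=1000, m=4)

# standard median-ratio size factors
dds <- estimateSizeFactors(dds)
sizeFactors(dds)

# estimate using only the first 200 genes (e.g. known control genes)
dds <- estimateSizeFactors(dds, controlGenes=1:200)

# gene- and sample-specific normalization via normMatrix
m <- matrix(runif(1000 * 4, .5, 1.5), ncol=4)
dds <- estimateSizeFactors(dds, normMatrix=m)
normalizationFactors(dds)[1:3,]

# "frozen" size factors from supplied geometric means
geoMeans <- exp(rowMeans(log(counts(dds))))
dds <- estimateSizeFactors(dds, geoMeans=geoMeans)
sizeFactors(dds)
```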