normalizeCounts(counts, group=rep.int(1,ncol(counts)), method=c("TMM", "cqn"), common.disp = FALSE, prior.df=8, annot=NULL, lib.sizes=NULL, verbose=TRUE)
method="cqn".
method="TMM".
Value passed to estimateTagwiseDisp that defines the prior degrees of freedom. It is used to calculate 'prior.n' which, in turn, defines the amount of shrinkage of the estimated tagwise dispersions toward the common one. By default prior.df=8, thus assuming no shrinkage toward that common dispersion. This argument is not used if common.disp=TRUE and is only relevant when method="TMM".
counts input matrix, containing feature/tag/gene lengths in bp in its first column and a second covariate, such as G+C content, in its second column. These two pieces of information are passed to the arguments lengths and x when calling cqn. This argument is only relevant when method="cqn".
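As an illustration of the two-column layout described above, the following is a minimal sketch of an annotation table. The feature names, lengths and G+C values are made up, the G+C content is assumed to be expressed as a fraction, and matching the annotation to the count matrix by row names is an assumption of this sketch rather than a documented requirement.

```r
# Hypothetical feature identifiers (illustrative only)
features <- c("geneA", "geneB", "geneC")

# Toy count matrix: 3 features (rows) x 2 samples (columns)
counts <- matrix(c(10, 0, 25,
                   12, 3, 30), nrow = 3,
                 dimnames = list(features, c("s1", "s2")))

# Annotation table: feature length in bp in the first column,
# second covariate (here G+C content as a fraction) in the second column
annot <- data.frame(length = c(1500, 800, 2300),
                    gc = c(0.42, 0.55, 0.61),
                    row.names = features)

# Sanity checks (assumption: rows of annot and counts refer to the same features)
stopifnot(identical(rownames(annot), rownames(counts)))
stopifnot(all(annot$length > 0), all(annot$gc >= 0 & annot$gc <= 1))
```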
If lib.sizes=NULL (default), these quantities are estimated as the column sums of the input matrix of counts.
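The default library sizes can be reproduced by hand with base R. The toy matrix below is illustrative only.

```r
# Toy count matrix: 3 features (rows) x 2 samples (columns); values are illustrative
counts <- matrix(c(10, 0, 25,
                   12, 3, 30), nrow = 3)

# Library size of each sample, taken as its column sum
lib.sizes <- colSums(counts)
lib.sizes
```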
This function relies on procedures from the edgeR and cqn packages in order to try to remove systematic technical effects from raw counts.

By default, the TMM method described in Robinson and Oshlack (2010) is employed to calculate normalization factors, which are applied to estimate effective library sizes. Then common and tagwise (only when the argument common.disp=FALSE) dispersions are calculated (Robinson and Smyth, Bioinformatics 2007), and finally counts are adjusted so that library sizes are approximately equal for the given dispersion values (Robinson and Smyth, Biostatistics 2008).

When the argument method="cqn" is set, conditional quantile normalization (Hansen, Irizarry and Wu, 2012) is applied, which aims to adjust for tag/feature/gene length and another covariate such as G+C content. This information should be provided through the annot argument. This procedure calculates, for every gene and every sample, an offset to apply to the log2 reads per million (RPM). The function normalizeCounts() adds this offset to the log2 RPM values calculated from the input count data matrix, unlogs them and rolls these normalized RPM values back into integer counts.

Details on these two normalization procedures are given in the documentation of the edgeR and cqn Bioconductor packages.
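The last step of the cqn route (add the offset to the log2 RPM values, unlog, roll back into integer counts) can be sketched in base R as follows. This is a simplified illustration and not the package's actual code: the helper name rollback_counts, the 0.5 pseudocount used to avoid log2(0), and the offset values are all assumptions. In practice the offsets would come from cqn.

```r
# Hypothetical helper: applies log2-scale offsets to log2 RPM values
# and converts the result back into integer counts
rollback_counts <- function(counts, offset, lib.sizes = colSums(counts), pseudo = 0.5) {
  # log2 reads per million; the pseudocount is an assumption of this sketch
  log2rpm <- log2(sweep(counts + pseudo, 2, lib.sizes, "/") * 1e6)
  # add the per-gene, per-sample offset and return to the RPM scale
  norm.rpm <- 2^(log2rpm + offset)
  # rescale by library size, remove the pseudocount and round to integers
  norm.counts <- sweep(norm.rpm, 2, lib.sizes, "*") / 1e6 - pseudo
  norm.counts <- round(pmax(norm.counts, 0))
  storage.mode(norm.counts) <- "integer"
  norm.counts
}

# Toy counts (3 features x 2 samples) and made-up offsets
counts <- matrix(c(10, 0, 25,
                   12, 3, 30), nrow = 3)
offset <- matrix(c(0.10, -0.20, 0.00,
                   -0.10, 0.30, 0.05), nrow = 3)

normCounts <- rollback_counts(counts, offset)
normCounts
```

With all offsets equal to zero the sketch returns the input counts unchanged, which is a convenient check that the log and unlog steps are consistent.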
Robinson MD and Oshlack A (2010). A scaling normalization method for differential expression analysis of RNA-seq data. _Genome Biol_ 11, R25
Robinson MD and Smyth GK (2007). Moderated statistical tests for assessing differences in tag abundance. _Bioinformatics_ 23, 2881-2887
Robinson MD and Smyth GK (2008). Small-sample estimation of negative binomial dispersion, with applications to SAGE data. _Biostatistics_ 9, 321-332
filterCounts
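# rPT() and normalizeCounts() are provided by the tweeDEseq package
library(tweeDEseq)

# Generate a random matrix of counts (25 features x 40 samples)
counts <- matrix(rPT(n=1000, a=0.5, mu=10, D=5), ncol = 40)
colSums(counts)      # raw library sizes
counts[1:5, 1:5]

# Normalize counts with the default TMM method, alternating two groups across samples
normCounts <- normalizeCounts(counts, rep(c(1,2), 20))
colSums(normCounts)  # library sizes after normalization
normCounts[1:5, 1:5]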