fpkm: FPKM: fragments per kilobase per million mapped fragments

Description

The following function returns fragment counts normalized per kilobase of feature length per million mapped fragments (by default using a robust estimate of the library size, as in estimateSizeFactors).

Usage

fpkm(object, robust = TRUE)

Arguments

object

a DESeqDataSet

robust

whether to use size factors to normalize rather than taking the column sums of the raw counts, using the fpm function.

Value

a matrix which is normalized per kilobase of the union of basepairs in the GRangesList or GRanges of the mcols(object), and per million of mapped fragments, either using the robust median ratio method (robust=TRUE, default) or using raw counts (robust=FALSE). Defining a column mcols(object)$basepairs takes precedence over internal calculation of the kilobases for each row.

Details

The length of the features (e.g. genes) is calculated one of two ways: (1) If there is a matrix named "avgTxLength" in assays(dds), this will take precedence in the length normalization. This occurs when using the tximport-DESeq2 pipeline. (2) Otherwise, feature length is calculated from the rowRanges of the dds object, if a column basepairs is not present in mcols(dds). The calculated length is the number of basepairs in the union of all GRanges assigned to a given row of object, e.g., the union of all basepairs of exons of a given gene. Note that the second approach over-estimates the gene length (average transcript length, weighted by abundance is a more appropriate normalization for gene counts), and so the FPKM will be an underestimate of the true value.

Note that, when the read/fragment counting has inter-feature dependencies, a strict normalization would not incorporate the basepairs of a feature which overlap another feature. This inter-feature dependence is not taken into consideration in the internal union basepair calculation.

Examples

Run this code


# create a matrix with 1 million counts for the
# 2nd and 3rd column, the 1st and 4th have
# half and double the counts, respectively.
m <- matrix(1e6 * rep(c(.125, .25, .25, .5), each=4),
            ncol=4, dimnames=list(1:4,1:4))
mode(m) <- "integer"
se <- SummarizedExperiment(list(counts=m), colData=DataFrame(sample=1:4))
dds <- DESeqDataSet(se, ~ 1)

# create 4 GRanges with lengths: 1, 1, 2, 2.5 Kb
gr1 <- GRanges("chr1",IRanges(1,1000)) # 1kb
gr2 <- GRanges("chr1",IRanges(c(1,1001),c( 500,1500))) # 1kb
gr3 <- GRanges("chr1",IRanges(c(1,1001),c(1000,2000))) # 2kb
gr4 <- GRanges("chr1",IRanges(c(1,1001),c(200,1300))) # 500bp
rowRanges(dds) <- GRangesList(gr1,gr2,gr3,gr4)

# the raw counts
counts(dds)

# the FPM values
fpm(dds)

# the FPKM values
fpkm(dds)