fpkm(object, robust = TRUE)
GRanges of the mcols(object)), and per million mapped fragments, either using the robust median ratio method (robust=TRUE, default) or using raw counts (robust=FALSE). Defining a column mcols(object)$basepairs takes precedence over the internal calculation of the kilobases for each row.
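As a rough illustration of this normalization (a base-R sketch of the arithmetic only, not DESeq2's internals, and ignoring the robust size-factor estimation), FPKM divides counts by the per-million library size and the per-kilobase feature length:

```r
# Hypothetical counts and feature lengths, for illustration only
counts  <- c(geneA = 500,  geneB = 1500)   # fragment counts in one sample
lengths <- c(geneA = 1000, geneB = 3000)   # feature lengths in basepairs
libsize <- 1e6                             # total mapped fragments in the sample

# counts per million mapped fragments, per kilobase of feature length
fpkm_manual <- counts / (libsize / 1e6) / (lengths / 1e3)
fpkm_manual
# geneA: 500 / 1 / 1 = 500; geneB: 1500 / 1 / 3 = 500
```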
The length of the features (e.g. genes) is calculated one of two ways: (1) If there is a matrix named "avgTxLength" in assays(dds), this will take precedence in the length normalization. This occurs when using the tximport-DESeq2 pipeline. (2) Otherwise, feature length is calculated from the
rowRanges of the dds object, if a column basepairs is not present in mcols(dds). The calculated length is the number of basepairs in the union of all GRanges assigned to a given row of object, e.g., the union of all basepairs of exons of a given gene. Note that the second approach over-estimates the gene length (the average transcript length, weighted by abundance, is a more appropriate normalization for gene counts), and so the FPKM will be an underestimate of the true value.
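The union-of-basepairs length described above can be sketched in base R. The union_bp helper below is hypothetical and for illustration only; DESeq2 derives this length from the GRangesList via the GenomicRanges machinery:

```r
# Sum of basepairs covered by the union of intervals (e.g. exons of a gene),
# given vectors of 1-based inclusive start and end positions.
union_bp <- function(starts, ends) {
  o <- order(starts)
  starts <- starts[o]; ends <- ends[o]
  total <- 0; cur_s <- starts[1]; cur_e <- ends[1]
  for (i in seq_along(starts)[-1]) {
    if (starts[i] <= cur_e + 1) {
      # overlapping or adjacent interval: extend the current run
      cur_e <- max(cur_e, ends[i])
    } else {
      # gap: close out the current run and start a new one
      total <- total + (cur_e - cur_s + 1)
      cur_s <- starts[i]; cur_e <- ends[i]
    }
  }
  total + (cur_e - cur_s + 1)
}

# two overlapping "exons": the union covers 1..1500, i.e. 1500 bp
union_bp(c(1, 1001), c(1200, 1500))
```

Note that counting each exon's width separately would give 1200 + 500 = 1700 bp here; the union counts the shared basepairs only once.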
Note that, when the read/fragment counting has inter-feature dependencies, a strict normalization would not incorporate the basepairs of a feature which overlap another feature. This inter-feature dependence is not taken into consideration in the internal union basepair calculation.
# create a matrix with 1 million counts for the
# 2nd and 3rd column, the 1st and 4th have
# half and double the counts, respectively.
m <- matrix(1e6 * rep(c(.125, .25, .25, .5), each=4),
            ncol=4, dimnames=list(1:4, 1:4))
mode(m) <- "integer"
se <- SummarizedExperiment(list(counts=m), colData=DataFrame(sample=1:4))
dds <- DESeqDataSet(se, ~ 1)

# create 4 GRanges with lengths: 1, 1, 2 kb and 500 bp
gr1 <- GRanges("chr1", IRanges(1, 1000)) # 1kb
gr2 <- GRanges("chr1", IRanges(c(1, 1001), c(500, 1500))) # 1kb
gr3 <- GRanges("chr1", IRanges(c(1, 1001), c(1000, 2000))) # 2kb
gr4 <- GRanges("chr1", IRanges(c(1, 1001), c(200, 1300))) # 500bp
rowRanges(dds) <- GRangesList(gr1, gr2, gr3, gr4)

# the raw counts
counts(dds)

# the FPM values
fpm(dds)

# the FPKM values
fpkm(dds)
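For the example matrix above, the robust size factors can be recomputed by hand. This is a base-R sketch of the median-of-ratios idea used by estimateSizeFactors, not the exact implementation:

```r
# same count matrix as in the example above
m <- matrix(1e6 * rep(c(.125, .25, .25, .5), each = 4), ncol = 4)

# per-row geometric means across samples (the pseudo-reference sample)
geo_means <- exp(rowMeans(log(m)))

# each sample's size factor: median ratio of its counts to the reference
sf <- apply(m, 2, function(col) median(col / geo_means))
sf
# columns 1 and 4 have half and double the depth:
# size factors come out as 0.5, 1, 1, 2 (up to floating-point rounding)
```

With robust=TRUE, fpkm and fpm divide counts by these size factors rather than by the raw column sums, which makes the values resistant to a few highly expressed genes dominating the library size.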