cpm: Counts per Million or Reads per Kilobase per Million

Description

Computes counts per million (CPM) or reads per kilobase per million (RPKM) values.

Usage

## S3 method for class 'DGEList'
cpm(x, normalized.lib.sizes=TRUE, log=FALSE, prior.count=0.25, ...)
## Default S3 method
cpm(x, lib.size=NULL, log=FALSE, prior.count=0.25, ...)
## S3 method for class 'DGEList'
rpkm(x, gene.length=NULL, normalized.lib.sizes=TRUE, log=FALSE, prior.count=0.25, ...)
## Default S3 method
rpkm(x, gene.length, lib.size=NULL, log=FALSE, prior.count=0.25, ...)

Arguments

x

matrix of counts or a DGEList object

normalized.lib.sizes

logical, use normalized library sizes?

lib.size

library size, defaults to colSums(x).

log

logical, if TRUE then log2 values are returned.

prior.count

average count to be added to each observation to avoid taking log of zero. Used only if log=TRUE.

gene.length

vector of length nrow(x) giving gene lengths in bases, or the name of the column of x$genes containing the gene lengths.

...

other arguments that are not currently used.

Value

numeric matrix of CPM or RPKM values.

Details

CPM or RPKM values are useful descriptive measures for the expression level of a gene. By default, the normalized library sizes are used in the computation for DGEList objects but simple column sums for matrices.
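
To make the two measures concrete, here is a minimal base-R sketch of the arithmetic for a plain count matrix, assuming simple column sums as library sizes and no log transform. The toy counts and the names cpm_manual and rpkm_manual are hypothetical, chosen only for illustration; this is not the package's implementation.

```r
# Hypothetical toy data: 3 genes (rows) x 2 samples (columns)
counts <- matrix(c(10,  0, 90,
                   20, 30, 50),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

# Library size per sample: simple column sums, as for the matrix method
lib.size <- colSums(counts)

# CPM: divide each column by its library size, then scale to one million
cpm_manual <- t(t(counts) / lib.size) * 1e6

# RPKM: CPM divided by gene length in kilobases (lengths given in bases)
gene.length <- c(1000, 2000, 500)
rpkm_manual <- cpm_manual / (gene.length / 1000)

cpm_manual
rpkm_manual
```

Each column of cpm_manual sums to one million, which is a quick sanity check when no normalization factors are involved.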

If log-values are computed, then a small count, given by prior.count but scaled to be proportional to the library size, is added to x to avoid taking the log of zero.
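
The scaling of the prior count can be sketched in base R as follows. The scaled prior follows the description above. Adding twice the scaled prior to each library size is an assumption about how the offset is balanced, not something stated on this page. All variable names are hypothetical.

```r
# Hypothetical toy data: 3 genes x 2 samples with unequal library sizes
counts <- matrix(c(0,  5,  95,
                   0, 50, 150),
                 nrow = 3)
lib.size <- colSums(counts)
prior.count <- 0.25

# Scale the prior so it is proportional to library size and
# averages to prior.count across samples
prior.scaled <- prior.count * lib.size / mean(lib.size)

# Assumption: library sizes are augmented by twice the scaled prior
lib.adj <- lib.size + 2 * prior.scaled

# log2-CPM with the scaled prior added to every count
logcpm_manual <- log2(t((t(counts) + prior.scaled) / lib.adj) * 1e6)

logcpm_manual
```

Because the prior is proportional to the library size, a zero count maps to the same log2 value in every sample, regardless of sequencing depth.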

The rpkm method for DGEList objects will try to find the gene lengths in a column of x$genes called Length or length. Failing that, it will look for any column name containing "length" in any capitalization.
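
The lookup order described above can be sketched as a small helper. The function name find_length_column and the example annotation table are hypothetical; the sketch only mirrors the stated order (exact Length or length first, then any column name containing "length" regardless of case) and is not the package's code.

```r
# Hypothetical helper: return the name of the gene-length column, or NULL
find_length_column <- function(genes) {
  nm <- names(genes)

  # First preference: a column called exactly "Length" or "length"
  exact <- match(c("Length", "length"), nm)
  exact <- exact[!is.na(exact)]
  if (length(exact) > 0) return(nm[exact[1]])

  # Fallback: any column name containing "length", ignoring case
  fuzzy <- grep("length", nm, ignore.case = TRUE)
  if (length(fuzzy) > 0) return(nm[fuzzy[1]])

  NULL
}

# Hypothetical annotation table with a non-standard column name
genes <- data.frame(Symbol = c("A", "B"), GeneLENGTH = c(1200, 800))
find_length_column(genes)
```

Supplying gene.length explicitly avoids relying on this kind of name matching when the annotation has several length-like columns.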

Examples

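library(edgeR)

# Simulate a 5-gene x 4-sample matrix of negative binomial counts
y <- matrix(rnbinom(20,size=1,mu=10),5,4)
cpm(y)

# DGEList with explicitly supplied library sizes
d <- DGEList(counts=y, lib.size=1001:1004)
cpm(d)
cpm(d,log=TRUE)

# Store gene lengths (in bases) in the gene annotation so rpkm() can find them
d$genes <- data.frame(Length=c(1000,2000,500,1500,3000))
rpkm(d)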