This is a prototype function for importing information about changes in
the average transcript length for each gene. This function is intended
for testing purposes only. The function simply imports or calculates the average
transcript length for each gene and each sample from external files,
and provides this matrix to the normMatrix argument of estimateSizeFactors.
Here, average transcript length means an average weighted by
transcript abundance. RSEM includes such a column in its
genes.results files, but an estimate of average transcript length can
be obtained from any software which outputs a file with a row for each
transcript, specifying: transcript length, estimate of transcript abundance,
and the gene which the transcript belongs to.
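
The abundance-weighted average described above can be illustrated with a small standalone sketch. This is not the package's implementation; the function name average_transcript_length and the toy rows are hypothetical, and the input is assumed to be one (gene, transcript length, abundance) record per transcript for a single sample.

```python
from collections import defaultdict


def average_transcript_length(rows):
    """Abundance-weighted average transcript length per gene.

    rows: iterable of (gene_id, transcript_length, abundance) tuples,
    one per transcript, for a single sample (assumed layout).
    """
    weighted_sum = defaultdict(float)     # sum of length * abundance per gene
    total_abundance = defaultdict(float)  # sum of abundance per gene
    for gene_id, length, abundance in rows:
        weighted_sum[gene_id] += length * abundance
        total_abundance[gene_id] += abundance
    # The average is undefined when a gene has zero total abundance,
    # so such genes are left out of the result in this sketch.
    return {
        gene: weighted_sum[gene] / total_abundance[gene]
        for gene in weighted_sum
        if total_abundance[gene] > 0
    }


# Toy data: geneA has two transcripts, geneB has one.
rows = [
    ("geneA", 1000, 30.0),
    ("geneA", 2000, 10.0),
    ("geneB", 500, 5.0),
]
avg = average_transcript_length(rows)
```

For geneA the result is (1000 * 30 + 2000 * 10) / (30 + 10) = 1250, which is closer to the length of the more abundant transcript than the unweighted mean of 1500 would be.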
Normalization factors accounting for both average transcript length and
library size of each sample are generated and then stored within the data object.
The analysis can then continue with DESeq;
the stored normalization factors will be used instead of size factors in the analysis.
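
One plausible way to combine average transcript length with library size is sketched below. It is an assumption about the general approach, not a statement of what the function does internally: counts are divided by the length matrix, median-of-ratios size factors are computed on the result, each length column is multiplied by its size factor, and every gene's factors are rescaled to a geometric mean of 1. The names size_factors and normalization_factors and the toy matrices are hypothetical, and all lengths are assumed to be positive.

```python
import math
from statistics import median


def size_factors(matrix):
    """Median-of-ratios size factors, one per column (sample).

    Only rows whose values are all positive contribute.
    """
    n_samples = len(matrix[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in matrix:
        if all(v > 0 for v in row):
            log_geo_mean = sum(math.log(v) for v in row) / n_samples
            for j, v in enumerate(row):
                log_ratios[j].append(math.log(v) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalization_factors(counts, avg_length):
    """Gene-by-sample factors combining length and library size."""
    # Remove the length effect before estimating library size.
    scaled = [
        [c / l for c, l in zip(count_row, length_row)]
        for count_row, length_row in zip(counts, avg_length)
    ]
    sf = size_factors(scaled)
    factors = []
    for length_row in avg_length:
        row = [l * s for l, s in zip(length_row, sf)]
        # Rescale so that each gene's factors have geometric mean 1.
        geo_mean = math.exp(sum(math.log(v) for v in row) / len(row))
        factors.append([v / geo_mean for v in row])
    return factors


# Toy data: 3 genes x 2 samples. The second sample is sequenced twice
# as deeply, and the first gene's average length doubles in it.
counts = [[100, 400], [50, 100], [30, 60]]
avg_length = [[1000, 2000], [500, 500], [2000, 2000]]
nf = normalization_factors(counts, avg_length)
```

In this toy example the first gene's factor is four times larger in the second sample (twice the depth, twice the length), while the other two genes differ only by the factor of two due to depth.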
For RSEM genes.results files, specify level="gene", geneIdCol="gene_id", and lengthCol="effective_length".
For Cufflinks isoforms.fpkm_tracking files, specify level="tx", geneIdCol="gene_id", lengthCol="length", and abundanceCol="FPKM".
For Sailfish output files, one can write an importer function which attaches a column gene_id based on Transcript ID, and then specify level="tx", geneIdCol="gene_id", lengthCol="Length", and abundanceCol="RPKM".
Along with the normalization matrix, which is stored in normalizationFactors(object), the resulting gene length matrix is stored in assays(dds)[["avgTxLength"]], and will take precedence in calls to fpkm.