To use XBSeq for testing differential expression (DE), we need to run HTSeq twice: once to count the reads mapped to exonic regions (observed signal) and
once to count the reads mapped to non-exonic regions (background noise). First, we need to construct the gtf annotation file used to measure the background noise:
- Download the refFlat table from the UCSC database (http://genome.ucsc.edu) and create the preliminary list of gene-free regions,
- Download the tables (a) all_mrna, (b) ensGene, (c) pseudoYale60Gene, (d) vegaGene, (e) xenoMrna, and (f) xenoRefGene from the UCSC database and remove
any region that appears in any of them from the gene-free regions,
- To guarantee that gene-free regions are far enough from exonic regions, trim 100 bp from both sides of intronic regions and 1,000 bp from both sides
of inter-genic regions,
- Shift each exon of a gene to the nearest gene-free region on its right. Most shifted genes retain the original gene structure,
- If the nearby gene-free region is too short, we may preserve only the exon sizes but not the whole gene structure. The priority for shifting
a region is: 1) the nearest gene-free region on the right; 2) the nearest gene-free region on the left; 3) the second nearest gene-free region on the right, and so on
until the shifted region fits the original exon, and
- Finally, treat the shifted regions as the non-exonic regions of each gene and write them to a final .gtf file.
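
The trimming and shifting steps in the list above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Perl implementation shipped with XBSeq. Regions are assumed to be half-open (start, end) pairs on a single chromosome, the function names are hypothetical, the ordering after the second right region is assumed to continue alternating right and left, and placing the shifted exon at the start of the chosen region is likewise an assumption.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome

INTRON_TRIM = 100       # bp trimmed from each side of an intronic region
INTERGENIC_TRIM = 1000  # bp trimmed from each side of an inter-genic region


def trim(region: Interval, margin: int) -> Optional[Interval]:
    """Shrink a region by `margin` bp on both sides; None if nothing is left."""
    start, end = region[0] + margin, region[1] - margin
    return (start, end) if start < end else None


def candidate_order(exon: Interval, free: List[Interval]) -> List[Interval]:
    """Order gene-free regions: nearest right, nearest left, second right, ...

    Continuing the alternation beyond the third candidate is an assumption.
    """
    right = sorted((r for r in free if r[0] >= exon[1]), key=lambda r: r[0])
    left = sorted((r for r in free if r[1] <= exon[0]), key=lambda r: -r[1])
    ordered = []
    for i in range(max(len(right), len(left))):
        if i < len(right):
            ordered.append(right[i])
        if i < len(left):
            ordered.append(left[i])
    return ordered


def shift_exon(exon: Interval, free: List[Interval]) -> Optional[Interval]:
    """Place an exon-sized window in the first candidate region that fits."""
    size = exon[1] - exon[0]
    for start, end in candidate_order(exon, free):
        if end - start >= size:
            # Assumption: anchor the shifted exon at the start of the region.
            return (start, start + size)
    return None
```

With an exon at (2000, 2300) and gene-free regions (1100, 1400), (2400, 2500) and (6000, 8000), the nearest right region is too short for the 300 bp exon, so the sketch falls back to the nearest left region.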
We carried out the HTSeq procedure twice on a mouse RNA-seq dataset, which contains 3 replicates of wild type mouse liver tissues (WT) and 3
replicates of Myc transgenic mouse liver tissues (MYC). The dataset was obtained from Gene Expression Omnibus (GSE61875)
(http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE61875). The two resulting count datasets can be loaded via data(ExampleData)
after loading the XBSeq library.
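
As a rough illustration of running HTSeq twice, the sketch below builds two htseq-count command lines for the same alignment file, one per annotation. The file names are hypothetical, and the options shown (BAM input, unstranded counting) are assumptions that should be adapted to the actual library preparation; the commands are only printed, not executed.

```python
from typing import List


def htseq_count_command(bam: str, gtf: str) -> List[str]:
    """Build an htseq-count command line for one BAM file and one annotation."""
    # -f bam: input format; -s no: unstranded counting (an assumption here).
    return ["htseq-count", "-f", "bam", "-s", "no", bam, gtf]


def both_runs(bam: str, exonic_gtf: str, background_gtf: str) -> List[List[str]]:
    """Return the two runs: observed signal first, background noise second."""
    return [
        htseq_count_command(bam, exonic_gtf),
        htseq_count_command(bam, background_gtf),
    ]


# Hypothetical file names, for illustration only.
for cmd in both_runs("WT1.bam", "mm10_exonic.gtf", "mm10_background.gtf"):
    print(" ".join(cmd))
```

Each command writes its counts to standard output, so in practice the two runs would be redirected to separate files, one for the observed signal and one for the background noise.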
The annotation for measuring the background noise can be generated by following the steps above. First, generate the preliminary gene-free regions by running
the script
    exonFreeRegionShift.pl <-EX exon-GTF file> <-FR gene free region>
Then remove the potential functional elements by running the script
    GEFRshift.pl <-G gene-GTF.gtf> <-I intronRegion.tsv> <-T integenicRegion.tsv> optional: -m mRNA.bed -x xenoMrna.bed -z xenoRefGene.bed -e ensGene.bed -p pseudoGene.bed -v vegaGene.bed -b
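
For scripted use, the GEFRshift.pl usage line above could be assembled programmatically. The helper below is a hypothetical sketch: it uses only the flags shown in that usage line, invoking the script through perl is an assumption, and it leaves out -b because its meaning is not described here.

```python
from typing import Dict, List, Optional

# Optional track flags, copied from the usage line; the key names are hypothetical.
OPTIONAL_FLAGS = {
    "mrna": "-m",
    "xeno_mrna": "-x",
    "xeno_refgene": "-z",
    "ensgene": "-e",
    "pseudogene": "-p",
    "vegagene": "-v",
}


def gefrshift_command(
    gene_gtf: str,
    intron_tsv: str,
    intergenic_tsv: str,
    tracks: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build the argument list for GEFRshift.pl from required and optional inputs."""
    cmd = ["perl", "GEFRshift.pl", "-G", gene_gtf, "-I", intron_tsv, "-T", intergenic_tsv]
    for name, path in (tracks or {}).items():
        cmd += [OPTIONAL_FLAGS[name], path]  # KeyError on an unknown track name
    return cmd
```

The resulting list can be passed to subprocess.run once the paths point at real files.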
We have already generated gtf files for human (hg18 and hg19) and mouse (mm9 and mm10) and deposited them on GitHub. If you
would like to generate your own gtf files, the Perl scripts that generate them are available in the package subfolder XBSeq\inst\scripts\.
The scripts are also deposited on GitHub (https://github.com/Liuy12/XBSeq).