stallingIndex: returns a list with average read count on TSS, gene body, and stalling index for a number of samples

Description

based on a TxDb, and a list of Pol2 ChIP-seq samples with a GRange corresponding to the peak call, returns a list whose elements contain the stalling index and average read count on TSS and gene body for a user-defined list of genes

Usage

stallingIndex(BAMlist,inputList=NULL, peakGRlist, peakGB=FALSE,  genesList, transcriptDB, countMode="transcript", upstream=300,  downstream=300, cutoff=600, elongationOffset=0)

Arguments

BAMlist

List of characters; paths to BAM files containing ChIP-seq aligned reads

inputList

List of characters (optional); paths to BAM files containing ChIP-seq aligned reads of inputs relative to the ChIP-seq samples in BAMlist

peakGRlist

List of GRanges; Pol2 peaks relative to the ChIP-seq samples in BAMlist. Only genes with a peak on the TSS will be used for counting

peakGB

logical; whether or not to consider only genes which have also a Pol2 peak on the gene body. Deafult=FALSE

genesList

List of characters; Entrez gene IDs defining the genes where the stalling index will be computed

transcriptDB

An object of class transcriptDb

countMode

either 'transcript' or 'average' or 'gene'. Genomic units used to count reads: 'transcript' considers all transcripts separately, 'average' averages over different isoforms of a gene, 'gene' considers the longest isoform only. Default='transcript'

upstream

numeric; upstream end of the TSS interval where reads will be counted. Default=300

downstream

numeric; downstream end of the TSS interval where reads will be counted. Default=300

cutoff

numeric; minimum length required for a gene to be considered for the counts. Default=600

elongationOffset

numeric; downstream end of the gene body interval where reads will be counted. Default=0

Value

Details

Given a set of Pol2 ChIP-seq samples, computes the average base coverage on TSS and gene body and their ratio (Pol2 stalling index) computed over specific sets of genes for a list of samples. For each sample, reads will be counted only for genes contained in the corresponding element of the genesList, and only for those genes having a Pol2 peak (given in peakGRlist as list of GRanges) on the TSS region (defined through upstream and downstream). If peakGB is set to TRUE, a Pol2 peak on the body of a gene will be required. The lists of genes peakGRlist must be provided in terms of entrezIDs. Reads can be counted on individual gene isoforms (countMode='transcript'), averaging over isoforms (countMode='average') or taking the longest isoform for each gene (countMode='gene'). The read counts will be performed on the intervals [TSS-upstream, TSS+downstream] (TSS) and [TSS+downstream, TES+elongationOffset] (genebody), where upstream, downstream, elongationOffset are user-defined parameters. Moreover, only genes having a length bigger than the user-defined parameter cutoff will be considered. BAM files have to be associated to the corresponding index .bai files. Please refer to the documentation of samtools on how to create them.

References

http://genomics.iit.it/groups/computational-epigenomics.html

Examples

Run this code

  require(TxDb.Mmusculus.UCSC.mm9.knownGene)
  require(org.Mm.eg.db)
  txdb <- TxDb.Mmusculus.UCSC.mm9.knownGene
  isActiveSeq(txdb) <- c(rep(FALSE,18), TRUE, rep(FALSE, length(isActiveSeq(txdb))-19))
  # pointing to Pol2 BAM file
  # BAM file from the GEO GSM1234478 sample, limited to chr19:3200000-4000000
  Pol2bam <- system.file("extdata", "Pol2.bam", package="compEpiTools")
  # loading Pol2 peaks as a GRanges object
  # built based on the BED file from the GEO GSM1234478 sample
  # limited to chr19:3200000-4000000
  Pol2GR <- system.file("extdata", "Pol2GR.Rda", package="compEpiTools")
  load(Pol2GR)
  egs <- distanceFromTSS(Pol2GR, txdb=txdb)
  egs <- unique(egs$nearest_gene_id)
  SI_matrix <- stallingIndex(BAMlist=list(Pol2bam), peakGRlist=list(Pol2GR), 
    genesList=list(egs), transcriptDB=txdb, countMode='gene')
  restoreSeqlevels(txdb)

Run the code above in your browser using DataLab