annotatePeakInBatch: Obtain the distance to the nearest TSS, miRNA, exon, etc. for a list of peak intervals

Description

Obtain the distance to the nearest TSS, miRNA, exon, etc. for a list of peak locations, using the IRanges and biomaRt packages.

Usage

annotatePeakInBatch(myPeakList, mart, featureType = c("TSS", "miRNA", "Exon"),
    AnnotationData, output = c("nearestStart", "overlapping", "both", "shortestDistance"),
    multiple = c(TRUE, FALSE), maxgap = 0,
    PeakLocForDistance = c("start", "middle", "end"),
    FeatureLocForDistance = c("TSS", "middle", "start", "end", "geneEnd"),
    select = c("all", "first", "last", "arbitrary"))

Arguments

myPeakList

RangedData: See example below

mart

A mart object, used only if AnnotationData is not supplied; see useMart in the biomaRt package for details.

featureType

Used only if AnnotationData is not supplied: one of TSS, miRNA or Exon.

AnnotationData

Annotation data obtained from getAnnotation, or customized annotation of class RangedData containing the additional variable strand (1 or + for plus strand, -1 or - for minus strand). Examples: data(TSS.human.NCBI36), data(TSS.mouse.NCBIM37), data(TSS.rat.RGSC3.4) and data(TSS.zebrafish.Zv8). If not supplied, annotation is obtained from biomaRt automatically using the mart and featureType parameters.

output

nearestStart (default): outputs the nearest features, with distance calculated as peak start - feature start (feature end if the feature resides on the minus strand).
overlapping: outputs overlapping features, allowing a maximum gap of maxgap between peak range and feature range.
both: outputs all the nearest features and, in addition, any features that overlap the peak but are not the nearest feature.
shortestDistance: outputs all the nearest features, with preference given to overlapping features.

multiple

Not applicable when output is nearestStart. TRUE: output multiple overlapping features for each peak. FALSE: output at most one overlapping feature for each peak. This parameter is kept for backward compatibility; please use select instead.

maxgap

Non-negative integer. Intervals with a separation of maxgap or less are considered to be overlapping
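
The maxgap rule can be illustrated with a minimal base-R sketch. The helpers separation and within_maxgap are hypothetical names introduced here and are not part of ChIPpeakAnno or IRanges. The sketch assumes closed integer intervals and counts the bases lying strictly between two ranges. How the underlying IRanges call treats exactly adjacent ranges is not stated on this page, so the sketch follows the literal wording above and treats a separation of 0 as within maxgap = 0.

```r
# Hypothetical helper (assumption, not package code): number of bases lying
# strictly between two closed intervals; negative when the intervals overlap.
separation <- function(p_start, p_end, f_start, f_end) {
  pmax(f_start - p_end, p_start - f_end) - 1
}

# TRUE when the peak and the feature are separated by maxgap bases or fewer.
within_maxgap <- function(p_start, p_end, f_start, f_end, maxgap = 0) {
  separation(p_start, p_end, f_start, f_end) <= maxgap
}

within_maxgap(100, 200, 120, 140)                 # TRUE: ranges overlap
within_maxgap(100, 200, 800, 1400)                # FALSE: 599 bases apart
within_maxgap(100, 200, 800, 1400, maxgap = 600)  # TRUE: 599 <= 600
```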

PeakLocForDistance

Specify the location of the peak used for calculating distance, e.g., middle uses the middle of the peak to calculate the distance to the feature, and start uses the start of the peak. For compatibility with previous versions, the default is start.

FeatureLocForDistance

Specify the location of the feature used for calculating distance, e.g., middle uses the middle of the feature and start uses the start of the feature. TSS uses the start of the feature when it is on the plus strand and the end of the feature when it is on the minus strand. geneEnd uses the end of the feature when it is on the plus strand and the start of the feature when it is on the minus strand. For compatibility with previous versions, the default is TSS.
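
The reference positions selected by PeakLocForDistance and FeatureLocForDistance can be sketched in base R as follows. The functions peak_location and feature_location are hypothetical illustrations, not functions exported by ChIPpeakAnno. Rounding the midpoint with round() is an assumption, since this page does not say how the package handles half positions.

```r
# Hypothetical helper (assumption, not package code): reference coordinate
# of a peak for the chosen PeakLocForDistance value.
peak_location <- function(p_start, p_end, loc = c("start", "middle", "end")) {
  loc <- match.arg(loc)
  switch(loc,
         start  = p_start,
         middle = round((p_start + p_end) / 2),  # rounding is an assumption
         end    = p_end)
}

# Hypothetical helper: reference coordinate of a feature for the chosen
# FeatureLocForDistance value; strand may be 1 / -1 or "+" / "-".
feature_location <- function(f_start, f_end, strand,
                             loc = c("TSS", "middle", "start", "end", "geneEnd")) {
  loc <- match.arg(loc)
  minus <- strand %in% c("-1", "-")
  switch(loc,
         TSS     = ifelse(minus, f_end, f_start),   # strand-aware start
         geneEnd = ifelse(minus, f_start, f_end),   # strand-aware end
         middle  = round((f_start + f_end) / 2),
         start   = f_start,
         end     = f_end)
}

peak_location(100, 200, "middle")                                   # 150
feature_location(c(120, 800), c(140, 1400), c(1, -1))               # 120 1400
feature_location(c(120, 800), c(140, 1400), c(1, -1), "geneEnd")    # 140 800
```

With the defaults, a plus-strand feature contributes its start and a minus-strand feature contributes its end, which matches the TSS description above.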

select

all may return multiple overlapping peaks; first returns the first overlapping peak; last returns the last overlapping peak; arbitrary returns one of the overlapping peaks.

Value

feature: id of the feature such as ensembl gene ID
insideFeature: upstream: peak resides upstream of the feature; downstream: peak resides downstream of the feature; inside: peak resides inside the feature; overlapStart: peak overlaps with the start of the feature; overlapEnd: peak overlaps with the end of the feature; includeFeature: peak includes the feature entirely
distancetoFeature: distance to the nearest feature such as a transcription start site. By default, the distance is calculated between the start of the binding site and the TSS, which is the gene start for genes on the forward strand and the gene end for genes on the reverse strand. The user can specify the peak location and the feature location used for this calculation via PeakLocForDistance and FeatureLocForDistance
start_position: start position of the feature such as gene
end_position: end position of the feature such as the gene
strand: 1 or + for positive strand and -1 or - for negative strand where the feature is located
shortestDistance: the shortest distance from either end of the peak to either end of the feature
fromOverlappingOrNearest: NearestStart: indicates this feature's start (feature's end for features at minus strand) is closest to the peak start; Overlapping: indicates this feature overlaps with this peak although it is not the nearest feature start
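
The insideFeature categories and the shortestDistance value listed above can be illustrated with a small base-R sketch. The functions classify_peak and shortest_distance are hypothetical and are not the package's implementation. The sketch assumes that upstream/downstream and overlapStart/overlapEnd are swapped for minus-strand features, and its handling of peaks that share an exact boundary with the feature is also an assumption. Both functions take scalar coordinates.

```r
# Hypothetical classifier (assumption, not package code) for one peak and
# one feature, both given as closed intervals on the same chromosome.
classify_peak <- function(p_start, p_end, f_start, f_end, strand = 1) {
  minus <- strand %in% c("-1", "-")
  if (p_end < f_start) return(if (minus) "downstream" else "upstream")
  if (p_start > f_end) return(if (minus) "upstream" else "downstream")
  if (p_start >= f_start && p_end <= f_end) return("inside")
  if (p_start < f_start && p_end > f_end) return("includeFeature")
  # Partial overlap: the peak crosses exactly one edge of the feature.
  if (p_start < f_start) return(if (minus) "overlapEnd" else "overlapStart")
  if (minus) "overlapStart" else "overlapEnd"
}

# Shortest distance from either end of the peak to either end of the feature.
shortest_distance <- function(p_start, p_end, f_start, f_end) {
  min(abs(p_start - f_start), abs(p_start - f_end),
      abs(p_end - f_start), abs(p_end - f_end))
}

classify_peak(100, 200, 120, 140, strand = -1)        # "includeFeature"
classify_peak(1000, 1200, 800, 1400, strand = -1)     # "inside"
classify_peak(1563000, 1565199, 1565000, 1565399)     # "overlapStart"
shortest_distance(167889600, 167893599, 167888600, 167888999)  # 601
```

The coordinates are taken from the myexp and literature ranges used in example 2 of the Examples section.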

Details

References

Zhu L.J. et al. (2010) ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC Bioinformatics 11:237. doi:10.1186/1471-2105-11-237

Examples


if (interactive())
{
	## example 1: annotate myPeakList (RangedData) with TSS.human.NCBI36 (RangedData)
	data(myPeakList)
	data(TSS.human.NCBI36)
	annotatedPeak = annotatePeakInBatch(myPeakList[1:6,], AnnotationData=TSS.human.NCBI36)
	as.data.frame(annotatedPeak)
	annotatedPeak = annotatePeakInBatch(myPeakList[1:6,], AnnotationData=TSS.human.NCBI36,
	    FeatureLocForDistance="TSS", PeakLocForDistance="middle", output="both" )
        
	## example 2: you have a list of transcription factor binding sites from the literature
	## and want to determine the extent of their overlap with the list of peaks from
	## your experiment. Prior to calling annotatePeakInBatch, represent both datasets
	## as RangedData, where start is the start of the binding site, end is
	## the end of the binding site, names is the name of the binding site, and
	## space and strand are the chromosome name and strand where the binding site is located.

	myexp = RangedData(IRanges(start=c(1543200,1557200,1563000,1569800,167889600,100,1000),
	    end=c(1555199,1560599,1565199,1573799,167893599,200,1200),
	    names=c("p1","p2","p3","p4","p5","p6", "p7")),strand=as.integer(1),space=c(6,6,6,6,5,4,4))
	literature = RangedData(IRanges(start=c(1549800,1554400,1565000,1569400,167888600,120,800),
	    end=c(1550599,1560799,1565399,1571199,167888999,140,1400),
	    names=c("f1","f2","f3","f4","f5","f6","f7")),strand=c(1,1,1,1,1,-1,-1),space=c(6,6,6,6,5,4,4))
	annotatedPeak1= annotatePeakInBatch(myexp, AnnotationData = literature)
	pie(table(as.data.frame(annotatedPeak1)$insideFeature))
	as.data.frame(annotatedPeak1)
	## use BED2RangedData or GFF2RangedData to convert BED format or GFF format
	## to RangedData before calling annotatePeakInBatch
	test.bed = data.frame(cbind(chrom = c("4", "6"), chromStart=c("100", "1000"),
	    chromEnd=c("200", "1100"), name=c("peak1", "peak2")))
	test.rangedData = BED2RangedData(test.bed)
	annotatePeakInBatch(test.rangedData, AnnotationData = literature)
	test.GFF = data.frame(cbind(seqname = c("chr4", "chr4"), source=rep("Macs", 2),
	    feature=rep("peak", 2), start=c("100", "1000"), end=c("200", "1100"),
	    score=c(60, 26), strand=c(1, 1), frame=c(".", 2), group=c("peak1", "peak2")))
	test.rangedData = GFF2RangedData(test.GFF)
	as.data.frame(annotatePeakInBatch(test.rangedData, AnnotationData = literature))
}
