Learn R Programming

csaw (version 1.6.1)

profileSites: Profile binding sites

Description

Get the coverage profile around potential binding sites.

Usage

profileSites(bam.files, regions, range=5000, ext=100, average=TRUE, weight=1, param=readParam(), strand=c("ignore", "use", "match"))

Arguments

bam.files
a character vector containing paths to one or more BAM files
regions
a GRanges object over which profiles are to be aggregated
range
an integer scalar specifying the range over which the profile will be collected
ext
an integer scalar specifying the average fragment length for single-end data
average
a logical scalar specifying whether the profiles should be averaged across regions
weight
a numeric vector indicating the relative weight to be assigned to the profile for each region
param
a readParam object containing read extraction parameters, or a list of such objects (one for each BAM file)
strand
a string indicating how stranded regions should be handled

Value

If average=TRUE, a numeric vector of average coverages for each base position within range is returned, where the average is taken over all regions. The vector is named according to the relative position of each base to the start of the region. If weight is set as described above for each region, then the vector will represent average relative coverages, i.e., relative to the number of fragments counted in the region itself.If average=FALSE, an integer matrix of coverage values is returned. Each row of the matrix corresponds to an entry in regions, while each column corresponds to a base position with range. Column names are set o the relative position of each base to the start of each region.

Comments on strand specificity

If strand="use", the function is called separately on the reverse-stranded regions. The profile for these regions is computed such that the left side of the profile corresponds to the upstream flank on the reverse strand (i.e., the profile is flipped). The center of the profile corresponds to the 5' end of the region on the reverse strand. This may be useful for features where strandedness is important, e.g., TSS's. Otherwise, if strand="ignore", no special treatment is given to reverse-stranded features. By default, the strandedness of the region has no effect on read extraction. If strand="match", the profile for reverse-strand regions is made with reverse-strand reads only (this profile is also flipped, as described for strand="use"). Similarly, only forward-strand reads are used for forward- or unstranded regions. Note that param$forward must be set to NULL for this to work.

Details

This function aggregates the coverage profile around the specified regions. The shape of this profile can guide an intelligent choice of the window size in windowCounts, or to determine if region expansion is necessary in regionCounts. For the former, restricting the regions to locally maximal windows with findMaxima is recommended prior to use of profileSites. The function can be also used to examine average coverage around known features of interest, like genes.

The profile records the number of fragments overlapping each base within range of the start of all regions. Single-end reads are directionally extended to ext to impute the fragment (see windowCounts for more details). For paired-end reads, the interval between each pair is used as the fragment. If multiple bam.files are specified, reads are pooled across files for counting into each profile.

Direct aggregation will favor high-abundance regions as these have higher counts. If this is undesirable, high-abundance regions can be downweighted using the weight argument. For example, this can be set to the inverse of the sum of counts across all libraries for each region in regions. This will ensure that each region contributes equally to the final profile.

Aggregation can be turned off by setting average=FALSE. In such cases, a separate profile will be returned for each region, instead of the profiles being averaged across all regions. This may be useful, e.g., for constructing heatmaps of enrichment across many regions. Note that weight has no effect when aggregation is turned off.

See Also

findMaxima, windowCounts, wwhm

Examples

Run this code
bamFile <- system.file("exdata", "rep1.bam", package="csaw")
data <- windowCounts(bamFile, filter=1)
rwsms <- rowSums(assay(data))
maxed <- findMaxima(rowRanges(data), range=100, metric=rwsms)
	
x <- profileSites(bamFile, rowRanges(data)[maxed], range=200)
plot(as.integer(names(x)), x)

x <- profileSites(bamFile, rowRanges(data)[maxed], range=500)
plot(as.integer(names(x)), x)

x <- profileSites(bamFile, rowRanges(data)[maxed], range=500, weight=1/rwsms)
plot(as.integer(names(x)), x)

# Introducing some strandedness.
regs <- rowRanges(data)[maxed]
strand(regs) <- sample(c("-", "+", "*"), sum(maxed), replace=TRUE)
x <- profileSites(bamFile, regs, range=500)
plot(as.integer(names(x)), x)
x2 <- profileSites(bamFile, regs, range=500, strand="use")
points(as.integer(names(x2)), x2, col="red")
x3 <- profileSites(bamFile, regs, range=500, strand="match",
    param=readParam(forward=NULL))
points(as.integer(names(x3)), x3, col="blue")

# Returning separate profiles.
y <- profileSites(bamFile, rowRanges(data)[maxed], range=500, average=FALSE)
dim(y)

Run the code above in your browser using DataLab