windowCounts(bam.files, spacing=50, width=spacing, ext=100, shift=0,
filter=10, bin=FALSE, param=readParam())
readParam
object containing read extraction parameters, or a list of such objects (one for each BAM file)RangedSummarizedExperiment
object is returned containing one integer matrix.
Each entry of the matrix contains the count for each library (column) at each window (row).
The coordinates of each window are stored as the rowRanges
.
The total number of reads in each library are stored as totals
in the colData
, along with the read extension length and param
used in each library.
Other window counting parameters (e.g., spacing
, width
) are stored in the metadata
.
width
.
The value of width
can be interpreted as the width of the contact area between the DNA and protein.
In practical terms, it determines the spatial resolution of the analysis. Larger windows count reads over a larger region which results in larger counts.
This results in greater detection power at the cost of resolution. By default, the first window on a chromosome starts at base position 1.
This can be shifted to the left by specifying an appropriate value for shift
.
New windows are found by sliding the current window to the right by the specified spacing
.
Increasing spacing
will reduce the frequency at which counts are extracted from the genome.
This results in some loss of resolution but it may be necessary when machine memory is limited. If bin
is set, settings are internally adjusted so that all reads are counted into non-overlapping adjacent bins of size width
.
Specifically, spacing
is set to width
and filter
is capped at a maximum value of 1 (empty bins can be retained with filter=0
).
Only the 5' end of each read or the midpoint of each fragment (for paired-end data) is used in counting.param
argument.
Specifying a single readParam
object will use the same extraction parameters for all files.
Different parameters can also be used for each file by specifying a list of readParam
objects.
However, users should take care with library-specific parameters, lest spurious differences be introduced between libraries. Fragments are inferred from reads by directional extension (single-end; see below) or by identifying proper pairs (paired-end; see readParam
for more details).
The number of fragments overlapping the window for each library is then counted for each window position.
Windows will be removed if the count sum across all libraries is below filter
.
This reduces the memory footprint of the output by not returning empty or near-empty windows, which are usually uninteresting anyway. The strandedness of the output rowRanges
is set based on the strand(s) from which the reads are extracted and counted.
This is determined by the value of the forward
slot in the input param
object.ext
.
This obtains a rough estimate of the interval of the fragment from which the read was derived.
It is particularly useful for TF data, where extension specifically increases the coverage of peaks that exhibit strand bimodality.
Substantially different fragment lengths between libraries can be accommodated by supplying a vector to ext
, where each entry represents the extension length for the corresponding library.
If any value of ext
is set to NA
, the read length is used as the fragment length in that library. However, different fragment lengths will result in different peak widths between libraries.
This may result in the detection of irrelevant differences corresponding to these differences in widths.
To avoid this, fragment lengths in all libraries can be scaled to final.ext
.
For a bimodal peak, scaling effectively aligns the subpeaks on a given strand across all libraries to a common location.
This removes the most obvious differences in widths. The value of final.ext
can be specified by passing a list (or equivalently subsettable object) of integer vectors in the ext
argument.
The library-specific fragment lengths are taken from the first element of the list, while the final fragment length is defined from the second element.
To this end, the second element should be either a scalar or a vector containing a single duplicated value.
If final.ext
is set to NA
, no rescaling is performed from the library-specific fragment lengths.ext
are not used directly.
Hwoever, rescaling can still be performed to standardize fragment lengths across libraries, by resizing each fragment from its midpoint.
This will use the second element of ext
as final.ext
, if ext
is specified as a list of length 2. On a similar note, some downstream functions will use the extension length in the output colData
as the average fragment length.
Thus, to maintain compatibility, the extension length in the output colData
is set to the average of the inferred fragment lengths for valid read pairs.
These values will not be used in windowCounts
, but instead, in functions like getWidths
.correlateReads
,
readParam
# A low filter is only used here as the examples have very few reads.
bamFiles <- system.file("exdata", c("rep1.bam", "rep2.bam"), package="csaw")
windowCounts(bamFiles, filter=1)
windowCounts(bamFiles, width=100, filter=1)
windowCounts(bamFiles, ext=c(50, 100), spacing=100, filter=1)
windowCounts(bamFiles, ext=DataFrame(c(50, 100), 80), width=100)
# Loading PE data.
bamFile <- system.file("exdata", "pet.bam", package="csaw")
windowCounts(bamFile, param=readParam(pe="both"), filter=1)
windowCounts(bamFile, param=readParam(pe="first"), filter=1)
windowCounts(bamFile, param=readParam(max.frag=100, pe="both"), filter=1)
windowCounts(bamFile, param=readParam(max.frag=100, pe="both", restrict="chrA"), filter=1)
Run the code above in your browser using DataLab