Count Overlap of ATAC-seq Fragments
fragmentoverlapcount(
  file,
  targetregions,
  excluderegions = NULL,
  targetbarcodes = NULL,
  Tn5offset = c(1, 0)
)

A tibble with one row per cell.
For each cell, it gives the barcode, the total fragment count (nfrag),
and the fragment counts broken down by overlap depth.

file
Name of the ATAC-seq fragments file.
The file must be block gzipped (using the bgzip command)
and accompanied by an index file (made using the tabix command).
The uncompressed file must be a tab delimited file,
where each row represents one fragment.
The first four columns are chromosome name, start position, end position,
and barcode (i.e., name) of the cell that the fragment belongs to.
The remaining columns are ignored.
See the vignette for details.
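
To illustrate the expected layout, the following base R sketch writes a tiny tab-delimited fragments file and checks its first four columns. The file name, coordinates, and barcodes are invented for the example. The bgzip and tabix commands shown in the comments are the standard htslib tools and have to be run separately.

```r
# Hypothetical toy fragments: chromosome, start, end, barcode,
# plus one extra column that would be ignored.
fragments <- data.frame(
  chrom   = c("chr1", "chr1", "chr2"),
  start   = c(1000L, 1500L, 300L),
  end     = c(1200L, 1730L, 520L),
  barcode = c("AAACGAAAGT-1", "AAACGAAAGT-1", "TTTGGTTCAG-1"),
  extra   = c(1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# tabix expects the file to be sorted by chromosome and start position.
fragments <- fragments[order(fragments$chrom, fragments$start), ]

# Write one fragment per row, tab delimited, without a header line.
path <- file.path(tempdir(), "toy.fragments.tsv")
write.table(fragments, path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read the file back and confirm the first four columns look sensible.
check <- read.delim(path, header = FALSE, stringsAsFactors = FALSE)
stopifnot(ncol(check) >= 4,
          is.numeric(check[[2]]), is.numeric(check[[3]]),
          all(check[[2]] <= check[[3]]))

# Then, outside R, compress and index the file:
#   bgzip toy.fragments.tsv
#   tabix -p bed toy.fragments.tsv.gz
```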

targetregions
GRanges object for the regions where overlaps are counted.
Usually all of the autosomes.
If memory is a problem, split each chromosome into smaller chunks,
for example of 10 Mb.
The function loads each element of targetregions sequentially,
and smaller elements require less memory.
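
The chunking suggested above could be sketched as follows. The helper chunk_chromosome, the chromosome name, and its 25 Mb length are hypothetical. The conversion to GRanges runs only if the GenomicRanges package is installed.

```r
# Hypothetical helper: split one chromosome into consecutive chunks
# of at most chunk_size bases (1-based, inclusive coordinates).
chunk_chromosome <- function(chrom, chrom_length, chunk_size = 1e7) {
  starts <- seq(1, chrom_length, by = chunk_size)
  ends <- pmin(starts + chunk_size - 1, chrom_length)
  data.frame(seqnames = chrom, start = starts, end = ends,
             stringsAsFactors = FALSE)
}

# A made-up 25 Mb chromosome gives two full 10 Mb chunks and one 5 Mb chunk.
chunks <- chunk_chromosome("chrA", 25e6)
print(chunks)

# Convert to a GRanges object if GenomicRanges is available.
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  targetregions <- GenomicRanges::makeGRangesFromDataFrame(chunks)
  print(targetregions)
}
```

For a whole genome, the same helper could be applied to each autosome and the resulting data frames combined with rbind before conversion.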

excluderegions
GRanges object for the regions to be excluded.
Simple repeats in the genome should be listed here,
because repeats can cause false overlaps.
A fragment is discarded if its 5' or 3' end is located in excluderegions.
If NULL, fragments are not excluded by this criterion.
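
The discard rule can be illustrated with plain numeric intervals. This is only a sketch of the stated criterion on invented coordinates, not the package's internal implementation; the names exclude, in_excluded, and kept are hypothetical.

```r
# Hypothetical excluded intervals on one chromosome (1-based, inclusive).
exclude <- data.frame(start = c(500, 2000), end = c(600, 2100))

# Hypothetical fragments on the same chromosome.
fragments <- data.frame(
  start = c(100, 550, 1900, 450),
  end   = c(300, 800, 2050, 700)
)

# TRUE for each position that lies inside any excluded interval.
in_excluded <- function(pos) {
  vapply(pos,
         function(p) any(p >= exclude$start & p <= exclude$end),
         logical(1))
}

# A fragment is discarded if either of its two ends falls in an excluded interval.
discard <- in_excluded(fragments$start) | in_excluded(fragments$end)
kept <- fragments[!discard, ]
print(kept)
```

Under this reading of the rule, the last fragment (450 to 700) is kept even though it spans an excluded interval, because neither of its ends lies inside one.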

targetbarcodes
Character vector for the barcodes of cells to be analyzed,
such as those passing quality control.
If NULL, all barcodes in the input file are analyzed.

Tn5offset
Numeric vector of length two.
The enzyme for ATAC-seq is a homodimer of Tn5.
The transposition sites of two Tn5 proteins are 9 bp apart,
and the (representative) site of accessibility is in between.
If the start and end positions in your input file are taken from a BAM file,
set the parameter to c(4, -5) to adjust the offset.
Alternatively, values such as c(0, -9) could generate similar results;
what matters most is the difference between the two numbers.
The fragments.tsv.gz file generated by 10x Cell Ranger already has the shift applied
but is recorded in BED format, whose start coordinate is 0-based.
In this case, use c(1, 0) (the default value).
If unsure, set to "guess",
in which case the program returns a guess.
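
A small arithmetic sketch shows why c(4, -5) and c(0, -9) behave similarly. It assumes the offsets are simply added to the start and end positions; that is an assumption for illustration, not a statement about the function's internals. The helper apply_offset and the coordinates are hypothetical.

```r
# Assumed adjustment: new start = start + offset[1], new end = end + offset[2].
apply_offset <- function(start, end, offset) {
  data.frame(start = start + offset[1], end = end + offset[2])
}

# Hypothetical fragment coordinates taken from a BAM file.
start <- c(1000, 1040)
end   <- c(1200, 1300)

a <- apply_offset(start, end, c(4, -5))
b <- apply_offset(start, end, c(0, -9))

# Both settings shorten every fragment by 9 bp, so the lengths agree.
print(a$end - a$start)
print(b$end - b$start)

# The two results differ only by a uniform 4 bp shift, which leaves the
# relative positions of the fragments unchanged.
print(a$start - b$start)
print(a$end - b$end)
```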