squareCounts: Load Hi-C interaction counts

Description

Collate count combinations for interactions between pairs of bins across multiple Hi-C libraries.

Usage

squareCounts(files, param, width=50000, filter=1L)

Arguments

files

a character vector containing paths to the index files generated from each Hi-C library

param

a pairParam object containing read extraction parameters

width

an integer scalar specifying the width of each bin in base pairs

filter

an integer scalar specifying the minimum count for each square

Value

An InteractionSet object is returned containing the number of read pairs for each bin pair across all libraries. Bin pairs are stored as a ReverseStrictGInteractions object.

Details

The genome is first split into non-overlapping adjacent bins of size width. These bins are rounded to the nearest restriction site, forming the sides of potential squares. The number of restriction fragments in each bin is stored as nfrags in the metadata of the output region. Each square represents an interaction between the corresponding rounded bins on the genome. The number of read pairs between each pair of sides is counted for each library to obtain the count for the corresponding square.

Larger counts can be collected by increasing the value of width. This can improve detection power by increasing the evidence for significant differences. However, this comes at the cost of spatial resolution as adjacent events in the same bin or square can no longer be distinguished. This may reduce detection power if counts for differential interactions are contaminated by counts for non-differential interactions.

Low abundance squares with count sums below filter are not reported. This reduces memory usage for large datasets. These squares are probably uninteresting as detection power will be poor for low counts. Another option is to increase width to reduce the total number of bins in the genome (and hence, the possible number of bin pairs).

Counting will consider the values of restrict, discard and cap in param. See pairParam for more details.

References

Imakaev M et al. (2012). Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods 9, 999-1003.

Lieberman-Aiden E et al. (2009). Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Science 326, 289-293.

Examples

Run this code

hic.file <- system.file("exdata", "hic_sort.bam", package="diffHic")
cuts <- readRDS(system.file("exdata", "cuts.rds", package="diffHic"))
param <- pairParam(fragments=cuts)

# Setting up the parameters
fout <- "output.h5"
invisible(preparePairs(hic.file, param, file=fout))

# Collating to count combinations.
y <- squareCounts(fout, param)
head(assay(y))
y <- squareCounts(fout, param, filter=1)
head(assay(y))
y <- squareCounts(fout, param, width=50, filter=1)
head(assay(y))
y <- squareCounts(fout, param, width=100, filter=1)
head(assay(y))

# Attempting with other parameters.
y <- squareCounts(fout, reform(param, restrict="chrA"), width=100, filter=1)
head(assay(y))
y <- squareCounts(fout, filter=1,
    param=reform(param, restrict=cbind("chrA", "chrB"))) 
head(assay(y))
y <- squareCounts(fout, filter=1,
    param=reform(param, cap=1), width=100)
head(assay(y))
y <- squareCounts(fout, width=100, filter=1, 
    param=reform(param, discard=GRanges("chrA", IRanges(1, 50))))
head(assay(y))

Run the code above in your browser using DataLab