TopDom (version 0.10.1)

overlapScores: Calculates Overlap Scores Between Two Sets of Topological Domains

Description

Calculates Overlap Scores Between Two Sets of Topological Domains

Usage

overlapScores(a, reference, debug = getOption("TopDom.debug", FALSE))

Arguments

a, reference

Topological domain (TD) set \(A\) and reference TD set \(R\), both in the format returned by TopDom().

debug

If TRUE, debug output is produced.
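The default for debug in the Usage line is resolved with base R's getOption(): it uses the value of the R option TopDom.debug if that option is set, and FALSE otherwise. The minimal base-R sketch below only exercises options() and getOption() to show how that default resolves. It does not call TopDom itself.

```r
# Make sure the option is unset, remembering its previous value
old <- options(TopDom.debug = NULL)

# Same lookup as the default argument: option value if set, else FALSE
default_off <- getOption("TopDom.debug", FALSE)

# Setting the option globally changes the default for later calls
options(TopDom.debug = TRUE)
default_on <- getOption("TopDom.debug", FALSE)

# Restore the option to its state before this example
options(old)

print(c(default_off = default_off, default_on = default_on))
```

Consequently, options(TopDom.debug = TRUE) turns on debug output for every later overlapScores() call that does not pass debug explicitly.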

Value

Returns a named list of class TopDomOverlapScores whose names correspond to the chromosomes in the reference domain set \(R\). Each element is a data.frame with the fields:

  • chromosome - \(D_{R,c}\) character strings

  • best_score - \(D_{R,c}\) numerics in \([0,1]\)

  • best_length - \(D_{R,c}\) positive integers

  • best_set - list of \(D_{R,c}\) index vectors

where \(D_{R,c}\) is the number of TDs in the reference set \(R\) on chromosome \(c\). If a TD in the reference set \(R\) is not a "domain", the corresponding best_score and best_length values are NA_real_ and NA_integer_, respectively, and best_set is an empty list.
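The data.frame below is a hypothetical stand-in for one chromosome element: its values are invented and are not TopDom output. It only mirrors the documented column layout, including best_set as a list column and the NA values used for a reference TD that is not a domain. It illustrates one way such an element might be summarised, for example by averaging best_score over domains only.

```r
# Hypothetical chromosome element; all values are made up for illustration
scores_chr19 <- data.frame(
  chromosome  = rep("chr19", 3),
  best_score  = c(0.8, NA_real_, 0.5),
  best_length = c(3L, NA_integer_, 1L)
)
# best_set is a list column; the non-domain (row 2) gets an empty list
scores_chr19$best_set <- list(2:4, list(), 6L)

# Reference TDs that are not domains carry NA scores
is_domain <- !is.na(scores_chr19$best_score)

# Average best overlap score over domains only, ignoring the NA row
mean_score <- mean(scores_chr19$best_score, na.rm = TRUE)
n_domains <- sum(is_domain)

str(scores_chr19)
print(c(mean_score = mean_score, n_domains = n_domains))
```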

Warning - This might not be the correct implementation

The original TopDom scripts do not provide an implementation for calculating overlap scores. Instead, the implementation of TopDom::overlapScores() is based on the textual description of overlap scores provided in Shin et al. (2016). It is not known whether this is the same algorithm and implementation that the authors of the TopDom article used.

Details

The overlap score, \(overlap(A', r_i)\), measures how well a consecutive subset \(A'\) of topological domains (TDs) in \(A\) overlaps the topological domain \(r_i\) in the reference set \(R\). For each reference TD \(r_i\), the best match \(A'_{max}\) is identified, that is, the subset \(A'\) that maximizes \(overlap(A', r_i)\). For exact definitions, see page 8 of Shin et al. (2016).
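The search for \(A'_{max}\) can be sketched in base R as below. This is not the TopDom implementation. It assumes that each TD is a closed interval of bin indices, that the TDs in \(A\) are sorted and non-overlapping, and that \(overlap(A', r_i)\) is a Jaccard-style ratio: bins shared by \(A'\) and \(r_i\) divided by bins covered by either. Check page 8 of Shin et al. (2016) for the exact definition. The helpers jaccard_bins() and best_overlap() are hypothetical names.

```r
# ASSUMPTION: overlap(A', r) = |bins(A') intersect bins(r)| / |bins(A') union bins(r)|
# Hypothetical helper: ratio of shared to covered bins for A' and r
jaccard_bins <- function(a_from, a_to, r_from, r_to) {
  a_bins <- unlist(Map(seq, a_from, a_to))  # all bins covered by A'
  r_bins <- seq(r_from, r_to)               # all bins covered by r
  length(intersect(a_bins, r_bins)) / length(union(a_bins, r_bins))
}

# Hypothetical helper: exhaustive search over consecutive subsets a[i..j]
best_overlap <- function(a, r_from, r_to) {
  best <- list(score = 0, set = integer(0))
  n <- nrow(a)
  for (i in seq_len(n)) {
    for (j in i:n) {
      idx <- i:j  # consecutive TDs a[i], ..., a[j]
      s <- jaccard_bins(a$from[idx], a$to[idx], r_from, r_to)
      if (s > best$score) best <- list(score = s, set = idx)
    }
  }
  best
}

# Toy example: A has three TDs; the reference TD r covers bins 1 to 10
a <- data.frame(from = c(1L, 5L, 11L), to = c(4L, 10L, 20L))
res <- best_overlap(a, r_from = 1L, r_to = 10L)
print(res)  # score 1 for the subset formed by the first two TDs
```

The exhaustive search is quadratic in the number of TDs, which is fine for a sketch. Because the TDs in \(A\) are sorted and non-overlapping, a faster version could restrict the search to the contiguous run of TDs that actually overlap \(r_i\).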

Note that the overlap score is asymmetric, which means that, in general, overlapScores(a, b) != overlapScores(b, a).
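A small worked example shows where the asymmetry comes from. It assumes the same Jaccard-style reading of \(overlap(A', r_i)\) as a ratio of shared bins to covered bins, which is an interpretation rather than a definition stated on this page. Let \(A\) hold a single TD \(a_1\) spanning bins 1 to 20, and let \(B\) hold two TDs, \(b_1\) (bins 1 to 10) and \(b_2\) (bins 11 to 20):

```latex
\begin{aligned}
\text{reference } B:&\quad overlap(\{a_1\}, b_1) = \tfrac{10}{20} = 0.5,
  \qquad overlap(\{a_1\}, b_2) = \tfrac{10}{20} = 0.5 \\
\text{reference } A:&\quad overlap(\{b_1, b_2\}, a_1) = \tfrac{20}{20} = 1
\end{aligned}
```

With \(A\) as the reference, the consecutive subset \(\{b_1, b_2\}\) reassembles \(a_1\) exactly. With \(B\) as the reference, the single coarse TD \(a_1\) can only cover half of each finer TD.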

References

  • Shin et al., TopDom: an efficient and deterministic method for identifying topological domains in genomes, Nucleic Acids Research, 44(7): e70, April 2016. doi: 10.1093/nar/gkv1505, PMCID: PMC4838359, PMID: 26704975

See Also

TopDom.

Examples

library(TopDom)
library(tibble)  # provides as_tibble()
path <- system.file("exdata", package = "TopDom", mustWork = TRUE)

## Original count data (on a subset of the bins to speed up the example)
chr <- "chr19"
pathname <- file.path(path, sprintf("nij.%s.gz", chr))
data <- readHiC(pathname, chr = chr, binSize = 40e3, bins = 1:500)
print(data)

## Find topological domains using the TopDom method for two window sizes
tds_5 <- TopDom(data, window.size = 5L)
tds_6 <- TopDom(data, window.size = 6L)

## Overlap scores (in both directions, since the score is asymmetric)
## TDs from window size 6 scored against the window-size-5 reference
overlap_56 <- overlapScores(tds_6, reference = tds_5)
print(overlap_56)
print(as_tibble(overlap_56))

## TDs from window size 5 scored against the window-size-6 reference
overlap_65 <- overlapScores(tds_5, reference = tds_6)
print(overlap_65)
print(as_tibble(overlap_65))