clusterWindows: Cluster DB windows into clusters

Description

Clusters significant windows into clusters while controlling the cluster-level FDR.

Usage

clusterWindows(regions, tab, target, pval.col=NULL,  fc.col=NA, tol, ..., weight=NULL, grid.param=NULL)

Arguments

regions

a GRanges or RangedSummarizedExperiment object containing window coordinates

tab

a dataframe of results with a PValue field for each window

target

a numeric scalar indicating the desired cluster-level FDR

pval.col

a string or integer scalar specifying the column of tab with the p-values

fc.col

a string or integer scalar specifying the column of tab with the log-fold changes

tol, ...

arguments to be passed to mergeWindows

weight, grid.param

arguments to be passed to controlClusterFDR

Value

A named list similar to that reported by mergeWindows with an ID vector in id and region coordinates of each cluster in region. Non-significant windows are marked with NA values in ids. An additional element FDR is also included, representing the estimate of the cluster-level FDR for the returned regions.

Details

Windows are identified as DB based on the adjusted p-values in tab. Only these DB windows are then used directly for clustering via mergeWindows. This identifies DB regions consisting solely of DB windows. If tol is not specified, it is set to 100 bp by default and a warning is raised. If fc.col is used to specify the column of log-fold changes, clusters are formed according to the sign of the log-fold change in mergeWindows.

DB-based clustering is obviously not blind to the DB status, so standard methods for FDR control are not valid. Instead, post-hoc control of the cluster-level FDR is applied by using controlClusterFDR. This aims to control the cluster-level FDR at target (which is set to 0.05 if not specified). The aim is to provide some interpretable results when DB-blind clustering is not appropriate, e.g., for diffuse marks involving long stretches of the genome. Reporting each marked stretch in its entirety would be cumbersome, so this method allows the DB subintervals to be identified directly.

Examples

Run this code

set.seed(10)
x <- round(runif(100, 100, 1000))
gr <- GRanges("chrA", IRanges(x, x+5))
tab <- data.frame(PValue=rbeta(length(x), 1, 50), logFC=rnorm(length(x)))

clusterWindows(gr, tab, target=0.05, tol=10)
clusterWindows(gr, tab, target=0.05, tol=10, fc.col="logFC")