csaw (version 1.6.1)

clusterWindows: Cluster DB windows into clusters

Description

Groups significant, i.e., differentially bound (DB), windows into clusters while controlling the cluster-level FDR.

Usage

clusterWindows(regions, tab, target, pval.col=NULL, fc.col=NA, tol, ..., weight=NULL, grid.param=NULL)

Arguments

regions
a GRanges or RangedSummarizedExperiment object containing window coordinates
tab
a data frame of results with a PValue field for each window
target
a numeric scalar indicating the desired cluster-level FDR
pval.col
a string or integer scalar specifying the column of tab with the p-values
fc.col
a string or integer scalar specifying the column of tab with the log-fold changes
tol, ...
arguments to be passed to mergeWindows
weight, grid.param
arguments to be passed to controlClusterFDR

Value

A named list similar to that reported by mergeWindows, with an ID vector in id and the region coordinates of each cluster in region. Non-significant windows are marked with NA values in id. An additional element FDR is also included, representing the estimate of the cluster-level FDR for the returned regions.

Details

Windows are identified as DB based on the adjusted p-values in tab. Only these DB windows are then used directly for clustering via mergeWindows. This identifies DB regions consisting solely of DB windows. If tol is not specified, it is set to 100 bp by default and a warning is raised. If fc.col is used to specify the column of log-fold changes, clusters are formed according to the sign of the log-fold change in mergeWindows.
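The two steps described above (select DB windows, then merge only those) can be sketched in base R. This is a simplified illustration, not the csaw implementation: the helper merge_db, the toy data frame win and the fixed threshold of 0.05 are all hypothetical, and distance is taken here as the start of a window minus the end of the running cluster, which may differ from how mergeWindows measures it.

```r
# Hypothetical toy windows with p-values and log-fold changes.
win <- data.frame(
  start  = c(100, 104, 112, 300, 305, 600),
  end    = c(105, 109, 117, 305, 310, 605),
  PValue = c(0.001, 0.002, 0.2, 0.003, 0.004, 0.9),
  logFC  = c(1.2, 0.8, -0.1, -1.5, 1.1, 0.3)
)

# Step 1: BH-adjust the p-values and keep windows under an assumed threshold.
# (In clusterWindows the threshold is chosen by controlClusterFDR, not fixed.)
adjp <- p.adjust(win$PValue, method = "BH")
is.db <- adjp <= 0.05

# Step 2: merge DB windows that lie within 'tol' of the running cluster.
# If 'sign' is supplied, a change of sign also starts a new cluster.
merge_db <- function(start, end, keep, tol, sign = NULL) {
  id <- rep(NA_integer_, length(start))   # non-DB windows stay NA
  idx <- which(keep)
  idx <- idx[order(start[idx], end[idx])]
  cur <- 0L
  last.end <- -Inf
  last.sign <- NA
  for (i in idx) {
    flip <- !is.null(sign) && !is.na(last.sign) && sign[i] != last.sign
    if (start[i] - last.end > tol || flip) {
      cur <- cur + 1L                     # gap too wide or sign flipped
      last.end <- end[i]
    } else {
      last.end <- max(last.end, end[i])
    }
    if (!is.null(sign)) last.sign <- sign[i]
    id[i] <- cur
  }
  id
}

id.plain <- merge_db(win$start, win$end, is.db, tol = 10)
id.sign  <- merge_db(win$start, win$end, is.db, tol = 10, sign = win$logFC > 0)
```

In this toy case, windows 3 and 6 are not DB and receive NA, and supplying the sign splits the second cluster because windows 4 and 5 change in opposite directions.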

DB-based clustering is not blind to the DB status of the windows, so standard methods for FDR control are not valid. Instead, post-hoc control of the cluster-level FDR is applied using controlClusterFDR, which aims to control the cluster-level FDR at target (set to 0.05 if not specified). The aim is to provide some interpretable results when DB-blind clustering is not appropriate, e.g., for diffuse marks involving long stretches of the genome. Reporting each marked stretch in its entirety would be cumbersome, so this method allows the DB subintervals to be identified directly.
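One way to see why a window-level threshold does not carry over to clusters is a rough worst-case bound, sketched below in base R. It assumes that each false-positive window could form its own cluster; this is an illustrative heuristic and is not claimed to be what controlClusterFDR computes. The function cluster_fdr_bound and the toy inputs are hypothetical.

```r
# Worst-case estimate of the cluster-level FDR at a window-level threshold.
# Assumption: about threshold * (number of DB windows) windows are false
# positives, and each of them may end up as a separate false cluster.
cluster_fdr_bound <- function(adjp, cluster.id, threshold) {
  n.db <- sum(adjp <= threshold)
  n.clusters <- length(unique(cluster.id[!is.na(cluster.id)]))
  if (n.clusters == 0L) return(0)
  min(1, threshold * n.db / n.clusters)
}

# Toy inputs: four DB windows (adjusted p-value 0.006) forming two clusters.
adjp <- c(0.006, 0.006, 0.24, 0.006, 0.006, 0.9)
id   <- c(1L, 1L, NA, 2L, 2L, NA)

bound <- cluster_fdr_bound(adjp, id, threshold = 0.05)
```

Here the bound is 0.05 * 4 / 2 = 0.1, twice the window-level threshold, so a stricter window-level threshold would be needed to keep the cluster-level estimate at 0.05.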

See Also

mergeWindows, controlClusterFDR

Examples

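library(csaw)  # provides clusterWindows(); GRanges() and IRanges() are from GenomicRanges and IRanges

# Simulate 100 short windows on one chromosome, with p-values and log-fold changes.
set.seed(10)
x <- round(runif(100, 100, 1000))
gr <- GRanges("chrA", IRanges(x, x+5))
tab <- data.frame(PValue=rbeta(length(x), 1, 50), logFC=rnorm(length(x)))

# Cluster DB windows lying within 10 bp of each other, at a cluster-level FDR of 5%.
clusterWindows(gr, tab, target=0.05, tol=10)
# As above, but also use the sign of the log-fold change when forming clusters.
clusterWindows(gr, tab, target=0.05, tol=10, fc.col="logFC")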