This function uses Simes' procedure to compute the combined p-value for each cluster of tests with the same value of ids.
Each combined p-value represents evidence against the global null hypothesis for its cluster, i.e., that all individual null hypotheses in the cluster are true.
This may be more relevant than examining each test individually when multiple tests in a cluster represent parts of the same underlying event, e.g., genomic regions consisting of clusters of windows.
The BH method is also applied to control the FDR across all clusters.
The importance of each test within a cluster can be adjusted by supplying different relative weight values.
This may be useful for downweighting low-confidence tests, e.g., those in repeat regions.
In Simes' procedure, weights are interpreted as relative frequencies of the tests in each cluster.
Note that these weights have no effect between clusters and will not be used to adjust the computed FDR.
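As an illustration of the two steps described above, the following is a minimal sketch in Python, not the package's own code. It assumes the usual frequency-weighted form of Simes' procedure, in which the i-th smallest p-value in a cluster is scaled by the total weight divided by the cumulative weight up to i, followed by a standard BH adjustment across clusters. The function names simes_by_cluster and bh_adjust are hypothetical.

```python
from collections import defaultdict


def simes_by_cluster(ids, pvalues, weights=None):
    """Combine p-values within each cluster using a weighted Simes' procedure.

    Weights act as relative frequencies within a cluster (assumed form):
    the i-th smallest p-value is multiplied by total_weight / cumulative_weight.
    """
    if weights is None:
        weights = [1.0] * len(pvalues)  # equal weights give the classic Simes' test

    clusters = defaultdict(list)
    for cid, p, w in zip(ids, pvalues, weights):
        clusters[cid].append((p, w))

    combined = {}
    for cid, tests in clusters.items():
        tests.sort()  # ascending by p-value
        total = sum(w for _, w in tests)
        running = 0.0
        best = 1.0
        for p, w in tests:
            running += w
            best = min(best, p * total / running)
        combined[cid] = best
    return combined


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data: cluster "a" has two tests, cluster "b" has one.
ids = ["a", "a", "b"]
pvalues = [0.01, 0.04, 0.2]
combined = simes_by_cluster(ids, pvalues)
cluster_names = sorted(combined)
fdr = bh_adjust([combined[c] for c in cluster_names])
```

In this sketch the weights only rescale p-values inside a cluster, so they have no effect on the BH step, which sees one combined p-value per cluster.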
By default, the relevant fields in tab are identified by matching the column names to their expected values.
Multiple fields in tab containing the logFC substring are allowed, e.g., to accommodate ANOVA-like contrasts.
If the column names differ from what is expected, the correct columns can be specified using pval.col and fc.col.
This will override any internal selection of the appropriate fields.
This function will report the number of windows with log-fold changes above 0.5 and below -0.5, to give some indication of whether binding increases or decreases in the cluster.
If a cluster contains non-negligible numbers of up and down windows, this indicates that there may be a complex DB event within that cluster.
Similarly, complex DB may be present if the total number of windows is larger than the number of windows in either category (i.e., change is not consistent across the cluster).
Note that the threshold of 0.5 is arbitrary and has no impact on the significance calculations.
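The direction counts can be pictured with a short Python sketch. This is an illustration under stated assumptions rather than the function's implementation: it handles a single log-fold change column, and the name count_directions and the layout of the returned dictionary are hypothetical.

```python
def count_directions(ids, logfc, threshold=0.5):
    """Count total, up and down windows per cluster.

    A window is "up" if its log-fold change exceeds threshold and "down"
    if it is below -threshold; other windows only add to the total.
    """
    counts = {}
    for cid, lfc in zip(ids, logfc):
        entry = counts.setdefault(cid, {"total": 0, "up": 0, "down": 0})
        entry["total"] += 1
        if lfc > threshold:
            entry["up"] += 1
        elif lfc < -threshold:
            entry["down"] += 1
    return counts


# Cluster 1 mixes directions; cluster 2 changes consistently upwards.
ids = [1, 1, 1, 2, 2]
logfc = [1.2, -0.8, 0.1, 0.7, 0.9]
direction_counts = count_directions(ids, logfc)
```

Here cluster 1 has both up and down windows, and its total exceeds either count, which is the pattern that would suggest a complex DB event. The counts are descriptive only and play no part in the combined p-values.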
A simple clustering approach for windows is provided in mergeWindows.
However, any clustering can be used so long as it does not compromise type I error control, e.g., promoters, gene bodies or independently called peaks.
Any tests with NA values for ids will be ignored.
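A minimal Python sketch of this filtering step follows, assuming that a missing cluster identifier is represented as None or NaN (standing in for NA in R). The helper names is_missing and drop_missing_ids are hypothetical and only illustrate that such tests are removed before any grouping takes place.

```python
import math


def is_missing(cid):
    """Treat None and floating-point NaN as a missing cluster identifier."""
    return cid is None or (isinstance(cid, float) and math.isnan(cid))


def drop_missing_ids(ids, *columns):
    """Remove tests whose cluster identifier is missing.

    Returns the filtered ids followed by each filtered column, so that the
    remaining entries stay aligned with one another.
    """
    keep = [i for i, cid in enumerate(ids) if not is_missing(cid)]
    filtered_ids = [ids[i] for i in keep]
    filtered_columns = [[col[i] for i in keep] for col in columns]
    return (filtered_ids, *filtered_columns)


# The second test has no cluster assignment and is dropped.
ids = ["a", None, "b", "a"]
pvalues = [0.01, 0.5, 0.2, 0.04]
kept_ids, kept_pvalues = drop_missing_ids(ids, pvalues)
```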