getBestTest: Get the best test in a cluster

Description

Find the test with the strongest evidence for rejection of the null in each cluster.

Usage

getBestTest(ids, tab, by.pval=TRUE, weight=NULL, pval.col=NULL, cpm.col=NULL)

Arguments

ids

an integer vector containing the cluster ID for each test

tab

a table of results with a PValue field for each test

by.pval

a logical scalar, indicating whether selection should be performed on corrected p-values; if FALSE, the test with the highest log-CPM value is selected instead

weight

a numeric vector of weights, one per test (i.e., per window); defaults to 1 for each test

pval.col

an integer scalar specifying the column of tab containing the p-values, or a character string containing the name of that column

cpm.col

an integer scalar specifying the column of tab containing the log-CPM values, or a character string containing the name of that column

Value

A data frame with one row per cluster and the following numeric fields: best, the index of the best test in the cluster; PValue, the (possibly adjusted) p-value for that test; and FDR, the q-value corresponding to the adjusted p-value. Note that the p-value column may be named differently if pval.col is specified. Other fields in tab corresponding to the best test in the cluster are also returned. Cluster IDs are stored as the row names.

Details

Clusters are identified as those tests with the same value of ids (any NA values are ignored). If by.pval=TRUE, this function identifies the test with the lowest p-value as that with the strongest evidence against the null in each cluster. The p-value of the chosen test is adjusted using the Bonferroni correction, based on the total number of tests in the parent cluster. This is necessary to obtain strong control of the family-wise error rate such that the best test can be taken from each cluster for further consideration.
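The selection and correction described above can be sketched in base R. This is only an illustration on made-up data, not the package's implementation; the toy vectors are hypothetical, and the use of the Benjamini-Hochberg method for the FDR field is an assumption.

```r
# Hypothetical toy data: six tests, two clusters, one test with no cluster.
ids <- c(1L, 1L, 1L, 2L, 2L, NA)
pval <- c(0.01, 0.20, 0.50, 0.04, 0.03, 0.001)

# Tests with NA cluster IDs are ignored.
keep <- !is.na(ids)
by.cluster <- split(which(keep), ids[keep])

# Index of the test with the lowest p-value in each cluster.
best <- vapply(by.cluster, function(i) i[which.min(pval[i])], integer(1))

# Bonferroni correction: multiply by the number of tests in the
# parent cluster, capping the result at 1.
n.tests <- lengths(by.cluster)
adj.p <- pmin(1, pval[best] * n.tests)

# Assumption: q-values are computed with the Benjamini-Hochberg method.
result <- data.frame(best=best, PValue=adj.p, FDR=p.adjust(adj.p, method="BH"))
result
```

Here the first cluster has three tests, so its minimum p-value is multiplied by 3, while the second cluster's minimum is multiplied by 2.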

The importance of each window in each cluster can be adjusted by supplying different relative weight values. Each weight is interpreted as a different threshold for each test in the cluster. Larger weights correspond to lower thresholds, i.e., less evidence is needed to reject the null for tests deemed to be more important. This may be useful for upweighting particular tests, e.g., windows containing a motif for the TF of interest.
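One common way to realise such weights is a weighted Bonferroni correction, in which each p-value is scaled by the total weight divided by its own weight. The sketch below assumes that scheme for illustration and uses hypothetical toy values; it is not taken from the package source.

```r
# Hypothetical cluster of three tests; the second test is upweighted.
pval <- c(0.02, 0.03, 0.50)
w <- c(1, 4, 1)

# Weighted Bonferroni (assumed scheme): scale each p-value by
# (total weight) / (its own weight). This is equivalent to comparing
# pval[i] against a threshold of alpha * w[i] / sum(w).
scaled <- pval * sum(w) / w

# The best test is the one with the smallest scaled p-value.
best <- which.min(scaled)
adj.p <- min(1, scaled[best])

# For comparison: with equal weights, this reduces to plain Bonferroni.
unweighted.best <- which.min(pval * length(pval))
```

With equal weights the first test would be chosen, but the larger weight on the second test lowers its effective threshold enough for it to be selected instead.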

Note the difference between this function and combineTests. The latter presents evidence for any rejections within a cluster. This function specifies the exact location of the rejection in the cluster, which may be more useful in some cases but comes at the cost of greater conservativeness. In both cases, clustering procedures such as mergeWindows can be used to identify the cluster.

If by.pval=FALSE, the best test is defined as that with the highest log-CPM value. This should be independent of the p-value so no adjustment is necessary. Weights are not applied here. This mode may be useful when abundance is correlated to rejection under the alternative hypothesis, e.g., picking high-abundance regions that are more likely to contain peaks.
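A minimal base R sketch of this abundance-based mode follows, using a hypothetical toy table. It only illustrates the selection rule described above and is not the package's code.

```r
# Hypothetical toy data: four tests in two clusters.
ids <- c(1L, 1L, 2L, 2L)
tab <- data.frame(logCPM=c(2.0, 5.0, 3.0, 1.0),
                  PValue=c(0.01, 0.30, 0.04, 0.002))

by.cluster <- split(seq_along(ids), ids)

# Pick the test with the highest log-CPM in each cluster.
best <- vapply(by.cluster, function(i) i[which.max(tab$logCPM[i])], integer(1))

# The p-value of the chosen test is reported without adjustment,
# as selection did not use the p-values.
picked <- tab[best, ]
rownames(picked) <- names(by.cluster)
picked
```

Note that the chosen test need not have the lowest p-value in its cluster; in the first cluster, the test with the higher abundance is picked despite its larger p-value.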

By default, the relevant fields in tab are identified by matching the column names to their expected values. If the column names are different from what is expected, specification of the correct column can be performed using pval.col and cpm.col.

References

Wasserman, L, and Roeder, K (2006). Weighted hypothesis testing. arXiv preprint math/0604172.

Examples


# Simulate cluster IDs and a table of per-test statistics.
ids <- round(runif(100, 1, 10))
tab <- data.frame(logFC=rnorm(100), logCPM=rnorm(100), PValue=rbeta(100, 1, 2))
best <- getBestTest(ids, tab)
head(best)

# Specifying the relevant columns by name.
best <- getBestTest(ids, tab, cpm.col="logCPM", pval.col="PValue")
head(best)

# With window weighting.
w <- round(runif(100, 1, 5))
best <- getBestTest(ids, tab, weight=w)
head(best)

# By logCPM.
best <- getBestTest(ids, tab, by.pval=FALSE)
head(best)

best <- getBestTest(ids, tab, by.pval=FALSE, cpm.col=2, pval.col=3)
head(best)
