fisher.pvalues.support: Computing discrete p-values and their support for binomial and Fisher's exact tests

Description

Computes discrete raw p-values and their support for the binomial test or Fisher's exact test applied to 2 x 2 contingency tables summarizing counts from two categorical measurements.

Usage

fisher.pvalues.support(counts, alternative = "greater", input = "noassoc")

Arguments

counts

a data frame of 2 or 4 columns and any number of rows, each row representing a 2 x 2 contingency table to test. The number of columns and what they must contain depend on the value of the input argument; see Details.

alternative

same argument as in fisher.test. The three possible values are "greater" (default), "two.sided" or "less" and you can specify just the initial letter.

input

the format of the input data frame, see Details. The three possible values are "noassoc" (default), "marginal" or "HG2011" and you can specify just the initial letter.

Value

A list of two elements:

raw

raw discrete p-values.

support

a list of the supports of the CDFs of the p-values. Each support is represented by a vector in increasing order.

Details

Assume that each contingency table compares two variables and summarizes the counts of association or no association with a condition. This can be summarized in the following table:

	Association	No association	Total
Variable 1	X1	Y1	N1
Variable 2	X2	Y2	N2

If input="noassoc", counts has 4 columns which respectively contain X1, Y1, X2 and Y2. If input="marginal", counts has 4 columns which respectively contain X1, N1, X2 and N2.
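The two four-column layouts describe the same tables, since Y = N - X in each row. As a minimal base R sketch (the data frame `marginal` and its values are hypothetical, chosen only for illustration), a "marginal" input can be converted to the "noassoc" layout as follows:

```r
# Hypothetical counts in the "marginal" layout: X1, N1, X2, N2.
marginal <- data.frame(X1 = c(4, 2, 14), N1 = rep(148, 3),
                       X2 = c(0, 1, 3),  N2 = rep(132, 3))

# Equivalent "noassoc" layout: X1, Y1, X2, Y2, with Y = N - X.
noassoc <- with(marginal, data.frame(X1 = X1, Y1 = N1 - X1,
                                     X2 = X2, Y2 = N2 - X2))
noassoc
```

Under this reading, passing `marginal` with input = "marginal" and `noassoc` with input = "noassoc" should describe the same set of tests.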

If input="HG2011", the setting is that of the amnesia data set as in Heller & Gur (2011, see References). Each contingency table compares one variable with all other variables of the study. That is, the counts for the second variable are replaced by the sums of the counts of the other variables:

	Association	No association	Total
Variable j	Xj	Yj	Nj
Variables !=j	SUM(Xi) - Xj	SUM(Yi) - Yj	SUM(Ni) - Nj

Hence counts needs to have only 2 columns which respectively contain Xj and Yj.
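The table above can be spelled out in base R. This is only a sketch of the construction as described here, not the package's internal code; the data frame `counts2` and its values are hypothetical:

```r
# Hypothetical two-column input: Xj (association), Yj (no association).
counts2 <- data.frame(X = c(4, 2, 14, 6), Y = c(144, 146, 134, 142))

# Second row of each table: totals over all variables minus variable j.
full <- data.frame(X1 = counts2$X,
                   Y1 = counts2$Y,
                   X2 = sum(counts2$X) - counts2$X,
                   Y2 = sum(counts2$Y) - counts2$Y)
full
```

Each row of `full` is then a complete 2 x 2 table in the four-column X1, Y1, X2, Y2 layout.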

binomial.pvalues.support and fisher.pvalues.support are wrapper functions for pvalues.support, setting test.type = "binomial" and test.type = "fisher", respectively.

The code for the computation of the p-values of Fisher's exact test is inspired by the example in the help page of p.discrete.adjust.

See the Wikipedia article about Fisher's exact test, paragraph Example, for a good depiction of what the code does for each possible value of alternative.
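To make the notion of support concrete, here is a minimal base R sketch, assuming the usual conditional (hypergeometric) formulation used by fisher.test. The helper `fisher_sketch` is hypothetical and is not the package's implementation. With the table margins fixed, X1 can take only finitely many values; computing the p-value for each of them and keeping the sorted distinct values gives the support.

```r
# Hypothetical helper: raw p-value and support for one 2 x 2 table.
fisher_sketch <- function(x1, y1, x2, y2, alternative = "greater") {
  m <- x1 + x2                      # total "association" count
  n <- y1 + y2                      # total "no association" count
  k <- x1 + y1                      # total for variable 1 (N1)
  xs <- max(0, k - n):min(k, m)     # attainable values of X1
  d <- dhyper(xs, m, n, k)          # null probabilities of those values
  p_all <- switch(alternative,
    greater   = rev(cumsum(rev(d))),            # P(X >= x)
    less      = cumsum(d),                      # P(X <= x)
    two.sided = sapply(d, function(di) sum(d[d <= di * (1 + 1e-7)])))
  p_all <- pmin(1, p_all)           # guard against rounding above 1
  list(raw = p_all[xs == x1], support = sort(unique(p_all)))
}

res <- fisher_sketch(4, 144, 0, 132)
res$raw
res$support
```

For this table the raw p-value should agree with `fisher.test(matrix(c(4, 0, 144, 132), 2), alternative = "greater")$p.value`, and it is by construction one of the elements of the support.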

The binomial test simply tests for p = 0.5 by using X1 as the test statistic and N1 as the number of trials.
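Read literally, the sentence above corresponds to a one-sample binomial test with success probability 0.5. The following base R sketch illustrates that reading for the one-sided alternatives; the helper `binom_sketch` is hypothetical and the package's internal code may differ.

```r
# Hypothetical helper: raw p-value and support for the binomial test
# of p = 0.5 with x1 successes out of n1 trials.
binom_sketch <- function(x1, n1, alternative = "greater") {
  xs <- 0:n1                        # attainable values of the statistic
  p_all <- switch(alternative,
    greater = pbinom(xs - 1, n1, 0.5, lower.tail = FALSE),  # P(X >= x)
    less    = pbinom(xs, n1, 0.5))                          # P(X <= x)
  list(raw = p_all[xs == x1], support = sort(unique(p_all)))
}

b <- binom_sketch(4, 10)
b$raw
b$support
```

The raw value can be checked against `binom.test(4, 10, p = 0.5, alternative = "greater")$p.value`.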

This version: 2019-11-15.

References

R. Heller and H. Gur (2011). False discovery rate controlling procedures for discrete tests. arXiv preprint arXiv:1112.4627v2.

"Fisher's exact test", Wikipedia, The Free Encyclopedia, accessed 2018-03-20.

Examples

# Assumption: the function and the amnesia data set come from the DiscreteFDR package
library(DiscreteFDR)

# Counts for nine 2 x 2 tables
X1 <- c(4, 2, 2, 14, 6, 9, 4, 0, 1)
X2 <- c(0, 0, 1, 3, 2, 1, 2, 2, 2)
N1 <- rep(148, 9)
N2 <- rep(132, 9)
Y1 <- N1 - X1
Y2 <- N2 - X2
df <- data.frame(X1, Y1, X2, Y2)
df

# Construction of the p-values and their support
df.formatted <- fisher.pvalues.support(counts = df, input = "noassoc")
raw.pvalues <- df.formatted$raw
pCDFlist <- df.formatted$support

data(amnesia)
# We only keep the first 100 rows to keep the computations fast.
# We also drop the first column to keep only the columns of counts,
# as in the Heller & Gur (2011) setting.
amnesia <- amnesia[1:100, 2:3]

# Construction of the p-values and their support
amnesia.formatted <- fisher.pvalues.support(counts = amnesia, input = "HG2011")
raw.pvalues <- amnesia.formatted$raw
pCDFlist <- amnesia.formatted$support