exact.test: Unconditional exact tests for 2x2 tables

Description

Calculates Barnard's or Boschloo's unconditional exact test for 2x2 tables under a binomial or multinomial model.

Usage

exact.test(data, alternative = "two.sided", npNumbers = 100, beta = 0.001,
           interval = FALSE, method = "Z-pooled", model = "Binomial", 
           cond.row = TRUE, to.plot = TRUE, ref.pvalue = TRUE)

Arguments

data

A two-dimensional contingency table in matrix form

alternative

Indicates the alternative hypothesis: must be either "less", "two.sided", or "greater"

npNumbers

Number: The number of nuisance parameters considered

beta

Number: Sets the confidence level, 100(1-beta)%, of the interval of nuisance parameters considered. Only used if interval=TRUE

interval

Logical: Indicates if a confidence interval on the nuisance parameter should be computed

method

Indicates the method for finding tables as or more extreme than the observed table: must be either "Z-pooled", "Z-unpooled", "Santner and Snell", "Boschloo", "CSM", "CSM modified", or "CSM approximate". CSM tests cannot be calculated for multinomial models

model

The model being used: must be either "Binomial" or "Multinomial"

cond.row

Logical: Indicates if row margins are fixed in the binomial models. Only used if model="Binomial"

to.plot

Logical: Indicates if a plot of p-value vs. nuisance parameter should be generated. Only used if model="Binomial"

ref.pvalue

Logical: Indicates if p-value should be refined by maximizing the p-value function after the nuisance parameter is selected. Only used if model="Binomial"

Value

p.value

The computed p-value

test.statistic

The observed test statistic

np

The nuisance parameter that maximizes the p-value. For multinomial models, both nuisance parameters are given

np.range

The range of nuisance parameters considered. For multinomial models, both nuisance parameter ranges are given

Warning

Multinomial models and CSM tests may take a very long time, even for sample sizes less than 100.
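One way to see why multinomial models are slower is to count the candidate tables. The sketch below is only a rough proxy for the work per evaluation of the p-value function, not a timing benchmark. The helper names `n_tables_binomial` and `n_tables_multinomial` are hypothetical and are not part of the package. With row margins fixed, a table is determined by two cell counts; with only the total fixed, any four non-negative counts summing to N are possible, and the maximization runs over two nuisance parameters instead of one.

```r
# Hypothetical helpers (not package functions): count the 2x2 tables
# that are possible under each model.

# Binomial model, row totals n1 and n2 fixed: x11 in 0..n1, x21 in 0..n2.
n_tables_binomial <- function(n1, n2) (n1 + 1) * (n2 + 1)

# Multinomial model, only total N fixed: four non-negative cells summing to N.
n_tables_multinomial <- function(N) choose(N + 3, 3)

n_tables_binomial(50, 50)   # 2601
n_tables_multinomial(100)   # 176851
```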

Details

Unconditional exact tests can be used for binomial or multinomial models. The binomial model assumes the row or column margins (but not both) are known in advance, while the multinomial model assumes only the total sample size is known beforehand. Conditional tests, by contrast, treat both row and column margins as fixed. The null hypothesis is that the rows and columns are independent. Under the binomial model, the user must specify which margin is fixed (the default is rows).

Let $X$ denote a generic 2x2 table with fixed sample sizes $n_1$ and $n_2$, let $X_0$ denote the observed table, and let $T(X)$ be the test statistic function. The null hypothesis can be written as $p_1=p_2 \equiv p$. With rows fixed, the summand of the p-value function is the product of two independent binomial probabilities, and the p-value is the supremum over the nuisance parameter $p$:

$$P(X|p)= \sup_{0 \leq p \leq 1} \sum_{T(X) \geq T(X_0)} {n_1 \choose x_{11}} {n_2 \choose x_{21}} p^{x_{11}+x_{21}} (1-p)^{x_{12}+x_{22}}$$

The multinomial model is similar except the summand has a multinomial distribution with two nuisance parameters.

Several test statistics can be used to determine the 'as or more extreme' tables in the index of summation. The method argument selects the test statistic. A brief description of each is given below (see References for more details).

Let $\hat{p_1}=x_{11}/n_1$, $\hat{p_2}=x_{21}/n_2$, and $\hat{p}=(x_{11}+x_{21})/(n_1+n_2)$.

Z-unpooled (or Wald):

$$Z_u(x_{11},x_{21})=\frac{\hat{p_2}-\hat{p_1}}{\sqrt{\frac{\hat{p_1}(1-\hat{p_1})}{n_1}+\frac{\hat{p_2}(1-\hat{p_2})}{n_2}}}$$

Z-pooled (or Score):

$$Z_p(x_{11},x_{21})=\frac{\hat{p_2}-\hat{p_1}}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n_1}+\frac{\hat{p}(1-\hat{p})}{n_2}}}$$

Santner and Snell:

$$D(x_{11},x_{21})=\hat{p_2}-\hat{p_1}$$

Boschloo: Uses the p-value from Fisher's exact test as the test statistic.
CSM: Starts with the most extreme table and adds other 'as or more extreme' tables one step at a time by maximizing the summand of the p-value function. This approach can be computationally intensive.

CSM modified: Starts with all tables that must be more extreme and adds other 'as or more extreme' tables one step at a time by maximizing the summand of the p-value function. This approach can be computationally intensive.

CSM approximate: Maximizes the summand of the p-value function for each possible table. Thus, the test statistic is the p-value function without the summation. This approach is less computationally intensive than the CSM test because the maximization is not repeated at each step.

By default, the supremum over the common success probability is taken over all values between 0 and 1. Another approach, proposed by Berger and Boos, is to take the supremum over a Clopper-Pearson confidence interval. This approach adds a small penalty to the p-value to ensure a level-$\alpha$ test, but prevents unlikely values of $p$ from inflating the p-value. The p-value function becomes:

$$P(X|p)= \left(\sup_{p \in C_\beta} \sum_{T(X) \geq T(X_0)} {n_1 \choose x_{11}} {n_2 \choose x_{21}} p^{x_{11}+x_{21}} (1-p)^{x_{12}+x_{22}}\right) + \beta$$

where $C_\beta$ is the $100(1-\beta)\%$ confidence interval of $p$.

There are many ways to define the two-sided p-value; this code uses the fisher.test() approach of summing the probabilities for both sides of the table.
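The p-value function for the binomial model with rows fixed can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the names `z_pooled` and `barnard_sketch` are hypothetical, only the Z-pooled statistic is covered, only the one-sided sum over $T(X) \geq T(X_0)$ is computed, the supremum is approximated on an evenly spaced grid with no refinement step, and tables with a zero standard error are assigned a statistic of 0 by assumption. Supplying `beta` switches to the Berger and Boos approach using the Clopper-Pearson interval from `binom.test()`.

```r
# Hypothetical helper: Z-pooled statistic for x1 successes out of n1 (row 1)
# and x2 successes out of n2 (row 2). Vectorized over x1 and x2.
z_pooled <- function(x1, x2, n1, n2) {
  p_hat <- (x1 + x2) / (n1 + n2)
  se <- sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
  z <- (x2 / n2 - x1 / n1) / se
  z[se == 0] <- 0  # assumption: all-success or all-failure tables get T = 0
  z
}

# Hypothetical sketch of the one-sided unconditional p-value, rows fixed.
barnard_sketch <- function(x1, n1, x2, n2, n_grid = 100, beta = NULL) {
  # Enumerate every table compatible with the fixed row totals.
  tables <- expand.grid(a = 0:n1, b = 0:n2)
  t_all <- z_pooled(tables$a, tables$b, n1, n2)
  t_obs <- z_pooled(x1, x2, n1, n2)

  # Tables as or more extreme than observed; the tolerance guards
  # against floating-point ties.
  extreme <- tables[t_all >= t_obs - 1e-12, ]

  # Range of the nuisance parameter: [0, 1], or a Clopper-Pearson
  # 100(1 - beta)% interval with an added penalty of beta.
  p_range <- c(0, 1)
  penalty <- 0
  if (!is.null(beta)) {
    ci <- binom.test(x1 + x2, n1 + n2, conf.level = 1 - beta)$conf.int
    p_range <- as.numeric(ci)
    penalty <- beta
  }

  # Sum the product of two binomial probabilities over the extreme
  # tables at each grid point, then take the maximum.
  p_grid <- seq(p_range[1], p_range[2], length.out = n_grid)
  p_fun <- sapply(p_grid, function(p) {
    sum(dbinom(extreme$a, n1, p) * dbinom(extreme$b, n2, p))
  })
  best <- which.max(p_fun)

  list(p.value = min(1, p_fun[best] + penalty),
       test.statistic = t_obs,
       np = p_grid[best],
       np.range = p_range)
}

# Usage with the first table from the Examples (rows: 7/15 and 12/15).
barnard_sketch(7, 15, 12, 15)
barnard_sketch(7, 15, 12, 15, beta = 0.001)
```

Because the grid is coarse and no refinement is applied, the result is only an approximation to the supremum, and it need not match the value returned by exact.test().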

References

This code was influenced by the FORTRAN program located at http://www4.stat.ncsu.edu/~boos/exact/

Examples

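library(Exact)  # exact.test() is provided by the Exact package

# Binomial model (default), rows fixed; plots p-value vs. nuisance parameter
data <- matrix(c(7, 8, 12, 3), 2, 2, byrow = TRUE)
exact.test(data, alternative = "less", to.plot = TRUE)

# Two-sided tests with the supremum taken over a confidence interval
exact.test(data, alternative = "two.sided", interval = TRUE, beta = 0.001,
           npNumbers = 100, method = "Z-pooled", to.plot = FALSE)
exact.test(data, alternative = "two.sided", interval = TRUE, beta = 0.001,
           npNumbers = 100, method = "Boschloo", to.plot = FALSE)

# Example from Barnard's (1947) appendix:
data <- matrix(c(4, 0, 3, 7), 2, 2,
               dimnames = list(c("Box 1", "Box 2"),
                               c("Defective", "Not Defective")))
exact.test(data, method = "CSM", alternative = "two.sided")

# Multinomial model (only the total sample size is fixed)
data <- matrix(c(6, 8, 4, 3), 2, 2, byrow = TRUE)
exact.test(data, model = "Multinomial", alternative = "less", method = "Z-pooled")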
