contingency2xt: Kruskal-Wallis Test for the 2 x t Contingency Table

Description

This function uses the Kruskal-Wallis criterion to test the hypothesis of no association between the counts for two responses "A" and "B" across t categories.

Usage

contingency2xt(Avec, Bvec, 
	method = c("asymptotic", "simulated", "exact"), 
	dist = FALSE, tab0 = TRUE, Nsim = 1e+06)

Value

A list of class kSamples with components

test.name

"2 x t Contingency Table"

t

number of classification categories

KW.cont

2 (3) vector giving the observed KW statistic, its asymptotic $P$-value (and simulated or exact $P$-value)

null.dist

simulated or enumerated null distribution of the test statistic. It is given as an M by 2 matrix, where the first column (named KW) gives the M unique ordered values of the Kruskal-Wallis statistic and the second column (named prob) gives the corresponding (simulated or exact) probabilities.

This format of null.dist is returned when method = "exact" and dist = TRUE or when method = "simulated" and dist = TRUE and tab0 = TRUE are specified.

For method = "simulated", dist = TRUE, and tab0 = FALSE the null distribution null.dist is returned as the vector of all simulated test statistic values. This is used in contingency2xt.comb in the simulation mode.

null.dist = NULL is returned when dist = FALSE or when method = "asymptotic".

method

the method used.

Nsim

the number of simulations.

Arguments

Avec

vector of length $t$ giving the counts $A_1,\ldots, A_t$ for response "A" according to $t$ categories. $m = A_1 + \ldots + A_t$.

Bvec

vector of length $t$ giving the counts $B_1,\ldots, B_t$ for response "B" according to $t$ categories. $n = B_1 + \ldots + B_t = N-m$.

method

= c("asymptotic","simulated","exact"), where

"asymptotic" uses only an asymptotic chi-square approximation with $t-1$ degrees of freedom to approximate the $P$-value. This calculation is always done.

"simulated" uses Nsim simulated counts for Avec and Bvec with the observed marginal totals, m, n, d = Avec+Bvec, to estimate the $P$-value.

"exact" enumerates all counts for Avec and Bvec with the observed marginal totals to get an exact $P$-value. It is used only when Nsim is at least as large as the number choose(m+t-1,t-1) of full enumerations. Otherwise, method reverts to "simulated" using the given Nsim.

dist

FALSE (default) or TRUE. If dist = TRUE, the distribution of the simulated or fully enumerated Kruskal-Wallis test statistics is returned as null.dist, if dist = FALSE the value of null.dist is NULL. The coice dist = TRUE also limits Nsim <- min(Nsim,1e8).

tab0

TRUE (default) or FALSE. If tab0 = TRUE, the null distribution is returned in 2 column matrix form when method = "simulated". When tab0 = FALSE the simulated null distribution is returned as a vector of all simulated values of the test statistic.

Nsim

=10000 (default), number of simulated Avec splits to use. It is only used when method = "simulated", or when method = "exact" reverts to method = "simulated", as previously explained.

warning

method = "exact" should only be used with caution. Computation time is proportional to the number of enumerations. In most cases dist = TRUE should not be used, i.e., when the returned distribution objects become too large for R's work space.

Details

For this data scenario the Kruskal-Wallis criterion is $$K.star = \frac{N(N-1)}{mn}(\sum\frac{A_i^2}{d_i}-\frac{m^2}{N})$$ with $d_i=A_i+B_i$, treating "A" responses as 1 and "B" responses as 2, and using midranks as explained in Lehmann (2006), Chapter 5.3.

For small sample sizes exact null distribution calculations are possible, based on Algorithm C (Chase's sequence) in Knuth (2011), which allows the enumeration of all possible splits of $m$ into counts $A_1,\ldots, A_t$ such that $m = A_1 + \ldots + A_t$, followed by the calculation of the statistic $K.star$ for each such split. Simulation of $A_1,\ldots, A_t$ uses the probability model (5.35) in Lehmann (2006) to successively generate hypergeometric counts $A_1,\ldots, A_t$. Both these processes, enumeration and simulation, are done in C.

References

Knuth, D.E. (2011), The Art of Computer Programming, Volume 4A Combinatorial Algorithms Part 1, Addison-Wesley

Kruskal, W.H. (1952), A Nonparametric Test for the Several Sample Problem, The Annals of Mathematical Statistics, Vol 23, No. 4, 525-540

Kruskal, W.H. and Wallis, W.A. (1952), Use of Ranks in One-Criterion Variance Analysis, Journal of the American Statistical Association, Vol 47, No. 260, 583--621.

Lehmann, E.L. (2006), Nonparametrics, Statistical Methods Based on Ranks, Revised First Edition, Springer, New York.

Examples

Run this code

contingency2xt(c(25,15,20),c(16,6,18),method="exact",dist=FALSE,
	tab0=TRUE,Nsim=1e3)

Run the code above in your browser using DataLab