This function uses the Kruskal-Wallis criterion to test the hypothesis of no association between the counts for two responses "A" and "B" across t categories.
contingency2xt(Avec, Bvec,
method = c("asymptotic", "simulated", "exact"),
dist = FALSE, tab0 = TRUE, Nsim = 1e+06)
A list of class kSamples with components
"2 x t Contingency Table"
number of classification categories
2 (3) vector giving the observed KW statistic, its asymptotic \(P\)-value (and simulated or exact \(P\)-value)
simulated or enumerated null distribution of the test statistic. It is given as an M by 2 matrix, where the first column (named KW) gives the M unique ordered values of the Kruskal-Wallis statistic and the second column (named prob) gives the corresponding (simulated or exact) probabilities.
This format of null.dist is returned when method = "exact" and dist = TRUE, or when method = "simulated", dist = TRUE, and tab0 = TRUE are specified.
For method = "simulated", dist = TRUE, and tab0 = FALSE, the null distribution null.dist is returned as the vector of all simulated test statistic values. This is used in contingency2xt.comb in the simulation mode.
null.dist = NULL is returned when dist = FALSE or when method = "asymptotic".
the method used.
the number of simulations.
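The two formats of null.dist described above carry the same information. The following base R sketch shows how a vector of simulated statistic values could be condensed into the M by 2 matrix form and how a P-value estimate follows from either format. The vector sims and the value kw_obs are made-up illustration data, and the code is not the package's internal implementation.

```r
# Hypothetical vector of simulated statistic values (illustration only)
sims <- c(0.5, 1.2, 0.5, 2.0, 1.2, 0.5)

# Unique ordered values and their relative frequencies
vals <- sort(unique(sims))
prob <- tabulate(match(sims, vals), nbins = length(vals)) / length(sims)
null_tab <- cbind(KW = vals, prob = prob)

# Estimated P-value for a hypothetical observed value: upper-tail mass
kw_obs <- 1.2
p_tab <- sum(null_tab[null_tab[, "KW"] >= kw_obs, "prob"])  # from matrix form
p_vec <- mean(sims >= kw_obs)                               # from vector form
```

Both p_tab and p_vec give the same estimate; the matrix form is simply more compact when many simulated values coincide.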
vector of length \(t\) giving the counts \(A_1,\ldots, A_t\) for response "A" according to \(t\) categories. \(m = A_1 + \ldots + A_t\).
vector of length \(t\) giving the counts \(B_1,\ldots, B_t\) for response "B" according to \(t\) categories. \(n = B_1 + \ldots + B_t = N-m\).
= c("asymptotic", "simulated", "exact"), where
"asymptotic" uses only an asymptotic chi-square approximation with \(t-1\) degrees of freedom to approximate the \(P\)-value. This calculation is always done.
"simulated" uses Nsim simulated counts for Avec and Bvec with the observed marginal totals, m, n, d = Avec+Bvec, to estimate the \(P\)-value.
"exact" enumerates all counts for Avec and Bvec with the observed marginal totals to get an exact \(P\)-value. It is used only when Nsim is at least as large as the number choose(m+t-1,t-1) of full enumerations. Otherwise, method reverts to "simulated" using the given Nsim.
FALSE (default) or TRUE. If dist = TRUE, the distribution of the simulated or fully enumerated Kruskal-Wallis test statistics is returned as null.dist; if dist = FALSE, the value of null.dist is NULL. The choice dist = TRUE also limits Nsim <- min(Nsim,1e8).
TRUE (default) or FALSE. If tab0 = TRUE, the null distribution is returned in 2 column matrix form when method = "simulated". When tab0 = FALSE, the simulated null distribution is returned as a vector of all simulated values of the test statistic.
= 1e+06 (default, as given in the usage), number of simulated Avec splits to use. It is only used when method = "simulated", or when method = "exact" reverts to method = "simulated", as previously explained.
method = "exact" should only be used with caution. Computation time is proportional to the number of enumerations. In most cases dist = TRUE should not be used, namely when the returned distribution objects would become too large for R's workspace.
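To judge beforehand whether method = "exact" is feasible, the enumeration count choose(m+t-1,t-1) can be computed directly. The sketch below uses only base R; the helper name n_enum is hypothetical and not part of kSamples.

```r
# Hypothetical helper (not part of kSamples): number of full enumerations,
# i.e. the number of ways to split m into t non-negative counts
n_enum <- function(Avec) {
  m <- sum(Avec)
  t <- length(Avec)
  choose(m + t - 1, t - 1)
}

Avec <- c(25, 15, 20)
Nsim <- 1e3

n_enum(Avec)           # choose(62, 2) = 1891
n_enum(Avec) <= Nsim   # FALSE: Nsim is too small for full enumeration
```

With these counts Nsim = 1e3 is smaller than the enumeration count, so by the rule stated for method = "exact" such a call would revert to simulation.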
For this data scenario the Kruskal-Wallis criterion is $$K.star = \frac{N(N-1)}{mn}\left(\sum_{i=1}^{t}\frac{A_i^2}{d_i}-\frac{m^2}{N}\right)$$ with \(d_i=A_i+B_i\) and \(N=m+n\), treating "A" responses as 1 and "B" responses as 2, and using midranks as explained in Lehmann (2006), Chapter 5.3.
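As a worked illustration, the criterion can be evaluated directly in base R. This is a minimal sketch, assuming all \(d_i > 0\); the function name kstar is hypothetical and the code is not the package's implementation (which is done in C).

```r
# Hypothetical helper (not part of kSamples): K.star from the formula above.
# Assumes every category has at least one observation (d_i > 0).
kstar <- function(Avec, Bvec) {
  d <- Avec + Bvec        # column totals d_i
  m <- sum(Avec)          # total count for response "A"
  n <- sum(Bvec)          # total count for response "B"
  N <- m + n
  N * (N - 1) / (m * n) * (sum(Avec^2 / d) - m^2 / N)
}

Avec <- c(25, 15, 20)
Bvec <- c(16, 6, 18)
k_obs <- kstar(Avec, Bvec)

# Asymptotic P-value: chi-square with t - 1 degrees of freedom
p_asym <- pchisq(k_obs, df = length(Avec) - 1, lower.tail = FALSE)
```

For these counts k_obs is approximately 1.9986.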
For small sample sizes exact null distribution calculations are possible, based on Algorithm C (Chase's sequence) in Knuth (2011), which allows the enumeration of all possible splits of \(m\) into counts \(A_1,\ldots, A_t\) such that \(m = A_1 + \ldots + A_t\), followed by the calculation of the statistic \(K.star\) for each such split. Simulation of \(A_1,\ldots, A_t\) uses the probability model (5.35) in Lehmann (2006) to successively generate hypergeometric counts \(A_1,\ldots, A_t\). Both these processes, enumeration and simulation, are done in C.
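The successive generation of hypergeometric counts can be sketched in base R as follows. The sketch assumes that the null model is the multivariate hypergeometric distribution of \(A_1,\ldots, A_t\) given the marginal totals, and that all \(d_i > 0\). The names sim_kstar and one_split and the small comparison tolerance are hypothetical choices for illustration; the package performs this step in C and may differ in detail.

```r
# Hypothetical sketch (not the package's C code): simulate K.star under the
# null by drawing A_1, ..., A_t one category at a time with rhyper().
sim_kstar <- function(Avec, Bvec, Nsim = 1000) {
  d <- Avec + Bvec
  m <- sum(Avec); n <- sum(Bvec); N <- m + n
  t <- length(d)
  kstar <- function(A) N * (N - 1) / (m * n) * (sum(A^2 / d) - m^2 / N)

  one_split <- function() {
    A <- numeric(t)
    m_rem <- m   # "A" responses still to be allocated
    pool <- N    # observations in the categories not yet processed
    for (i in seq_len(t - 1)) {
      # d[i] observations of category i among pool; m_rem of pool are "A"
      A[i] <- rhyper(1, d[i], pool - d[i], m_rem)
      m_rem <- m_rem - A[i]
      pool <- pool - d[i]
    }
    A[t] <- m_rem  # last count is determined by the total m
    A
  }

  obs <- kstar(Avec)
  sims <- replicate(Nsim, kstar(one_split()))
  # Small tolerance (an assumption) guards against floating-point ties
  list(observed = obs, p.sim = mean(sims >= obs - 1e-8), sims = sims)
}

set.seed(1)
res <- sim_kstar(c(25, 15, 20), c(16, 6, 18), Nsim = 2000)
res$p.sim
```

With tab0 = FALSE semantics in mind, res$sims plays the role of the vector of all simulated statistic values.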
Knuth, D.E. (2011), The Art of Computer Programming, Volume 4A, Combinatorial Algorithms, Part 1, Addison-Wesley.
Kruskal, W.H. (1952), A Nonparametric Test for the Several Sample Problem, The Annals of Mathematical Statistics, Vol 23, No. 4, 525-540.
Kruskal, W.H. and Wallis, W.A. (1952), Use of Ranks in One-Criterion Variance Analysis, Journal of the American Statistical Association, Vol 47, No. 260, 583-621.
Lehmann, E.L. (2006), Nonparametrics, Statistical Methods Based on Ranks, Revised First Edition, Springer, New York.
contingency2xt(c(25,15,20),c(16,6,18),method="exact",dist=FALSE,
tab0=TRUE,Nsim=1e3)