Distribution of the Wilcoxon Rank Sum Statistic
Density, distribution function, quantile function and random
generation for the distribution of the Wilcoxon rank sum statistic
obtained from samples with size m and n, respectively.
dwilcox(x, m, n, log = FALSE)
pwilcox(q, m, n, lower.tail = TRUE, log.p = FALSE)
qwilcox(p, m, n, lower.tail = TRUE, log.p = FALSE)
rwilcox(nn, m, n)
- x, q
- vector of quantiles.
- p
- vector of probabilities.
- nn
- number of observations. If length(nn) > 1, the length is taken to be the number required.
- m, n
- numbers of observations in the first and second sample, respectively. Can be vectors of positive integers.
- log, log.p
- logical; if TRUE, probabilities p are given as log(p).
- lower.tail
- logical; if TRUE (default), probabilities are $P[X \le x]$, otherwise, $P[X > x]$.
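As a minimal sketch of how lower.tail and log.p behave, the following compares the two tails and the log-scale result. The sample sizes and the quantile q = 8 are arbitrary illustration values, not taken from the text above.

```r
m <- 4; n <- 6; q <- 8                         # arbitrary illustration values

lower <- pwilcox(q, m, n)                      # P[X <= q]
upper <- pwilcox(q, m, n, lower.tail = FALSE)  # P[X > q]
tail_sum <- lower + upper                      # the two tails should sum to 1

# With log.p = TRUE the same probability is returned on the log scale.
log_lower <- pwilcox(q, m, n, log.p = TRUE)
same_prob <- all.equal(exp(log_lower), lower)
```

Working on the log scale is mainly useful far out in the tails, where the probabilities themselves become very small.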
This distribution is obtained as follows. Let x and y be two random, independent samples of size m and n. Then the Wilcoxon rank sum statistic is the number of all pairs (x[i], y[j]) for which y[j] is not greater than x[i]. This statistic takes values between 0 and m * n, and its mean and variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively.
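The definition can be checked numerically. The sketch below counts the pairs directly, compares the count with the equivalent rank-based form (rank sum of the first sample minus m * (m + 1) / 2, valid when there are no ties), and recomputes the mean and variance from the exact density. The simulated data, the seed and the variable names are assumptions made for illustration only.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
m <- 4; n <- 6
x <- rnorm(m); y <- rnorm(n)     # continuous data, so ties have probability zero

# Count the pairs (x[i], y[j]) for which y[j] is not greater than x[i].
w_pairs <- sum(outer(x, y, ">="))

# Same value from the ranks of x in the pooled sample.
w_ranks <- sum(rank(c(x, y))[seq_len(m)]) - m * (m + 1) / 2

# Mean and variance computed from the exact density on 0, ..., m * n.
k <- 0:(m * n)
p <- dwilcox(k, m, n)
mu <- sum(k * p)
sigma2 <- sum((k - mu)^2 * p)

all.equal(mu, m * n / 2)
all.equal(sigma2, m * n * (m + n + 1) / 12)
```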
If any of the first three arguments are vectors, the recycling rule is used to do the calculations for all combinations of the three up to the length of the longest vector.
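A short sketch of the recycling rule, using arbitrary illustration values: here m is a vector while x and n have length one, so x and n are reused for every element of m.

```r
# m varies; x = 3 and n = 5 are recycled to the length of m.
vectorised <- dwilcox(3, m = 2:4, n = 5)

# The same values obtained with one call per element.
one_by_one <- c(dwilcox(3, 2, 5), dwilcox(3, 3, 5), dwilcox(3, 4, 5))

identical(vectorised, one_by_one)
```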
dwilcox gives the density, pwilcox gives the distribution function, qwilcox gives the quantile function, and rwilcox generates random deviates.
The length of the result is determined by nn for rwilcox, and is the maximum of the lengths of the numerical arguments for the other functions. The numerical arguments other than nn are recycled to the length of the result. Only the first elements of the logical arguments are used.
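The length rules can be illustrated with a small sketch; the sample sizes and the vectors passed in are arbitrary illustration values.

```r
m <- 4; n <- 6                       # arbitrary illustration values

# rwilcox: a vector nn contributes only its length, not its values.
draws <- rwilcox(c(10, 20, 30), m, n)
length(draws)

# Other functions: the result is as long as the longest numerical argument.
dens <- dwilcox(0:5, m, n)
length(dens)
```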
S-PLUS uses a different (but equivalent) definition of the Wilcoxon
statistic: see wilcox.test for details.
These functions can use large amounts of memory and stack (and even crash R if the stack limit is exceeded and stack-checking is not in place) if one sample is large (several thousands or more).
These are calculated via recursion, based on cwilcox(k, m, n), the number of choices with statistic k from samples of size m and n, which is itself calculated recursively and the results cached. Then dwilcox and pwilcox sum appropriate values of cwilcox, and qwilcox is based on inversion.
rwilcox generates a random permutation of ranks and evaluates the statistic.
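The counting recursion can be sketched in plain R. This is an illustrative reimplementation under stated assumptions, not R's internal C code: the helper count_wilcox and its environment cache count_cache are hypothetical names, and the recursion used is the standard one obtained by conditioning on which sample holds the largest observation. The last lines show inversion of the cumulative sums for an arbitrary probability p = 0.3.

```r
# Hypothetical helper: number of arrangements with statistic k for
# sample sizes m and n, with results cached in an environment.
count_cache <- new.env()
count_wilcox <- function(k, m, n) {
  if (k < 0 || k > m * n) return(0)
  if (m == 0 || n == 0) return(as.numeric(k == 0))
  key <- paste(k, m, n, sep = ":")
  hit <- count_cache[[key]]
  if (!is.null(hit)) return(hit)
  # The largest observation comes either from the first sample
  # (adding n pairs) or from the second sample (adding none).
  val <- count_wilcox(k - n, m - 1, n) + count_wilcox(k, m, n - 1)
  assign(key, val, envir = count_cache)
  val
}

m <- 4; n <- 6                          # arbitrary illustration values
k <- 0:(m * n)
counts <- vapply(k, count_wilcox, numeric(1), m = m, n = n)

# Density: counts divided by the number of equally likely arrangements.
dens <- counts / choose(m + n, m)
all.equal(dens, dwilcox(k, m, n))

# Quantile by inversion: smallest value whose cumulative probability reaches p.
p <- 0.3
q_inv <- k[which(cumsum(dens) >= p)[1]]
q_inv == qwilcox(p, m, n)
```

The plain recursion is only practical for small m and n; the cache keeps repeated subproblems from being recomputed, which mirrors the caching mentioned above.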
See wilcox.test to calculate the statistic from data, find p values and so on.
require(graphics)

x <- -1:(4*6 + 1)
fx <- dwilcox(x, 4, 6)
Fx <- pwilcox(x, 4, 6)

layout(rbind(1,2), widths = 1, heights = c(3,2))
plot(x, fx, type = "h", col = "violet",
     main = "Probabilities (density) of Wilcoxon-Statist.(n=6, m=4)")
plot(x, Fx, type = "s", col = "blue",
     main = "Distribution of Wilcoxon-Statist.(n=6, m=4)")
abline(h = 0:1, col = "gray20", lty = 2)
layout(1) # set back

N <- 200
hist(U <- rwilcox(N, m = 4, n = 6), breaks = 0:25 - 1/2,
     border = "red", col = "pink", sub = paste("N =", N))
mtext("N * f(x), f() = true \"density\"", side = 3, col = "blue")
lines(x, N*fx, type = "h", col = "blue", lwd = 2)
points(x, N*fx, cex = 2)

## Better is a Quantile-Quantile Plot
qqplot(U, qw <- qwilcox((1:N - 1/2)/N, m = 4, n = 6),
       main = paste("Q-Q-Plot of empirical and theoretical quantiles",
                    "Wilcoxon Statistic, (m=4, n=6)", sep = "\n"))
n <- as.numeric(names(print(tU <- table(U))))
text(n+.2, n+.5, labels = tU, col = "red")