mann_whitney_test_pv: Wilcoxon-Mann-Whitney U test

Description

mann_whitney_test_pv() performs an exact or approximate Wilcoxon-Mann-Whitney U test about the location shift between two independent groups when the data is not necessarily normally distributed. In contrast to stats::wilcox.test(), it is vectorised and only calculates p-values. Furthermore, it is capable of returning the discrete p-value supports, i.e. all observable p-values under a null hypothesis. Multiple tests can be evaluated simultaneously.
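To make the idea of a p-value support concrete, the following sketch uses only base R's pwilcox(), not mann_whitney_test_pv(), to list every exact one-sided p-value that can be observed for two small untied samples. It uses the rank sum statistic U defined in Details. The sample sizes are arbitrary, and treating a "less" p-value as P(U <= u) under the null hypothesis is an assumption made for illustration.

```r
# Sketch: all observable exact one-sided ("less") p-values of U for
# untied samples of sizes n_x and n_y, computed with base R's pwilcox().
# Assumption: a "less" p-value equals P(U <= u) under the null hypothesis.
n_x <- 3
n_y <- 2
u_values <- 0:(n_x * n_y)                  # U ranges from 0 to n_x * n_y
support_less <- pwilcox(u_values, n_x, n_y)
print(round(support_less, 2))              # [1] 0.1 0.2 0.4 0.6 0.8 0.9 1.0
```

Only these seven values can ever occur for these sample sizes, which is why the support is useful for procedures that exploit discreteness.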

Usage

mann_whitney_test_pv(
  x,
  y,
  mu = 0,
  alternative = "two.sided",
  exact = NULL,
  correct = TRUE,
  digits_rank = Inf,
  simple_output = FALSE
)

Value

If simple_output = TRUE, a vector of computed p-values is returned. Otherwise, the output is a DiscreteTestResults R6 class object, which also includes the p-value supports and testing parameters. These must be accessed via public methods, e.g. $get_pvalues().

Arguments

x, y: numerical vectors forming the samples to be tested or lists of numerical vectors for multiple tests.
mu: numerical vector of hypothesised location shift(s).
alternative: character vector that indicates the alternative hypotheses; each value must be one of "two.sided" (the default), "less" or "greater".
exact: logical value that indicates whether p-values are to be calculated by exact computation (TRUE) or by a continuous approximation (FALSE). If NULL (the default), the method is chosen automatically (see Details).
correct: logical value that indicates if a continuity correction is to be applied (TRUE; the default) or not (FALSE). Ignored if exact = TRUE.
digits_rank: single number giving the significant digits used to compute ranks for the test statistics.
simple_output: logical value that indicates whether only a vector of p-values is to be returned (TRUE) or an R6 class object including the tests' parameters and support sets, i.e. all observable p-values under each null hypothesis (FALSE; the default). See Value.

Details

We use a test statistic called the Wilcoxon Rank Sum Statistic, defined by $$U = \sum_{i = 1}^{n_X}{\mathrm{rank}(X_i)} - \frac{n_X(n_X + 1)}{2},$$ where $\mathrm{rank}(X_i)$ is the rank of $X_i$ in the concatenated sample of $X$ and $Y$, and $n_X$ and $n_Y$ are the sizes of the samples $X$ and $Y$, respectively. Note that $U$ can range from $0$ to $n_X \cdot n_Y$. This is the same statistic that stats::wilcox.test() uses and whose null distribution is available via pwilcox(). It is also the statistic defined in both references below. Note, however, that it is not what the (English-language) Wikipedia article (as of February 12, 2026) calls the Mann-Whitney U Statistic. Using our notation, the latter is defined as $\min(U, n_X \cdot n_Y - U)$. In the Wikipedia notation, the Wilcoxon Rank Sum Statistic is $U_2$.
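The definition of U can be checked directly in base R. This sketch uses two small, made-up untied samples, computes U from the ranks in the concatenated sample, and compares it with the W statistic reported by stats::wilcox.test() and with the Wikipedia variant min(U, n_X n_Y - U).

```r
# Sketch: U from its definition versus the W statistic of stats::wilcox.test()
x <- c(0.8, 3.1, 2.4, 5.0)
y <- c(1.7, 0.2, 4.3)
n_x <- length(x)
n_y <- length(y)
ranks <- rank(c(x, y))                            # ranks in the concatenated sample
U <- sum(ranks[seq_len(n_x)]) - n_x * (n_x + 1) / 2
W <- unname(wilcox.test(x, y)$statistic)          # statistic reported by stats
U_wikipedia <- min(U, n_x * n_y - U)              # Wikipedia's "Mann-Whitney U"
c(U = U, W = W, U_wikipedia = U_wikipedia)        # U and W are 8, U_wikipedia is 4
```

Here U also equals the number of pairs $(X_i, Y_j)$ with $Y_j < X_i$, which is 8 for these samples.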

The parameters x, y, mu and alternative are vectorised. If x and y are lists, they are automatically replicated to the same length. If x or y is not a list, it is wrapped in a new list, which is then replicated to the appropriate length. This allows multiple hypotheses to be tested simultaneously.
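The recycling rule described above can be illustrated as follows. This is a hypothetical sketch: recycle_settings() is not part of the package, recycling every argument to the longest length is an assumption, and stats::wilcox.test() stands in for the actual test so that the example runs with base R only.

```r
# Hypothetical illustration of the vectorisation rule; recycle_settings()
# is NOT a package function, and the max-length rule is an assumption.
recycle_settings <- function(x, y, mu = 0, alternative = "two.sided") {
  if (!is.list(x)) x <- list(x)   # wrap a single sample in a list
  if (!is.list(y)) y <- list(y)
  n <- max(length(x), length(y), length(mu), length(alternative))
  list(x = rep_len(x, n), y = rep_len(y, n),
       mu = rep_len(mu, n), alternative = rep_len(alternative, n))
}

x <- c(0.8, 3.1, 2.4, 5.0)                          # one sample, reused
y_list <- list(c(1.7, 0.2, 4.3), c(2.0, 6.1, 3.6))  # two samples
s <- recycle_settings(x, y_list, alternative = c("two.sided", "greater"))

# One p-value per test setting (wilcox.test() used as a stand-in)
p_values <- mapply(
  function(xi, yi, mui, alt) wilcox.test(xi, yi, mu = mui, alternative = alt)$p.value,
  s$x, s$y, s$mu, s$alternative
)
p_values
```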

In the presence of ties, exact p-values cannot be computed. In these cases, exact is therefore ignored, and the p-values of the respective test settings are calculated by a normal approximation.
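A normal approximation for U with ties typically uses the mean $n_X n_Y / 2$ and a tie-corrected variance. The sketch below mirrors the formula used by stats::wilcox.test(), including its continuity correction; that mann_whitney_test_pv() uses exactly this form is an assumption, and normal_approx_pvalue() is a hypothetical helper.

```r
# Hypothetical helper: tie-corrected normal approximation for U with an
# optional continuity correction, mirroring stats::wilcox.test().
normal_approx_pvalue <- function(x, y, mu = 0, alternative = "two.sided",
                                 correct = TRUE) {
  n_x <- length(x)
  n_y <- length(y)
  r <- rank(c(x - mu, y))                       # shift x by the hypothesised mu
  U <- sum(r[seq_len(n_x)]) - n_x * (n_x + 1) / 2
  ties <- table(r)                              # sizes of the tied groups
  z <- U - n_x * n_y / 2                        # centre U at its null mean
  sigma <- sqrt((n_x * n_y / 12) *
                ((n_x + n_y + 1) -
                 sum(ties^3 - ties) / ((n_x + n_y) * (n_x + n_y - 1))))
  if (correct) {                                # shift 0.5 towards the mean
    z <- z - switch(alternative,
                    two.sided = sign(z) * 0.5, greater = 0.5, less = -0.5)
  }
  z <- z / sigma
  switch(alternative,
         two.sided = 2 * min(pnorm(z), pnorm(z, lower.tail = FALSE)),
         greater   = pnorm(z, lower.tail = FALSE),
         less      = pnorm(z))
}

x <- c(1.2, 3.4, 3.4, 5.1, 2.2)   # contains ties (3.4, and 2.2 across samples)
y <- c(0.7, 2.2, 1.9, 0.3)
normal_approx_pvalue(x, y)
```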

If exact = NULL (the default), exact computation is performed if there are no ties in the samples of a test setting and both sample sizes are at most 200; otherwise, the normal approximation is used.
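The automatic choice for exact = NULL can be restated as a small predicate. This is a hypothetical sketch: use_exact() is not a package function, and checking for ties in the concatenated sample (which is what affects the ranks) is an assumption about how "no ties" is interpreted.

```r
# Hypothetical helper restating the rule for exact = NULL; use_exact()
# and the tie check on the concatenated sample are assumptions.
use_exact <- function(x, y, max_n = 200) {
  no_ties <- !anyDuplicated(c(x, y))   # anyDuplicated() returns 0 if no ties
  no_ties && length(x) <= max_n && length(y) <= max_n
}

use_exact(c(0.8, 3.1, 2.4), c(1.7, 0.2))   # TRUE: no ties, small samples
use_exact(c(0.8, 3.1, 2.4), c(2.4, 0.2))   # FALSE: 2.4 is tied across samples
use_exact(seq_len(201), 1:10 + 0.5)        # FALSE: first sample exceeds 200
```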

If digits_rank = Inf (the default), rank() is applied to the data directly to compute the ranks for the test statistics; otherwise, rank(signif(., digits_rank)) is used.
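A finite digits_rank can turn nearly equal values into ties before ranking. The following base R sketch shows the effect of rank(signif(., digits)) on three made-up values; the same idea underlies the digits.rank argument of stats::wilcox.test().

```r
# Sketch: rounding to 3 significant digits before ranking creates a tie
z <- c(1.0001, 1.0002, 2.5)
rank(z)                  # [1] 1 2 3
rank(signif(z, 3))       # [1] 1.5 1.5 3.0  (1.0001 and 1.0002 both become 1)
```

Since ties rule out exact computation, a finite digits_rank can also switch a test setting to the normal approximation.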

References

Mann, H. B. & Whitney, D. R. (1947). On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. Ann. Math. Statist., 18(1), pp. 50-60. doi:10.1214/aoms/1177730491

Hollander, M., Wolfe, D. A. & Chicken, E. (2014). Nonparametric Statistical Methods. Third Edition. New York: Wiley. pp. 115-135. doi:10.1002/9781119196037

Examples


library(DiscreteTests)

# Construct two samples with a location shift of 1
set.seed(1)
r1 <- rnorm(100)
r2 <- rnorm(100, 1)

# Exact two-sided p-values and their supports
results_ex  <- mann_whitney_test_pv(r1, r2)
print(results_ex)
results_ex$get_pvalues()
results_ex$get_pvalue_supports()

# Normal-approximated one-sided p-values ("less") and their supports
results_ap  <- mann_whitney_test_pv(r1, r2, alternative = "less", exact = FALSE)
print(results_ap)
results_ap$get_pvalues()
results_ap$get_pvalue_supports()
