wmw_test: Wilcoxon-Mann-Whitney Test of No Group Discrimination

Description

Performs the distribution-free Wilcoxon-Mann-Whitney test for AUC-detectable group discrimination, testing \(\mathrm{H_0\colon AUC} = 0.5\) against \(\mathrm{H_1\colon AUC} \neq 0.5\). Under the location-shift assumption, this is equivalent to testing for zero location difference.

Usage

wmw_test(
  formula,
  data,
  ref_level = NULL,
  special_case = FALSE,
  alternative = c("two.sided", "greater", "less"),
  pvalue_method = "EU",
  ci_method = "hanley",
  conf_level = 0.95,
  n_grid = 100,
  ...
)

Value

Object of class 'wmw_test' containing:

special_case: Logical indicating whether special case (location-shift) analysis was performed
n: Named vector with components n1, n2 giving sample sizes for each group
U_statistic: U statistic
p_value: P-value for testing H0: AUC = 0.5
alternative: Alternative hypothesis specification
pvalue_method: Character string describing the test method
data_name: Character string giving the name of the data
pseudomedian: Hodges-Lehmann median difference estimate (when special_case = TRUE)
pseudomedian_conf_int: Confidence interval for the location shift (when special_case = TRUE)
pseudomedian_conf_level: Confidence level for the confidence interval for HL estimator (when special_case = TRUE)
ci_method: Method used to compute confidence interval for AUC
roc_object: ROC analysis object returned by roc_with_ci function
auc: Empirical AUC (eAUC), the standardized U statistic
auc_conf_int: Confidence interval for true AUC using Hanley-McNeil or bootstrap method
x_vals: Numeric vector of observations from non-reference group
y_vals: Numeric vector of observations from reference group
groups: Character vector of group labels from original data
group_levels: Character vector of factor levels for grouping variable
group_ref_level: Character string indicating which level corresponds to reference group

Arguments

formula: Formula of the form response ~ group
data: Data frame containing continuous response variable and grouping factor
ref_level: Character, reference level of grouping factor (if NULL, uses first level)
special_case: Logical, location-shift assumption (default FALSE)
alternative: Character, alternative hypothesis; one of "two.sided", "greater", "less" (default "two.sided")
pvalue_method: Character, method ('EU', 'BC') used for computing p-values; 'BC' assumes continuous data (default 'EU')
ci_method: Character, confidence interval method for AUC; one of 'hanley', 'boot', 'none' (default 'hanley')
conf_level: Numeric, confidence level for intervals (default 0.95)
n_grid: Numeric, number of grid points for search in pseudomedian_ci() (default 100)
...: Additional arguments passed to roc_with_ci()

Details

The function tests the null hypothesis \(\mathrm{H_0\colon AUC} = 0.5\) against \(\mathrm{H_1\colon AUC} \neq 0.5\), where AUC is the area under the ROC curve and, following the convention of wilcox.test(), equals the probability \(P(X > Y)\) that a randomly selected observation from the first group exceeds a randomly selected observation from the second group.

For response ~ group, observations from the non-reference group constitute \(X\), while observations from the reference group (specified by ref_level) constitute \(Y\). Thus AUC = P(non-reference > reference). If ref_level is not specified, the first factor level is used as reference. The U statistic and the resulting empirical AUC (eAUC) are calculated consistently with this group assignment.
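The group assignment can be checked by hand in base R. The following is a minimal sketch on made-up toy values, not the package's internal code; counting ties as one half is the convention of wilcox.test() and is an assumption here rather than something stated above.

```r
# Toy data (hypothetical values, for illustration only)
x <- c(2.1, 3.4, 1.9, 4.2, 3.3)  # non-reference group
y <- c(1.5, 2.0, 2.8, 1.1)       # reference group

# U counts the pairs (x_i, y_j) with x_i > y_j; ties count as 1/2 (assumption)
U <- sum(outer(x, y, ">")) + 0.5 * sum(outer(x, y, "=="))

# eAUC is U scaled by the number of pairs
eauc <- U / (length(x) * length(y))

# The W statistic of wilcox.test(x, y) is the same count
W <- unname(wilcox.test(x, y)$statistic)

# Swapping the roles of the two groups (a different reference level)
# yields the complementary value 1 - eAUC
U_swapped <- sum(outer(y, x, ">")) + 0.5 * sum(outer(y, x, "=="))
eauc_swapped <- U_swapped / (length(x) * length(y))
```

Because swapping the reference level maps eAUC to 1 - eAUC, the choice of ref_level changes the direction of the reported effect but not the two-sided test decision.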

The test statistic is eAUC, which estimates the true AUC. The empirical ROC curve (eROC) is constructed by varying the classification threshold across all observed values and computing sensitivity and 1-specificity at each threshold.
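The threshold sweep described above can be sketched in a few lines of base R. This is an illustration on hypothetical toy values, not the code behind roc_with_ci(); the rule "classify as non-reference when the value is at or above the threshold" is an assumption of this sketch.

```r
# Toy data (hypothetical values, for illustration only)
x <- c(2.1, 3.4, 1.9, 4.2, 3.3)  # non-reference group ("positives")
y <- c(1.5, 2.0, 2.8, 1.1)       # reference group ("negatives")

# Sweep the threshold from +Inf down through every observed value
thresholds <- c(Inf, sort(unique(c(x, y)), decreasing = TRUE))

# Sensitivity: share of x at or above the threshold
sens <- sapply(thresholds, function(t) mean(x >= t))
# 1 - specificity: share of y at or above the threshold
fpr <- sapply(thresholds, function(t) mean(y >= t))

# Area under the empirical ROC curve by the trapezoidal rule
area <- sum(diff(fpr) * (head(sens, -1) + tail(sens, -1)) / 2)

# The same quantity from the pairwise definition of eAUC
eauc <- (sum(outer(x, y, ">")) + 0.5 * sum(outer(x, y, "=="))) /
  (length(x) * length(y))
```

On these values the trapezoidal area and the pairwise eAUC coincide, which is the sense in which eAUC is the area under the eROC.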

When special_case = TRUE, the function additionally reports location-shift parameters under the assumption that \(F_1(x) = F_2(x - \delta)\). Under this assumption, the discrimination test of \(\mathrm{H_0\colon AUC} = 0.5\) is mathematically equivalent to testing \(\mathrm{H_0}\colon \delta = 0\) (zero location shift). In this special case, eAUC serves as both the test statistic and an effect size for the location difference.

Confidence intervals for the true AUC are computed using either the Hanley and McNeil (1982) method based on asymptotic normality, or bootstrap resampling. If bootstrap resampling is selected, it is also used for constructing the confidence band for the ROC curve.
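For orientation, the Hanley and McNeil (1982) standard error can be written out as follows. This is a sketch of the published formula, not necessarily what ci_method = 'hanley' does internally; the function name hanley_ci, its argument names, and the truncation of the interval to [0, 1] are assumptions made for this illustration.

```r
# Hypothetical helper: Wald-type CI for AUC from the Hanley-McNeil (1982) SE
hanley_ci <- function(auc, n_pos, n_neg, conf_level = 0.95) {
  # Q1: two random positives both exceed one random negative
  q1 <- auc / (2 - auc)
  # Q2: one random positive exceeds two random negatives
  q2 <- 2 * auc^2 / (1 + auc)
  se <- sqrt((auc * (1 - auc) +
                (n_pos - 1) * (q1 - auc^2) +
                (n_neg - 1) * (q2 - auc^2)) / (n_pos * n_neg))
  z <- qnorm(1 - (1 - conf_level) / 2)
  # Truncation to [0, 1] is an assumption of this sketch
  ci <- c(lower = max(0, auc - z * se), upper = min(1, auc + z * se))
  list(se = se, ci = ci)
}

# Example call with eAUC = 0.85, 5 non-reference and 4 reference observations
res <- hanley_ci(0.85, n_pos = 5, n_neg = 4)
```

The interval relies on asymptotic normality, so with groups this small the bootstrap option may be the more cautious choice.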

By default, the function computes p-values with the Exact Unbiased ('EU') method, which can handle any type of data (continuous, discrete, mixed). The Bias-Corrected ('BC') method, which requires continuous data, is available through pvalue_method = 'BC'.

When special_case = TRUE, confidence intervals for the pseudomedian are constructed via test inversion. Under the location-shift assumption (\(F_1(x) = F_2(x - \delta)\)), the pseudomedian represents the location difference between groups.
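The idea of test inversion over a grid can be sketched with base R. This illustration uses wilcox.test() as the test being inverted, which is a stand-in and not the package's 'EU' or 'BC' p-value; the grid range, the toy values, and the variable names are likewise assumptions.

```r
# Toy data (hypothetical values, for illustration only)
x <- c(2.1, 3.4, 1.9, 4.2, 3.3)  # non-reference group
y <- c(1.5, 2.0, 2.8, 1.1)       # reference group

# Hodges-Lehmann estimate: median of all pairwise differences x_i - y_j
hl <- median(outer(x, y, "-"))

# Candidate shifts spanning the range of pairwise differences
alpha <- 0.05
grid <- seq(min(x) - max(y), max(x) - min(y), length.out = 100)

# For each candidate shift d, test whether x - d and y are still separated
pvals <- suppressWarnings(
  sapply(grid, function(d) wilcox.test(x - d, y)$p.value)
)

# The interval collects the shifts that the test does not reject
ci <- range(grid[pvals > alpha])
```

A finer grid (compare the n_grid argument) sharpens the interval endpoints at the cost of more test evaluations.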

Statistical Methodology: Unlike standard implementations that assume the erroneously broad null hypothesis \(\mathrm{H_0}\colon F_1 = F_2\), this function derives p-values under the correct null hypothesis \(\mathrm{H_0\colon AUC} = 0.5\) that the WMW test actually tests. P-values are computed using asymptotic distribution theory with one of two methods of finite-sample bias correction:

Exact Unbiased ('EU'): estimation of the variance of eAUC, which handles any type of data (continuous, discrete, mixed);
Bias Correction ('BC'): a sample-size-dependent method to maintain proper Type I error control.

Confidence intervals for the pseudomedian are obtained by inverting the test.
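To make the difference between the two null hypotheses concrete, the sketch below computes a z-test of AUC = 0.5 whose variance is estimated from the data rather than derived under identical distributions. It uses the placement-based variance estimator of DeLong et al. (1988) as a stand-in; it is neither the 'EU' nor the 'BC' method, and the function name auc_z_test is hypothetical.

```r
# Hypothetical helper: asymptotic z-test of H0: AUC = 0.5 with a
# DeLong-type variance estimate (a stand-in, not the package's method)
auc_z_test <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  # Pairwise kernel: 1 if x_i > y_j, 1/2 if tied, 0 otherwise
  psi <- outer(x, y, ">") + 0.5 * outer(x, y, "==")
  eauc <- mean(psi)
  # Placement values: per-observation averages of the kernel
  v10 <- rowMeans(psi)
  v01 <- colMeans(psi)
  var_auc <- var(v10) / n1 + var(v01) / n2
  z <- (eauc - 0.5) / sqrt(var_auc)
  list(eauc = eauc, var_auc = var_auc, z = z,
       p_value = 2 * pnorm(-abs(z)))
}

# Toy data (hypothetical values, for illustration only)
x <- c(2.1, 3.4, 1.9, 4.2, 3.3)
y <- c(1.5, 2.0, 2.8, 1.1)
res <- auc_z_test(x, y)

# For contrast: variance of eAUC under identical, continuous distributions
var_identical <- (length(x) + length(y) + 1) / (12 * length(x) * length(y))
```

The two variances generally differ, which is why a p-value calibrated for identical distributions need not be valid when only AUC = 0.5 holds. With samples as small as these, the normal approximation is rough and serves only to show the mechanics.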

References

Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1(6), 80-83.

Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, 18(1), 50-60.

Van Dantzig, D. (1951). On the consistency and the power of Wilcoxon's two sample test. Proceedings KNAW, Series A, 54(1), 1-8.

Birnbaum, Z. W. (1956). On a use of the Mann-Whitney statistic. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics (Vol. 3, pp. 13-18). University of California Press.

Bamber, D. (1975). The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of mathematical psychology, 12(4), 387-415.

Lehmann, E. L., & D'Abrera, H. J. M. (1975). Nonparametrics: Statistical methods based on ranks. San Francisco, CA: Holden-Day.

Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29-36.

Cliff, N. (1993). Dominance statistics: Ordinal analyses to answer ordinal questions. Psychological Bulletin, 114(3), 494-509.

Arcones, M. A., Kvam, P. H., & Samaniego, F. J. (2002). Nonparametric estimation of a distribution subject to a stochastic precedence constraint. Journal of the American Statistical Association, 97(457), 170-182.

Pepe, M. S. (2003). The statistical evaluation of medical tests for classification and prediction. Oxford university press.

Conroy, R. M. (2012). What hypotheses do “nonparametric” two-group tests actually test? The Stata Journal, 12(2), 182-190.

del Barrio, E., Cuesta-Albertos, J. A., & Matrán, C. (2025). Invariant measures of disagreement with stochastic dominance. The American Statistician, 1-13.

Grendar, M. (2025). Wilcoxon-Mann-Whitney test of no group discrimination. arXiv:2511.20308.

Examples

library('wmwAUC')  
# Ex 1
# \donttest{
library('gemR')
data(MS)
da <- MS
# preparing data frame
class(da$proteins) <- setdiff(class(da$proteins), "AsIs")
df <- as.data.frame(da$proteins)
df$MS <- da$MS
# WMW test 
wmd <- wmw_test(P19099 ~ MS, data = df, ref_level = 'no')
wmd
plot(wmd)
# EDA to assess location shift assumption validity
qp <- quadruplot(P19099 ~ MS, data = df, ref_level = 'no')
qp
# => location shift assumption is not valid
# }

# Ex 2
# \donttest{
data(Ex2)
da <- Ex2
# WMW test
wmd <- wmw_test(y ~ group, data = da, ref_level = 'control')
wmd
plot(wmd)
# Check location-shift assumption with EDA
qp <- quadruplot(y ~ group, data = da, ref_level = 'control', test = 'ks')
qp
# => location-shift assumption not tenable
# Note that medians are essentially the same:
median(da$y[da$group == 'case'])
# 0.495
median(da$y[da$group == 'control'])
# 0.493
# Erroneous use of location-shift special case would falsely 
# conclude significant median difference despite identical medians
wml <- wmw_test(y ~ group, data = da, special_case = TRUE,
                ref_level = 'control')
wml
# }

# Ex 3
# \donttest{
library('gss')
data(wesdr)
da <- wesdr
da$ret <- as.factor(da$ret)
# WMW 
wmd <- wmw_test(bmi ~ ret, data = da, ref_level = '0')
wmd
plot(wmd)
# EDA to assess location shift assumption validity
qp <- quadruplot(bmi ~ ret, data = da, ref_level = '0')
qp
# => location shift assumption is tenable
# Special case of WMW test
wml <- wmw_test(bmi ~ ret, data = da, ref_level = '0', 
                ci_method = 'boot', special_case = TRUE)
wml
plot(wml)
# }
