wmwAUC (version 0.2.0)

wmw_test: Wilcoxon-Mann-Whitney Test of No Group Discrimination

Description

Performs a distribution-free Wilcoxon-Mann-Whitney test for AUC-detectable group discrimination, testing \(\mathrm{H_0\colon AUC} = 0.5\) against \(\mathrm{H_1\colon AUC} \neq 0.5\). Under the location-shift assumption, this is equivalent to testing for zero location difference.

Usage

wmw_test(
  formula,
  data,
  ref_level = NULL,
  special_case = FALSE,
  alternative = c("two.sided", "greater", "less"),
  pvalue_method = "EU",
  ci_method = "hanley",
  conf_level = 0.95,
  n_grid = 100,
  ...
)

Value

Object of class 'wmw_test' containing:

special_case

Logical indicating whether special case (location-shift) analysis was performed

n

Named vector with components n1, n2 giving sample sizes for each group

U_statistic

U statistic

p_value

P-value for testing H0: AUC = 0.5

alternative

Alternative hypothesis specification

pvalue_method

Character string describing the test method

data_name

Character string giving the name of the data

pseudomedian

Hodges-Lehmann median difference estimate (when special_case = TRUE)

pseudomedian_conf_int

Confidence interval for the location shift (when special_case = TRUE)

pseudomedian_conf_level

Confidence level for the confidence interval for HL estimator (when special_case = TRUE)

ci_method

Method used to compute confidence interval for AUC

roc_object

ROC analysis object returned by roc_with_ci function

auc

Empirical AUC (eAUC), the standardized U statistic

auc_conf_int

Confidence interval for true AUC using Hanley-McNeil or bootstrap method

x_vals

Numeric vector of observations from non-reference group

y_vals

Numeric vector of observations from reference group

groups

Character vector of group labels from original data

group_levels

Character vector of factor levels for grouping variable

group_ref_level

Character string indicating which level corresponds to reference group

Arguments

formula

Formula of the form response ~ group

data

Data frame containing continuous response variable and grouping factor

ref_level

Character, reference level of grouping factor (if NULL, uses first level)

special_case

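Logical; if TRUE, the location-shift assumption is made and the location-shift analysis (pseudomedian and its confidence interval) is also reported (default FALSE)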

alternative

Character, alternative hypothesis; one of "two.sided" (default), "greater", or "less"

pvalue_method

Character, method ('EU', 'BC') used for computing p-values; 'BC' assumes continuous data (default 'EU')

ci_method

Character, confidence interval method for the AUC; one of 'hanley', 'boot', or 'none' (default 'hanley')

conf_level

Numeric, confidence level for intervals (default 0.95)

n_grid

Numeric, number of grid points for search in pseudomedian_ci() (default 100)

...

Additional arguments passed to roc_with_ci()

Details

The function tests the null hypothesis \(\mathrm{H_0\colon AUC} = 0.5\) against the alternative \(\mathrm{H_1\colon AUC} \neq 0.5\), where AUC is the Area Under the ROC Curve and, following the convention of wilcox.test(), equals the probability \(P(X > Y)\) that a randomly selected observation from the first group exceeds a randomly selected observation from the second group.

For response ~ group, observations from the non-reference group constitute \(X\), while observations from the reference group (specified by ref_level) constitute \(Y\). Thus AUC = P(non-reference > reference). If ref_level is not specified, the first factor level is used as reference. The U statistic and the resulting empirical AUC (eAUC) are calculated consistently with this group assignment.
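The relation between the U statistic and eAUC described above can be sketched in base R. This is a minimal illustration on hypothetical toy data, not the package's internal code; the half-credit for ties is an assumption that matches the convention of wilcox.test().

```r
# Toy data (hypothetical values, not from the package)
x <- c(3.1, 4.7, 5.2, 6.0, 7.3)   # non-reference group (X)
y <- c(2.0, 3.5, 4.1, 5.0)        # reference group (Y)

n1 <- length(x)
n2 <- length(y)

# U counts the pairs with x > y; ties would contribute 1/2 (assumed convention)
cmp <- outer(x, y, FUN = function(a, b) (a > b) + 0.5 * (a == b))
U <- sum(cmp)

# Empirical AUC: U standardized by the number of (x, y) pairs
eauc <- U / (n1 * n2)

# Base R's wilcox.test() reports the same pair count as its W statistic
W <- unname(wilcox.test(x, y)$statistic)

print(c(U = U, W = W, eAUC = eauc))
```

Swapping the roles of x and y gives 1 - eAUC when there are no ties, which is why the choice of ref_level determines the direction of the reported AUC.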

The test statistic is eAUC, which estimates the true AUC. The empirical ROC curve (eROC) is constructed by varying the classification threshold across all observed values and computing sensitivity and 1-specificity at each threshold.
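As a rough sketch of the eROC construction just described (base R only, hypothetical toy data; the package's roc_with_ci() may differ in detail), one can sweep the threshold over the observed values and integrate with the trapezoidal rule. Treating the non-reference group as the "positive" class and classifying with `>=` are assumptions of this sketch.

```r
# Toy data (hypothetical values, not from the package)
x <- c(3.1, 4.7, 5.2, 6.0, 7.3)   # non-reference group, treated as "positive"
y <- c(2.0, 3.5, 4.1, 5.0)        # reference group, treated as "negative"

# Thresholds: every observed value, plus Inf so the curve starts at (0, 0)
thr <- sort(unique(c(x, y, Inf)), decreasing = TRUE)

# An observation is classified as positive when its value is >= the threshold
sens <- vapply(thr, function(t) mean(x >= t), numeric(1))  # sensitivity
fpr  <- vapply(thr, function(t) mean(y >= t), numeric(1))  # 1 - specificity

# Area under the empirical ROC curve by the trapezoidal rule
auc_trap <- sum(diff(fpr) * (head(sens, -1) + tail(sens, -1)) / 2)

print(data.frame(threshold = thr, sensitivity = sens, fpr = fpr))
print(auc_trap)
```

The trapezoidal area coincides with U / (n1 * n2) computed from pairwise comparisons, which is the sense in which eAUC is both an area and a standardized U statistic.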

When special_case = TRUE, the function additionally reports location-shift parameters under the assumption that \(F_1(x) = F_2(x - \delta)\). Under this assumption, the discrimination test \(\mathrm{H_0\colon AUC} = 0.5\) is mathematically equivalent to testing H0: \(\delta = 0\) (zero location shift). In this special case, eAUC takes the dual role of both test statistic and effect size for the location difference.

Confidence intervals for the true AUC are computed using either the Hanley and McNeil (1982) method based on asymptotic normality, or bootstrap resampling. If bootstrap resampling is selected, it is also used for constructing the confidence band for the ROC curve.
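For orientation, the standard error from Hanley and McNeil (1982) can be sketched as follows. The helper `hanley_ci()` is hypothetical and is not part of wmwAUC; the Wald-type interval and the truncation to [0, 1] are assumptions of this sketch, and the package's own implementation may differ.

```r
# Hypothetical helper: Hanley-McNeil (1982) standard error and Wald-type CI.
# auc: empirical AUC; n1: size of the "positive" (non-reference) group;
# n2: size of the "negative" (reference) group.
hanley_ci <- function(auc, n1, n2, conf_level = 0.95) {
  q1 <- auc / (2 - auc)            # approximates P(two positives both exceed one negative)
  q2 <- 2 * auc^2 / (1 + auc)      # approximates P(one positive exceeds two negatives)
  se <- sqrt((auc * (1 - auc) +
              (n1 - 1) * (q1 - auc^2) +
              (n2 - 1) * (q2 - auc^2)) / (n1 * n2))
  z <- qnorm(1 - (1 - conf_level) / 2)
  # Truncating to [0, 1] is a choice made in this sketch
  c(lower = max(0, auc - z * se), upper = min(1, auc + z * se), se = se)
}

res <- hanley_ci(auc = 0.8, n1 = 5, n2 = 4)
print(res)
```

With samples this small the interval is wide and the normal approximation is crude, which is one reason to consider ci_method = 'boot' instead.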

By default, the function computes p-values with the Exact Unbiased ('EU') method, which can handle any type of data (continuous, discrete, mixed). The Bias-Corrected ('BC') method, which requires continuous data, is available through the pvalue_method = 'BC' option.

When special_case = TRUE, confidence intervals for the pseudomedian are constructed via test inversion. Under the location-shift assumption (\(G(x) = F(x - \delta)\)), the pseudomedian represents the location difference between groups.
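The Hodges-Lehmann estimate of the location difference is the median of all pairwise differences, which can be sketched in base R on hypothetical toy data. The sign convention (non-reference minus reference) is an assumption here, and base R's wilcox.test() is used only as a stand-in: it inverts the classical test, whereas the package's pseudomedian_ci() uses its own grid search and may give a different interval.

```r
# Toy data (hypothetical values, not from the package)
x <- c(3.1, 4.7, 5.2, 6.0, 7.3)   # non-reference group
y <- c(2.0, 3.5, 4.1, 5.0)        # reference group

# Hodges-Lehmann estimate: median of all pairwise differences x_i - y_j
diffs <- as.vector(outer(x, y, "-"))
hl <- median(diffs)

# Base R stand-in: estimate and test-inversion CI from wilcox.test()
wt <- wilcox.test(x, y, conf.int = TRUE, conf.level = 0.95)
hl_base <- unname(wt$estimate)
ci_base <- as.numeric(wt$conf.int)

print(c(hl = hl, hl_base = hl_base))
print(ci_base)
```

Note that the pseudomedian is a median of differences, not a difference of medians; the two agree only under additional assumptions such as a pure location shift.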

Statistical Methodology: Unlike standard implementations, which assume the erroneously broad null hypothesis \(\mathrm{H_0\colon F = G}\), this function derives p-values under the correct null hypothesis \(\mathrm{H_0\colon AUC} = 0.5\) that the WMW test actually tests. P-values are computed using asymptotic distribution theory, with two methods of finite-sample bias correction:

  1. Exact Unbiased ('EU') estimation of the variance of eAUC, which handles any type of data (continuous, discrete, mixed);

  2. Bias Correction ('BC'), a sample-size-dependent method that maintains proper Type I error control.

Confidence intervals for the pseudomedian are obtained by inverting the test.

References

Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1(6), 80-83.

Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, 18(1), 50-60.

Van Dantzig, D. (1951). On the consistency and the power of Wilcoxon's two sample test. Proceedings KNAW, Series A, 54(1), 1-8.

Birnbaum, Z. W. (1956). On a use of the Mann-Whitney statistic. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics (Vol. 3, pp. 13-18). University of California Press.

Bamber, D. (1975). The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology, 12(4), 387-415.

Lehmann, E. L., & D'Abrera, H. J. M. (1975). Nonparametrics: Statistical methods based on ranks. San Francisco, CA: Holden-Day.

Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29-36.

Cliff, N. (1993). Dominance statistics: Ordinal analyses to answer ordinal questions. Psychological Bulletin, 114(3), 494.

Arcones, M. A., Kvam, P. H., & Samaniego, F. J. (2002). Nonparametric estimation of a distribution subject to a stochastic precedence constraint. Journal of the American Statistical Association, 97(457), 170-182.

Pepe, M. S. (2003). The statistical evaluation of medical tests for classification and prediction. Oxford University Press.

Conroy, R. M. (2012). What hypotheses do “nonparametric” two-group tests actually test? The Stata Journal, 12(2), 182-190.

del Barrio, E., Cuesta-Albertos, J. A., & Matrán, C. (2025). Invariant measures of disagreement with stochastic dominance. The American Statistician, 1-13.

Grendar, M. (2025). Wilcoxon-Mann-Whitney test of no group discrimination. arXiv:2511.20308.

See Also

print.wmw_test for formatted output of wmw_test().

plot.wmw_test for a plot of the output of wmw_test().

wmw_pvalue for details on computing p-values in the continuous case ('BC').

wmw_pvalue_ties for details on computing p-values in the 'EU' mode.

pseudomedian_ci for details on computing confidence intervals for the pseudomedian.

quadruplot for exploratory data analysis plots that help assess the validity of the location-shift assumption.

wilcox.test for the Wilcoxon-Mann-Whitney test in base R.

Examples

library('wmwAUC')  
# Ex 1
# \donttest{
library('gemR')
data(MS)
da <- MS
# preparing data frame
class(da$proteins) <- setdiff(class(da$proteins), "AsIs")
df <- as.data.frame(da$proteins)
df$MS <- da$MS
# WMW test 
wmd <- wmw_test(P19099 ~ MS, data = df, ref_level = 'no')
wmd
plot(wmd)
# EDA to assess location shift assumption validity
qp <- quadruplot(P19099 ~ MS, data = df, ref_level = 'no')
qp
# => location shift assumption is not valid
# }

# Ex 2
# \donttest{
data(Ex2)
da <- Ex2
# WMW test
wmd <- wmw_test(y ~ group, data = da, ref_level = 'control')
wmd
plot(wmd)
# Check location-shift assumption with EDA
qp <- quadruplot(y ~ group, data = da, ref_level = 'control', test = 'ks')
qp
# => location-shift assumption not tenable
# Note that medians are essentially the same:
median(da$y[da$group == 'case'])
# 0.495
median(da$y[da$group == 'control'])
# 0.493
# Erroneous use of the location-shift special case would falsely
# conclude a significant median difference despite nearly identical medians
wml <- wmw_test(y ~ group, data = da, special_case = TRUE,
                ref_level = 'control')
wml
# }

# Ex 3
# \donttest{
library('gss')
data(wesdr)
da <- wesdr
da$ret <- as.factor(da$ret)
# WMW 
wmd <- wmw_test(bmi ~ ret, data = da, ref_level = '0')
wmd
plot(wmd)
# EDA to assess location shift assumption validity
qp <- quadruplot(bmi ~ ret, data = da, ref_level = '0')
qp
# => location shift assumption is tenable
# Special case of WMW test
wml <- wmw_test(bmi ~ ret, data = da, ref_level = '0', 
                ci_method = 'boot', special_case = TRUE)
wml
plot(wml)
# } 

