Performs one- and two-sample Wilcoxon tests on vectors of data; the latter is also known as ‘Mann-Whitney’ test.

`wilcox.test(x, …)`# S3 method for default
wilcox.test(x, y = NULL,
alternative = c("two.sided", "less", "greater"),
mu = 0, paired = FALSE, exact = NULL, correct = TRUE,
conf.int = FALSE, conf.level = 0.95, …)

# S3 method for formula
wilcox.test(formula, data, subset, na.action, …)

x

numeric vector of data values. Non-finite (e.g., infinite or missing) values will be omitted.

y

an optional numeric vector of data values: as with `x`

non-finite values will be omitted.

alternative

a character string specifying the alternative
hypothesis, must be one of `"two.sided"`

(default),
`"greater"`

or `"less"`

. You can specify just the initial
letter.

mu

a number specifying an optional parameter used to form the null hypothesis. See ‘Details’.

paired

a logical indicating whether you want a paired test.

exact

a logical indicating whether an exact p-value should be computed.

correct

a logical indicating whether to apply continuity correction in the normal approximation for the p-value.

conf.int

a logical indicating whether a confidence interval should be computed.

conf.level

confidence level of the interval.

formula

a formula of the form `lhs ~ rhs`

where `lhs`

is a numeric variable giving the data values and `rhs`

a factor
with two levels giving the corresponding groups.

data

an optional matrix or data frame (or similar: see
`model.frame`

) containing the variables in the
formula `formula`

. By default the variables are taken from
`environment(formula)`

.

subset

an optional vector specifying a subset of observations to be used.

na.action

a function which indicates what should happen when
the data contain `NA`

s. Defaults to
`getOption("na.action")`

.

…

further arguments to be passed to or from methods.

A list with class `"htest"`

containing the following components:

the value of the test statistic with a name describing it.

the parameter(s) for the exact distribution of the test statistic.

the p-value for the test.

the location parameter `mu`

.

a character string describing the alternative hypothesis.

the type of test applied.

a character string giving the names of the data.

a confidence interval for the location parameter.
(Only present if argument `conf.int = TRUE`

.)

an estimate of the location parameter.
(Only present if argument `conf.int = TRUE`

.)

This function can use large amounts of memory and stack (and even
crash R if the stack limit is exceeded) if `exact = TRUE`

and
one sample is large (several thousands or more).

The formula interface is only applicable for the 2-sample tests.

If only `x`

is given, or if both `x`

and `y`

are given
and `paired`

is `TRUE`

, a Wilcoxon signed rank test of the
null that the distribution of `x`

(in the one sample case) or of
`x - y`

(in the paired two sample case) is symmetric about
`mu`

is performed.

Otherwise, if both `x`

and `y`

are given and `paired`

is `FALSE`

, a Wilcoxon rank sum test (equivalent to the
Mann-Whitney test: see the Note) is carried out. In this case, the
null hypothesis is that the distributions of `x`

and `y`

differ by a location shift of `mu`

and the alternative is that
they differ by some other location shift (and the one-sided
alternative `"greater"`

is that `x`

is shifted to the right
of `y`

).

By default (if `exact`

is not specified), an exact p-value
is computed if the samples contain less than 50 finite values and
there are no ties. Otherwise, a normal approximation is used.

Optionally (if argument `conf.int`

is true), a nonparametric
confidence interval and an estimator for the pseudomedian (one-sample
case) or for the difference of the location parameters `x-y`

is
computed. (The pseudomedian of a distribution \(F\) is the median
of the distribution of \((u+v)/2\), where \(u\) and \(v\) are
independent, each with distribution \(F\). If \(F\) is symmetric,
then the pseudomedian and median coincide. See Hollander & Wolfe
(1973), page 34.) Note that in the two-sample case the estimator for
the difference in location parameters does **not** estimate the
difference in medians (a common misconception) but rather the median
of the difference between a sample from `x`

and a sample from
`y`

.

If exact p-values are available, an exact confidence interval is
obtained by the algorithm described in Bauer (1972), and the
Hodges-Lehmann estimator is employed. Otherwise, the returned
confidence interval and point estimate are based on normal
approximations. These are continuity-corrected for the interval but
*not* the estimate (as the correction depends on the
`alternative`

).

With small samples it may not be possible to achieve very high confidence interval coverages. If this happens a warning will be given and an interval with lower coverage will be substituted.

When `x`

(and `y`

if applicable) are valid, the function now
always returns, also in the `conf.int = TRUE`

case when a
confidence interval cannot be computed, in which case the interval
boundaries and sometimes the `estimate`

now contain
`NaN`

.

David F. Bauer (1972).
Constructing confidence sets using rank statistics.
*Journal of the American Statistical Association*
**67**, 687--690.
10.1080/01621459.1972.10481279.

Myles Hollander and Douglas A. Wolfe (1973).
*Nonparametric Statistical Methods*.
New York: John Wiley & Sons.
Pages 27--33 (one-sample), 68--75 (two-sample).
Or second edition (1999).

`wilcox_test`

in package
coin for exact, asymptotic and Monte Carlo
*conditional* p-values, including in the presence of ties.

`kruskal.test`

for testing homogeneity in location
parameters in the case of two or more samples;
`t.test`

for an alternative under normality
assumptions [or large samples]

# NOT RUN { require(graphics) ## One-sample test. ## Hollander & Wolfe (1973), 29f. ## Hamilton depression scale factor measurements in 9 patients with ## mixed anxiety and depression, taken at the first (x) and second ## (y) visit after initiation of a therapy (administration of a ## tranquilizer). x <- c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30) y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29) wilcox.test(x, y, paired = TRUE, alternative = "greater") wilcox.test(y - x, alternative = "less") # The same. wilcox.test(y - x, alternative = "less", exact = FALSE, correct = FALSE) # H&W large sample # approximation ## Two-sample test. ## Hollander & Wolfe (1973), 69f. ## Permeability constants of the human chorioamnion (a placental ## membrane) at term (x) and between 12 to 26 weeks gestational ## age (y). The alternative of interest is greater permeability ## of the human chorioamnion for the term pregnancy. x <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46) y <- c(1.15, 0.88, 0.90, 0.74, 1.21) wilcox.test(x, y, alternative = "g") # greater wilcox.test(x, y, alternative = "greater", exact = FALSE, correct = FALSE) # H&W large sample # approximation wilcox.test(rnorm(10), rnorm(10, 2), conf.int = TRUE) ## Formula interface. boxplot(Ozone ~ Month, data = airquality) wilcox.test(Ozone ~ Month, data = airquality, subset = Month %in% c(5, 8)) # }