# wilcox.test

##### Wilcoxon Rank Sum and Signed Rank Tests

Performs one- and two-sample Wilcoxon tests on vectors of data; the
latter is also known as the Mann-Whitney test.

- Keywords
- htest

##### Usage

    wilcox.test(x, ...)

    ## S3 method for class 'default':
    wilcox.test(x, y = NULL,
                alternative = c("two.sided", "less", "greater"),
                mu = 0, paired = FALSE, exact = NULL, correct = TRUE,
                conf.int = FALSE, conf.level = 0.95, ...)

    ## S3 method for class 'formula':
    wilcox.test(formula, data, subset, na.action, ...)

##### Arguments

- x
- numeric vector of data values. Non-finite (e.g., infinite or missing) values will be omitted.
- y
- an optional numeric vector of data values: as with `x`, non-finite values will be omitted.
- alternative
- a character string specifying the alternative hypothesis; must be one of `"two.sided"` (default), `"greater"` or `"less"`. You can specify just the initial letter.
- mu
- a number specifying an optional parameter used to form the null hypothesis. See Details.
- paired
- a logical indicating whether you want a paired test.
- exact
- a logical indicating whether an exact p-value should be computed.
- correct
- a logical indicating whether to apply continuity correction in the normal approximation for the p-value.
- conf.int
- a logical indicating whether a confidence interval should be computed.
- conf.level
- confidence level of the interval.
- formula
- a formula of the form `lhs ~ rhs` where `lhs` is a numeric variable giving the data values and `rhs` a factor with two levels giving the corresponding groups.
- data
- an optional matrix or data frame (or similar: see `model.frame`) containing the variables in the formula `formula`. By default the variables are taken from `environment(formula)`.
- subset
- an optional vector specifying a subset of observations to be used.
- na.action
- a function which indicates what should happen when the data contain `NA`s. Defaults to `getOption("na.action")`.
- ...
- further arguments to be passed to or from methods.

##### Details

The formula interface is only applicable for the 2-sample tests.
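As a rough illustration of the two-sample restriction, the sketch below requests the same comparison once through the formula interface and once through the default method, using the built-in `airquality` data. The names `aq`, `via_formula` and `via_default` are illustrative only, and the warning caused by ties in these data is suppressed.

```r
## Two ways to request the same two-sample test (names are illustrative).
aq <- subset(airquality, Month %in% c(5, 8))

via_formula <- suppressWarnings(wilcox.test(Ozone ~ Month, data = aq))
via_default <- suppressWarnings(
  wilcox.test(aq$Ozone[aq$Month == 5], aq$Ozone[aq$Month == 8])
)

## Both calls should report the same statistic and p-value.
c(formula = unname(via_formula$statistic),
  default = unname(via_default$statistic))
```

The grouping factor on the right-hand side must have exactly two levels, which is why the data are first restricted to two months.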

If only `x` is given, or if both `x` and `y` are given and `paired` is `TRUE`, a Wilcoxon signed rank test of the null that the distribution of `x` (in the one sample case) or of `x - y` (in the paired two sample case) is symmetric about `mu` is performed.
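The equivalence between the paired test and a one-sample test on the differences can be checked directly. The sketch below reuses the depression data from the Examples section; `v_manual` is an illustrative hand computation of the signed rank statistic, valid here because the differences contain no zeros and no ties.

```r
## Depression data from the Examples section (Hollander & Wolfe).
x <- c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)
d <- x - y                       # paired differences

paired_fit <- wilcox.test(x, y, paired = TRUE)
single_fit <- wilcox.test(d)     # one-sample test on the differences

## V: sum of the ranks of |d - mu| over observations with d - mu > 0
## (mu = 0 here).
v_manual <- sum(rank(abs(d))[d > 0])

c(paired = unname(paired_fit$statistic),
  single = unname(single_fit$statistic),
  manual = v_manual)
```

All three values should agree (40 for these data).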

Otherwise, if both `x` and `y` are given and `paired` is `FALSE`, a Wilcoxon rank sum test (equivalent to the Mann-Whitney test: see the Note) is carried out. In this case, the null hypothesis is that the distributions of `x` and `y` differ by a location shift of `mu` and the alternative is that they differ by some other location shift (and the one-sided alternative `"greater"` is that `x` is shifted to the right of `y`).
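One way to read the role of `mu` in the two-sample case is that testing a shift of `mu` amounts to testing no shift after subtracting `mu` from `x`. The sketch below checks this on the permeability data from the Examples section; the value `shift = 0.2` is a hypothetical null shift chosen only because it creates no ties.

```r
## Permeability data from the Examples section (Hollander & Wolfe).
x <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46)
y <- c(1.15, 0.88, 0.90, 0.74, 1.21)
shift <- 0.2   # hypothetical null location shift

with_mu     <- wilcox.test(x, y, mu = shift)   # H0: x is y shifted by 0.2
by_shifting <- wilcox.test(x - shift, y)       # H0: no shift, after moving x

## The statistic and p-value should coincide; only null.value differs.
c(with_mu = unname(with_mu$statistic),
  by_shifting = unname(by_shifting$statistic))
```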

By default (if `exact` is not specified), an exact p-value is computed if the samples contain fewer than 50 finite values and there are no ties. Otherwise, a normal approximation is used.
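The default choice between the exact computation and the normal approximation can be observed through the `method` component of the result. The data below are made up for illustration, and the exact wording of the method label may differ between R versions, so the sketch only looks for the phrase "continuity correction".

```r
## Hypothetical small samples.
a      <- c(1.1, 2.3, 3.8, 4.4, 5.9)
b      <- c(0.7, 1.6, 2.9, 3.1)   # no values shared with a
b_tied <- c(1.1, 1.6, 2.9, 3.1)   # 1.1 also occurs in a

exact_fit  <- wilcox.test(a, b)   # small samples, no ties

## With ties the function warns and falls back to the normal approximation.
approx_fit <- suppressWarnings(wilcox.test(a, b_tied))

exact_fit$method
approx_fit$method
grepl("continuity correction", c(exact_fit$method, approx_fit$method))
```

Passing `exact = FALSE` requests the normal approximation explicitly, and `correct = FALSE` additionally drops the continuity correction.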

Optionally (if argument `conf.int` is true), a nonparametric confidence interval and an estimator for the pseudomedian (one-sample case) or for the difference of the location parameters `x-y` is computed. (The pseudomedian of a distribution $F$ is the median of the distribution of $(u+v)/2$, where $u$ and $v$ are independent, each with distribution $F$. If $F$ is symmetric, then the pseudomedian and median coincide. See Hollander & Wolfe (1973), page 34.) Note that in the two-sample case the estimator for the difference in location parameters does **not** estimate the difference in medians (a common misconception) but rather the median of the difference between a sample from `x` and a sample from `y`.
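The distinction between the median of the differences and the difference of the medians can be seen numerically. The sketch below uses the permeability data from the Examples section, for which the exact method applies; `pairwise` is an illustrative name for the matrix of all differences `x[i] - y[j]`.

```r
## Permeability data from the Examples section (Hollander & Wolfe).
x <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46)
y <- c(1.15, 0.88, 0.90, 0.74, 1.21)

fit <- wilcox.test(x, y, conf.int = TRUE)

pairwise <- outer(x, y, "-")     # all 50 differences x[i] - y[j]

c(estimate        = unname(fit$estimate),
  median_of_diffs = median(pairwise),
  diff_of_medians = median(x) - median(y))
```

For these data the reported estimate matches the median of the pairwise differences (about 0.305), whereas the difference of the two sample medians is about 0.515.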

If exact p-values are available, an exact confidence interval is obtained by the algorithm described in Bauer (1972), and the Hodges-Lehmann estimator is employed. Otherwise, the returned confidence interval and point estimate are based on normal approximations. These are continuity-corrected for the interval but *not* the estimate (as the correction depends on the `alternative`).
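In the one-sample case the Hodges-Lehmann estimator is commonly described as the median of the Walsh averages $(d_i + d_j)/2$ for $i \le j$. The sketch below compares that quantity with the reported estimate, assuming the exact method applies (small sample, no ties); `walsh` is an illustrative name.

```r
## Paired differences from the depression data in the Examples section.
x <- c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)
d <- x - y

fit <- wilcox.test(d, conf.int = TRUE)

## Walsh averages: (d[i] + d[j]) / 2 for all i <= j, i.e. n(n+1)/2 values.
sums  <- outer(d, d, "+")
walsh <- sums[upper.tri(sums, diag = TRUE)] / 2

c(estimate = unname(fit$estimate), walsh_median = median(walsh))
```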

With small samples it may not be possible to achieve very high confidence interval coverages. If this happens a warning will be given and an interval with lower coverage will be substituted.
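A very small sample makes this behaviour easy to observe. In the sketch below the data `tiny` are hypothetical, the warning is captured rather than printed, and the coverage actually attained is read from the `conf.level` attribute of the returned interval.

```r
tiny <- c(1.2, 2.5, 4.1)   # hypothetical data, n = 3

## Collect warning messages instead of letting them print.
msgs <- character(0)
fit <- withCallingHandlers(
  wilcox.test(tiny, conf.int = TRUE, conf.level = 0.99),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

msgs                               # warning text, if any was raised
attr(fit$conf.int, "conf.level")   # coverage of the interval returned
```

With only three observations the signed rank statistic takes too few distinct values for a 99% interval, so the attribute should report a lower level than the one requested.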

##### Value

A list with class `"htest"` containing the following components:

- statistic
- the value of the test statistic with a name describing it.
- parameter
- the parameter(s) for the exact distribution of the test statistic.
- p.value
- the p-value for the test.
- null.value
- the location parameter `mu`.
- alternative
- a character string describing the alternative hypothesis.
- method
- the type of test applied.
- data.name
- a character string giving the names of the data.
- conf.int
- a confidence interval for the location parameter. (Only present if argument `conf.int = TRUE`.)
- estimate
- an estimate of the location parameter. (Only present if argument `conf.int = TRUE`.)
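Because the result is an ordinary list, its components can be extracted with `$`. A brief sketch using the permeability data from the Examples section (the name `fit` is illustrative):

```r
x <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46)
y <- c(1.15, 0.88, 0.90, 0.74, 1.21)

fit <- wilcox.test(x, y, alternative = "greater", conf.int = TRUE)

class(fit)        # "htest"
names(fit)        # component names
fit$statistic     # named test statistic
fit$p.value       # p-value as a plain number
fit$conf.int      # carries a "conf.level" attribute
fit$estimate      # present because conf.int = TRUE
```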

##### Note

The literature is not unanimous about the definitions of the Wilcoxon rank sum and Mann-Whitney tests. The two most common definitions correspond to the sum of the ranks of the first sample with the minimum value subtracted or not: R subtracts and S-PLUS does not, giving a value which is larger by $m(m+1)/2$ for a first sample of size $m$. (It seems Wilcoxon's original paper used the unadjusted sum of the ranks but subsequent tables subtracted the minimum.)
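The relationship between the two definitions can be verified by hand. The sketch below uses the permeability data from the Examples section; `rank_sum` and `w_adjusted` are illustrative names for the unadjusted and adjusted rank sums.

```r
x <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46)
y <- c(1.15, 0.88, 0.90, 0.74, 1.21)
m <- length(x)

ranks      <- rank(c(x, y))            # joint ranks of both samples
rank_sum   <- sum(ranks[seq_len(m)])   # unadjusted sum for the first sample
w_adjusted <- rank_sum - m * (m + 1) / 2

c(unadjusted = rank_sum,
  adjusted   = w_adjusted,
  reported   = unname(wilcox.test(x, y)$statistic))
```

For these data the unadjusted sum is 90 and the adjusted value is 35, which is the statistic that `wilcox.test` reports.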

R's value can also be computed as the number of all pairs `(x[i], y[j])` for which `y[j]` is not greater than `x[i]`, the most common definition of the Mann-Whitney test.
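The pair-counting definition is a one-liner with `outer`. The sketch below again uses the permeability data from the Examples section, which contain no ties between the samples, so the comparison is unambiguous; `pair_count` is an illustrative name.

```r
x <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46)
y <- c(1.15, 0.88, 0.90, 0.74, 1.21)

## Count the pairs (x[i], y[j]) in which y[j] is not greater than x[i].
pair_count <- sum(outer(x, y, ">="))

c(pair_count = pair_count,
  reported   = unname(wilcox.test(x, y)$statistic))
```

Both values are 35 for these data.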

##### concept

Mann-Whitney Test

##### Warning

This function can use large amounts of memory and stack (and even crash R if the stack limit is exceeded) if `exact = TRUE` and one sample is large (several thousands or more).

##### References

David F. Bauer (1972),
Constructing confidence sets using rank statistics.
*Journal of the American Statistical Association*
**67**, 687–690.

Myles Hollander and Douglas A. Wolfe (1973),
*Nonparametric Statistical Methods.*
New York: John Wiley & Sons.
Pages 27–33 (one-sample), 68–75 (two-sample).
Or second edition (1999).

##### See Also

`wilcox_test`, available in an add-on package, for *conditional* p-values, including in the presence of ties.

`kruskal.test` for testing homogeneity in location parameters in the case of two or more samples; `t.test` for an alternative under normality assumptions [or large samples].

##### Examples

`library(stats)`

```
require(graphics)
## One-sample test.
## Hollander & Wolfe (1973), 29f.
## Hamilton depression scale factor measurements in 9 patients with
## mixed anxiety and depression, taken at the first (x) and second
## (y) visit after initiation of a therapy (administration of a
## tranquilizer).
x <- c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)
wilcox.test(x, y, paired = TRUE, alternative = "greater")
wilcox.test(y - x, alternative = "less") # The same.
wilcox.test(y - x, alternative = "less",
            exact = FALSE, correct = FALSE) # H&W large sample
                                            # approximation
## Two-sample test.
## Hollander & Wolfe (1973), 69f.
## Permeability constants of the human chorioamnion (a placental
## membrane) at term (x) and between 12 to 26 weeks gestational
## age (y). The alternative of interest is greater permeability
## of the human chorioamnion for the term pregnancy.
x <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46)
y <- c(1.15, 0.88, 0.90, 0.74, 1.21)
wilcox.test(x, y, alternative = "g") # greater
wilcox.test(x, y, alternative = "greater",
            exact = FALSE, correct = FALSE) # H&W large sample
                                            # approximation
wilcox.test(rnorm(10), rnorm(10, 2), conf.int = TRUE)
## Formula interface.
boxplot(Ozone ~ Month, data = airquality)
wilcox.test(Ozone ~ Month, data = airquality,
            subset = Month %in% c(5, 8))
```

*Documentation reproduced from package stats, version 3.3, License: Part of R 3.3*