Let \(X\) denote an independent and identically distributed
normal variate, and let \(x_1 \le x_2 \le \ldots \le x_n\)
denote its realizations in increasing order.
Dixon (1950) proposed the following ratio statistic to detect
an outlier (two-sided):
$$
r_{j,i-1} = \max\left\{\frac{x_n - x_{n-j}}{x_n - x_i},
\frac{x_{1+j} - x_1}{x_{n-i+1} - x_1}\right\}$$
The null hypothesis of no outlier is tested against the alternative
that at least one observation is an outlier (two-sided). In the first
ratio, the subscript \(j\) refers to the number of outliers suspected
at the upper end of the data set, and \(i-1\) is the number of
observations at the lower end that are excluded from the range in the
denominator. In the second ratio the two ends are interchanged.
For \(r_{10}\) it is also common to use the symbol \(Q\).
The statistic for a single maximum outlier is:
$$
r_{j,i-1} = \left(x_n - x_{n-j} \right) / \left(x_n - x_i\right)$$
The null hypothesis is tested against the alternative that
the maximum observation is an outlier.
For testing a single minimum outlier, the test statistic is:
$$
r_{j,i-1} = \left(x_{1+j} - x_1 \right) / \left(x_{n-i+1} - x_1 \right)$$
The null hypothesis is tested against the alternative that
the minimum observation is an outlier.
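The ratios can be illustrated with a minimal Python sketch. It is not the implementation used by this function: the helper name dixon_ratio and its argument names are hypothetical. The lower-end denominator is taken as \(x_{n-i+1} - x_1\), the mirror image of the upper-end \(x_n - x_i\), so that \(r_{10}\) reduces to \(Q = (x_2 - x_1)/(x_n - x_1)\) for a minimum outlier.

```python
def dixon_ratio(values, j=1, i=1, alternative="two.sided"):
    """Illustrative Dixon ratio r_{j,i-1} (hypothetical helper, not a library API).

    j           -- first subscript of r
    i           -- the second subscript of r is i - 1
    alternative -- "greater" (maximum), "less" (minimum) or "two.sided"
    """
    x = sorted(values)          # x[k - 1] holds the order statistic x_k
    n = len(x)
    if n < 3 or j < 1 or i < 1 or i + j > n:
        raise ValueError("need n >= 3, j >= 1, i >= 1 and i + j <= n")

    upper_range = x[n - 1] - x[i - 1]   # x_n - x_i
    lower_range = x[n - i] - x[0]       # x_{n-i+1} - x_1
    if upper_range == 0 or lower_range == 0:
        raise ValueError("ratio undefined: zero range in the denominator")

    upper = (x[n - 1] - x[n - 1 - j]) / upper_range   # (x_n - x_{n-j}) / (x_n - x_i)
    lower = (x[j] - x[0]) / lower_range               # (x_{1+j} - x_1) / (x_{n-i+1} - x_1)

    if alternative == "greater":
        return upper
    if alternative == "less":
        return lower
    if alternative == "two.sided":
        return max(upper, lower)
    raise ValueError("alternative must be 'greater', 'less' or 'two.sided'")


sample = [1, 2, 3, 4, 10]
q_max = dixon_ratio(sample, j=1, i=1, alternative="greater")  # (10 - 4) / (10 - 1)
q_min = dixon_ratio(sample, j=1, i=1, alternative="less")     # (2 - 1) / (10 - 1)
```

For this sample the upper ratio dominates, so the two-sided statistic equals the value for the maximum.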
Apart from the earlier Dixon's Q-test (i.e. \(r_{10}\)),
a refined version that was later proposed by Dixon can be performed
with this function, where the statistic \(r_{j,i-1}\) depends on
the sample size as follows:
Statistic | Sample size
--- | ---
\(r_{10}\) | \(3 \le n \le 7\)
\(r_{11}\) | \(8 \le n \le 10\)
\(r_{21}\) | \(11 \le n \le 13\)
\(r_{22}\) | \(14 \le n \le 30\)
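The sample-size rule above can be sketched as a small lookup. This is an illustration only, not the function's own code: the name dixon_variant is hypothetical, and the returned pair \((j, i)\) follows the \(r_{j,i-1}\) notation, so \(r_{22}\) corresponds to \(j = 2\), \(i = 3\).

```python
def dixon_variant(n):
    """Map the sample size n to (label, j, i) for r_{j,i-1} (hypothetical helper)."""
    if 3 <= n <= 7:
        return ("r10", 1, 1)
    if 8 <= n <= 10:
        return ("r11", 1, 2)
    if 11 <= n <= 13:
        return ("r21", 2, 2)
    if 14 <= n <= 30:
        return ("r22", 2, 3)
    # Sample sizes outside 3..30 are not covered by the rule above.
    raise ValueError("sample size must satisfy 3 <= n <= 30")


label, j, i = dixon_variant(12)   # a sample of 12 values falls in 11 <= n <= 13
```

Raising an error outside \(3 \le n \le 30\) is a design choice of this sketch; the rule itself only states which statistic applies within that range.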
The p-value is computed with the function pdixon.