rosnerTest: Rosner's Test for Outliers

Description

Perform Rosner's generalized extreme Studentized deviate test for up to $k$ potential outliers in a dataset, assuming the data without any outliers come from a normal (Gaussian) distribution.

Usage

rosnerTest(x, k = 3, alpha = 0.05, warn = TRUE)

Arguments

x

numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. There must be at least 10 non-missing, finite observations in x.

k

positive integer indicating the number of suspected outliers. The argument k must be between 1 and $n-2$ where $n$ denotes the number of non-missing, finite values in the argument x. The default value is k=3.

alpha

numeric scalar between 0 and 1 indicating the Type I error associated with the test of hypothesis. The default value is alpha=0.05.

warn

logical scalar indicating whether to issue a warning (warn=TRUE; the default) when the number of non-missing, finite values in x and the value of k are such that the assumed Type I error level might not be maintained. See the DETAILS section below.

Value

A list of class "gofOutlier" containing the results of the hypothesis test. See the help file for gofOutlier.object for details.

Details

Let $x_1, x_2, \ldots, x_n$ denote the $n$ observations. We assume that $n-k$ of these observations come from the same normal (Gaussian) distribution, and that the $k$ most “extreme” observations may or may not represent observations from a different distribution. Let $x^{*}_1, x^{*}_2, \ldots, x^{*}_{n-i}$ denote the $n-i$ observations left after omitting the $i$ most extreme observations, where $i = 0, 1, \ldots, k-1$. Let $\bar{x}^{(i)}$ and $s^{(i)}$ denote the mean and standard deviation, respectively, of the $n-i$ observations in the data that remain after removing the $i$ most extreme observations. Thus, $\bar{x}^{(0)}$ and $s^{(0)}$ denote the mean and standard deviation for the full sample, and in general $$\bar{x}^{(i)} = \frac{1}{n-i}\sum_{j=1}^{n-i} x^{*}_j \;\;\;\;\;\; (1)$$ $$s^{(i)} = \sqrt{\frac{1}{n-i-1} \sum_{j=1}^{n-i} (x^{*}_j - \bar{x}^{(i)})^2} \;\;\;\;\;\; (2)$$

For a specified value of $i$, the most extreme observation $x^{(i)}$ is the one among $x^{*}_1, x^{*}_2, \ldots, x^{*}_{n-i}$ that is the greatest distance from the mean for that data set, i.e., the observation satisfying $$|x^{(i)} - \bar{x}^{(i)}| = \max_{j=1,2,\ldots,n-i} |x^{*}_j - \bar{x}^{(i)}| \;\;\;\;\;\; (3)$$ Thus, an extreme observation may be the smallest or the largest one in that data set.

Rosner's test is based on the $k$ statistics $R_1, R_2, \ldots, R_k$, which represent the extreme Studentized deviates computed from successively reduced samples of size $n, n-1, \ldots, n-k+1$: $$R_{i+1} = \frac{|x^{(i)} - \bar{x}^{(i)}|}{s^{(i)}} \;\;\;\;\;\; (4)$$ Critical values for $R_{i+1}$ are denoted $\lambda_{i+1}$ and are computed as: $$\lambda_{i+1} = \frac{t_{p, n-i-2} (n-i-1)}{\sqrt{(n-i-2 + t^2_{p, n-i-2}) (n-i)}} \;\;\;\;\;\; (5)$$ where $t_{p, \nu}$ denotes the $p$'th quantile of Student's t-distribution with $\nu$ degrees of freedom, and in this case $$p = 1 - \frac{\alpha/2}{n - i} \;\;\;\;\;\; (6)$$ where $\alpha$ denotes the Type I error level.
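
As a rough check of Equations (5) and (6), take $n = 25$, $i = 0$, and $\alpha = 0.05$, which is the setting of the naphthalene example in the EXAMPLES section. The quantile $t_{0.999, 23} \approx 3.485$ used here is read from standard t tables and is an approximation, so the result should only be trusted to about three significant digits. Note that the quantile enters squared under the square root. $$p = 1 - \frac{0.05/2}{25} = 0.999, \qquad \lambda_1 \approx \frac{3.485 \times 24}{\sqrt{(23 + 3.485^2) \times 25}} \approx \frac{83.64}{29.64} \approx 2.82$$ This agrees with the value lambda.i+1 = 2.821681 reported for $i = 0$ in that example.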

The algorithm for determining the number of outliers is as follows:

Compare $R_k$ with $\lambda_k$. If $R_k > \lambda_k$ then conclude the $k$ most extreme values are outliers.
If $R_k \le \lambda_k$ then compare $R_{k-1}$ with $\lambda_{k-1}$. If $R_{k-1} > \lambda_{k-1}$ then conclude the $k-1$ most extreme values are outliers.
Continue in this fashion until a certain number of outliers have been identified or Rosner's test finds no outliers at all.
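
The steps above can be sketched in base R as follows. This is only an illustration of Equations (1) to (6) and of the backward comparison of $R_l$ with $\lambda_l$; it is not the EnvStats implementation. The function name rosner_sketch and its return format are assumptions made for this sketch. The data are the 25 naphthalene values shown in the EXAMPLES section, typed in by hand.

```r
# Hypothetical helper (not an EnvStats function): base-R sketch of Rosner's test.
rosner_sketch <- function(x, k = 3, alpha = 0.05) {
  x <- x[is.finite(x)]                 # drop NA, NaN, Inf, -Inf
  n <- length(x)
  stopifnot(k >= 1, k <= n - 2)
  idx <- seq_len(n)                    # original positions still in the sample
  R <- lambda <- value <- numeric(k)
  obs <- integer(k)
  for (i in 0:(k - 1)) {
    xi  <- x[idx]
    dev <- abs(xi - mean(xi))          # distances from the current mean
    j   <- which.max(dev)              # most extreme remaining observation
    R[i + 1] <- dev[j] / sd(xi)        # Equation (4)
    p  <- 1 - (alpha / 2) / (n - i)    # Equation (6)
    tq <- qt(p, df = n - i - 2)
    # Equation (5); the t quantile is squared inside the square root
    lambda[i + 1] <- tq * (n - i - 1) / sqrt((n - i - 2 + tq^2) * (n - i))
    value[i + 1] <- xi[j]
    obs[i + 1]   <- idx[j]
    idx <- idx[-j]                     # remove it before the next pass
  }
  # Largest l with R_l > lambda_l; 0 when no statistic exceeds its critical value
  n.outliers <- max(c(0L, which(R > lambda)))
  data.frame(i = 0:(k - 1), Value = value, Obs.Num = obs, R = R,
             lambda = lambda, Outlier = seq_len(k) <= n.outliers)
}

# Naphthalene data (ppb) from the EXAMPLES section, ordered by well, then quarter
naph <- c(3.34, 5.39, 5.74, 6.88, 5.85,    # BW.1
          5.59, 5.96, 1.47, 2.57, 5.39,    # BW.2
          1.91, 1.74, 23.23, 1.82, 2.02,   # BW.3
          6.12, 6.05, 5.18, 4.43, 1.00,    # BW.4
          8.64, 5.34, 5.53, 4.42, 35.45)   # BW.5
res <- rosner_sketch(naph, k = 2)
print(res)
```

The printed R and lambda columns can be compared with the R.i+1 and lambda.i+1 columns of the rosnerTest output for the same data in the EXAMPLES section. Because the number of outliers is the largest $l$ for which $R_l > \lambda_l$, an observation can be flagged even when its own statistic falls below its critical value, as happens for $i = 0$ in the first example there.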

Based on a study using N=1,000 simulations, Rosner's (1983) Table 1 shows the estimated true Type I error of declaring at least one outlier when none exists for various sample sizes $n$ ranging from 10 to 100, and the declared maximum number of outliers $k$ ranging from 1 to 10. Based on that table, Rosner (1983) declared that for an assumed Type I error level of 0.05, as long as $n \ge 25$, the estimated $\alpha$ levels are quite close to 0.05, and that similar results were obtained assuming a Type I error level of 0.01. However, the table below is an expanded version of Rosner's (1983) Table 1 and shows results based on N=10,000 simulations. It shows that for an assumed Type I error of 0.05, the test maintains the Type I error fairly well for sample sizes as small as $n = 3$ as long as $k = 1$, and for $n \ge 15$, as long as $k \le 2$. Also, for an assumed Type I error of 0.01, the test maintains the Type I error fairly well for sample sizes as small as $n = 15$ as long as $k \le 7$.

Based on these results, when warn=TRUE, a warning is issued for the following cases indicating that the assumed Type I error may not be correct:

alpha is greater than 0.01, the sample size is less than 15, and k is greater than 1.
alpha is greater than 0.01, the sample size is at least 15 and less than 25, and k is greater than 2.
alpha is less than or equal to 0.01, the sample size is less than 15, and k is greater than 1.
k is greater than 10, or greater than the floor of half of the sample size (i.e., greater than the greatest integer less than or equal to half of the sample size). A warning is given for this case because simulations have not been done for this case.
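
The four cases above can be written as a single predicate. The sketch below only transcribes the listed conditions; the function name rosner_warn_needed is hypothetical, and the actual check inside rosnerTest may be organized differently.

```r
# Hypothetical helper (not an EnvStats function): TRUE when one of the four
# listed cases applies. n is the number of non-missing, finite values in x.
rosner_warn_needed <- function(n, k, alpha = 0.05) {
  # Cases 1 and 3 together: n < 15 and k > 1, whatever the value of alpha
  small.n  <- n < 15 && k > 1
  # Case 2: alpha > 0.01, 15 <= n < 25, and k > 2
  mid.n    <- alpha > 0.01 && n >= 15 && n < 25 && k > 2
  # Case 4: combinations not covered by the simulations
  untested <- k > 10 || k > floor(n / 2)
  small.n || mid.n || untested
}

w1 <- rosner_warn_needed(n = 33, k = 4)                 # FALSE
w2 <- rosner_warn_needed(n = 12, k = 2)                 # TRUE (case 1)
w3 <- rosner_warn_needed(n = 20, k = 3)                 # TRUE (case 2)
w4 <- rosner_warn_needed(n = 20, k = 3, alpha = 0.01)   # FALSE
w5 <- rosner_warn_needed(n = 50, k = 11)                # TRUE (case 4)
```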

Table 1a. Observed Type I Error Levels based on 10,000 Simulations, $n =$ 3 to 5.

		Assumed	$\alpha=0.05$		Assumed	$\alpha=0.01$
$n$	$k$	$\hat{\alpha}$	95% LCL	95% UCL	$\hat{\alpha}$	95% LCL	95% UCL
3	1	0.047	0.043	0.051	0.009	0.007	0.01
4	1	0.049	0.045	0.053	0.010	0.008	0.012
	2	0.107	0.101	0.113	0.021	0.018	0.024
5	1	0.048	0.044	0.053	0.008	0.006	0.009

Table 1b. Observed Type I Error Levels based on 10,000 Simulations, $n =$ 6 to 10.

		Assumed	$\alpha=0.05$		Assumed	$\alpha=0.01$
$n$	$k$	$\hat{\alpha}$	95% LCL	95% UCL	$\hat{\alpha}$	95% LCL	95% UCL
6	1	0.048	0.044	0.053	0.010	0.009	0.012
	2	0.085	0.080	0.091	0.017	0.015	0.020
	3	0.141	0.134	0.148	0.028	0.025	0.031
7	1	0.048	0.044	0.053	0.013	0.011	0.015
	2	0.080	0.075	0.086	0.017	0.015	0.020
	3	0.112	0.106	0.118	0.022	0.019	0.025
8	1	0.048	0.044	0.053	0.011	0.009	0.013
	2	0.080	0.074	0.085	0.017	0.014	0.019
	3	0.102	0.096	0.108	0.020	0.017	0.023
	4	0.143	0.136	0.150	0.028	0.025	0.031
9	1	0.052	0.048	0.057	0.010	0.008	0.012
	2	0.069	0.064	0.074	0.014	0.012	0.016
	3	0.097	0.091	0.103	0.018	0.015	0.021
	4	0.120	0.114	0.126	0.024	0.021	0.027
10	1	0.051	0.047	0.056	0.010	0.008	0.012
	2	0.068	0.063	0.073	0.012	0.010	0.014
	3	0.085	0.080	0.091	0.015	0.013	0.017
	4	0.106	0.100	0.112	0.021	0.018	0.024

Table 1c. Observed Type I Error Levels based on 10,000 Simulations, $n =$ 11 to 15.

		Assumed	$\alpha=0.05$		Assumed	$\alpha=0.01$
$n$	$k$	$\hat{\alpha}$	95% LCL	95% UCL	$\hat{\alpha}$	95% LCL	95% UCL
11	1	0.052	0.048	0.056	0.012	0.010	0.014
	2	0.070	0.065	0.075	0.014	0.012	0.017
	3	0.082	0.077	0.088	0.014	0.011	0.016
	4	0.101	0.095	0.107	0.019	0.016	0.021
	5	0.116	0.110	0.123	0.022	0.019	0.024
12	1	0.052	0.047	0.056	0.011	0.009	0.013
	2	0.067	0.062	0.072	0.011	0.009	0.013
	3	0.074	0.069	0.080	0.016	0.013	0.018
	4	0.088	0.082	0.093	0.016	0.014	0.019
	5	0.099	0.093	0.105	0.016	0.013	0.018
	6	0.117	0.111	0.123	0.021	0.018	0.023
13	1	0.048	0.044	0.052	0.010	0.008	0.012
	2	0.064	0.059	0.069	0.014	0.012	0.016
	3	0.070	0.065	0.075	0.013	0.011	0.015
	4	0.079	0.074	0.084	0.014	0.012	0.017
	5	0.088	0.083	0.094	0.015	0.013	0.018
	6	0.109	0.103	0.115	0.020	0.017	0.022
14	1	0.046	0.042	0.051	0.009	0.007	0.011
	2	0.062	0.057	0.066	0.012	0.010	0.014
	3	0.069	0.064	0.074	0.012	0.010	0.014
	4	0.077	0.072	0.082	0.015	0.013	0.018
	5	0.084	0.079	0.090	0.016	0.013	0.018
	6	0.091	0.085	0.097	0.017	0.014	0.019
	7	0.107	0.101	0.113	0.018	0.016	0.021
15	1	0.054	0.050	0.059	0.010	0.008	0.012
	2	0.057	0.053	0.062	0.010	0.008	0.012
	3	0.065	0.060	0.069	0.013	0.011	0.016
	4	0.073	0.068	0.078	0.014	0.011	0.016
	5	0.074	0.069	0.079	0.012	0.010	0.014
	6	0.086	0.081	0.092	0.015	0.013	0.017

Table 1d. Observed Type I Error Levels based on 10,000 Simulations, $n =$ 16 to 20.

		Assumed	$\alpha=0.05$		Assumed	$\alpha=0.01$
$n$	$k$	$\hat{\alpha}$	95% LCL	95% UCL	$\hat{\alpha}$	95% LCL	95% UCL
16	1	0.052	0.048	0.057	0.010	0.008	0.012
	2	0.055	0.051	0.059	0.011	0.009	0.013
	3	0.068	0.063	0.073	0.011	0.009	0.013
	4	0.074	0.069	0.079	0.015	0.013	0.017
	5	0.077	0.072	0.082	0.015	0.013	0.018
	6	0.075	0.070	0.080	0.013	0.011	0.016
	7	0.087	0.082	0.093	0.017	0.014	0.020
	8	0.096	0.090	0.101	0.016	0.014	0.019
17	1	0.047	0.043	0.051	0.008	0.007	0.010
	2	0.059	0.054	0.063	0.011	0.009	0.013
	3	0.062	0.057	0.067	0.012	0.010	0.014
	4	0.070	0.065	0.075	0.012	0.009	0.014
	5	0.069	0.064	0.074	0.012	0.010	0.015
	6	0.071	0.066	0.076	0.015	0.012	0.017
	7	0.081	0.076	0.087	0.014	0.012	0.016
	8	0.083	0.078	0.088	0.015	0.013	0.017
18	1	0.051	0.047	0.055	0.010	0.008	0.012
	2	0.056	0.052	0.061	0.012	0.010	0.014
	3	0.065	0.060	0.070	0.012	0.010	0.015
	4	0.065	0.060	0.070	0.013	0.011	0.015
	5	0.069	0.064	0.074	0.012	0.010	0.014
	6	0.068	0.063	0.073	0.014	0.011	0.016
	7	0.072	0.067	0.077	0.014	0.011	0.016
	8	0.076	0.071	0.081	0.012	0.010	0.014
	9	0.081	0.076	0.086	0.012	0.010	0.014
19	1	0.051	0.046	0.055	0.008	0.006	0.010
	2	0.059	0.055	0.064	0.012	0.010	0.014
	3	0.059	0.054	0.064	0.011	0.009	0.013
	4	0.061	0.057	0.066	0.012	0.010	0.014
	5	0.067	0.062	0.072	0.013	0.010	0.015
	6	0.066	0.061	0.071	0.011	0.009	0.013
	7	0.069	0.064	0.074	0.013	0.011	0.015
	8	0.074	0.069	0.079	0.012	0.010	0.014
	9	0.082	0.077	0.087	0.015	0.013	0.018
20	1	0.053	0.048	0.057	0.011	0.009	0.013
	2	0.056	0.052	0.061	0.010	0.008	0.012
	3	0.060	0.056	0.065	0.009	0.007	0.011
	4	0.063	0.058	0.068	0.012	0.010	0.014
	5	0.063	0.059	0.068	0.014	0.011	0.016
	6	0.063	0.058	0.067	0.011	0.009	0.013
	7	0.065	0.061	0.070	0.011	0.009	0.013
	8	0.070	0.065	0.076	0.012	0.010	0.014
	9	0.076	0.070	0.081	0.013	0.011	0.015

Table 1e. Observed Type I Error Levels based on 10,000 Simulations, $n =$ 21 to 25.

		Assumed	$\alpha=0.05$		Assumed	$\alpha=0.01$
$n$	$k$	$\hat{\alpha}$	95% LCL	95% UCL	$\hat{\alpha}$	95% LCL	95% UCL
21	1	0.054	0.049	0.058	0.013	0.011	0.015
	2	0.054	0.049	0.058	0.012	0.010	0.014
	3	0.058	0.054	0.063	0.012	0.010	0.014
	4	0.058	0.054	0.063	0.011	0.009	0.013
	5	0.064	0.059	0.069	0.013	0.011	0.016
	6	0.066	0.061	0.071	0.012	0.010	0.015
	7	0.063	0.058	0.068	0.013	0.011	0.015
	8	0.066	0.061	0.071	0.010	0.008	0.012
	9	0.073	0.068	0.078	0.013	0.011	0.015
	10	0.071	0.066	0.076	0.012	0.010	0.014
22	1	0.047	0.042	0.051	0.010	0.008	0.012
	2	0.058	0.053	0.062	0.012	0.010	0.015
	3	0.056	0.052	0.061	0.010	0.008	0.012
	4	0.059	0.055	0.064	0.012	0.010	0.014
	5	0.061	0.057	0.066	0.009	0.008	0.011
	6	0.063	0.058	0.068	0.013	0.010	0.015
	7	0.065	0.060	0.070	0.013	0.010	0.015
	8	0.065	0.060	0.070	0.014	0.012	0.016
	9	0.065	0.060	0.070	0.012	0.010	0.014
	10	0.067	0.062	0.072	0.012	0.009	0.014
23	1	0.051	0.047	0.056	0.008	0.007	0.010
	2	0.056	0.052	0.061	0.010	0.009	0.012
	3	0.056	0.052	0.061	0.011	0.009	0.013
	4	0.062	0.057	0.066	0.011	0.009	0.013
	5	0.061	0.056	0.065	0.010	0.009	0.012
	6	0.060	0.055	0.064	0.012	0.010	0.014
	7	0.062	0.057	0.066	0.011	0.009	0.013
	8	0.063	0.058	0.068	0.012	0.010	0.014
	9	0.066	0.061	0.071	0.012	0.010	0.014
	10	0.068	0.063	0.073	0.014	0.012	0.017
24	1	0.051	0.046	0.055	0.010	0.008	0.012
	2	0.056	0.051	0.060	0.011	0.009	0.013
	3	0.058	0.053	0.062	0.010	0.008	0.012
	4	0.060	0.056	0.065	0.013	0.011	0.015
	5	0.057	0.053	0.062	0.012	0.010	0.014
	6	0.065	0.060	0.069	0.011	0.009	0.013
	7	0.062	0.057	0.066	0.012	0.010	0.014
	8	0.060	0.055	0.065	0.012	0.010	0.014
	9	0.066	0.061	0.071	0.012	0.010	0.014
	10	0.064	0.059	0.068	0.012	0.010	0.015
25	1	0.054	0.050	0.059	0.012	0.009	0.014
	2	0.055	0.051	0.060	0.010	0.008	0.012
	3	0.057	0.052	0.062	0.011	0.009	0.013
	4	0.055	0.051	0.060	0.011	0.009	0.013
	5	0.060	0.055	0.065	0.012	0.010	0.014
	6	0.060	0.055	0.064	0.011	0.009	0.013
	7	0.057	0.052	0.061	0.011	0.009	0.013
	8	0.062	0.058	0.067	0.011	0.009	0.013
	9	0.058	0.053	0.062	0.012	0.010	0.014

Table 1f. Observed Type I Error Levels based on 10,000 Simulations, $n =$ 26 to 30.

		Assumed	$\alpha=0.05$		Assumed	$\alpha=0.01$
$n$	$k$	$\hat{\alpha}$	95% LCL	95% UCL	$\hat{\alpha}$	95% LCL	95% UCL
26	1	0.051	0.047	0.055	0.012	0.010	0.014
	2	0.057	0.053	0.062	0.013	0.011	0.015
	3	0.055	0.050	0.059	0.012	0.010	0.014
	4	0.055	0.051	0.060	0.010	0.008	0.012
	5	0.058	0.054	0.063	0.011	0.009	0.013
	6	0.061	0.056	0.066	0.012	0.010	0.014
	7	0.059	0.054	0.064	0.011	0.009	0.013
	8	0.060	0.056	0.065	0.010	0.008	0.012
	9	0.060	0.056	0.065	0.011	0.009	0.013
	10	0.061	0.056	0.065	0.011	0.009	0.013
27	1	0.050	0.046	0.054	0.009	0.007	0.011
	2	0.054	0.050	0.059	0.011	0.009	0.013
	3	0.062	0.057	0.066	0.012	0.010	0.014
	4	0.063	0.058	0.068	0.011	0.009	0.013
	5	0.051	0.047	0.055	0.010	0.008	0.012
	6	0.058	0.053	0.062	0.011	0.009	0.013
	7	0.060	0.056	0.065	0.010	0.008	0.012
	8	0.056	0.052	0.061	0.010	0.008	0.012
	9	0.061	0.056	0.066	0.012	0.010	0.014
	10	0.055	0.051	0.060	0.008	0.006	0.010
28	1	0.049	0.045	0.053	0.010	0.008	0.011
	2	0.057	0.052	0.061	0.011	0.009	0.013
	3	0.056	0.052	0.061	0.012	0.009	0.014
	4	0.057	0.053	0.062	0.011	0.009	0.013
	5	0.057	0.053	0.062	0.010	0.008	0.012
	6	0.056	0.051	0.060	0.010	0.008	0.012
	7	0.057	0.052	0.061	0.010	0.008	0.012
	8	0.058	0.054	0.063	0.011	0.009	0.013
	9	0.054	0.050	0.058	0.011	0.009	0.013
	10	0.062	0.057	0.067	0.011	0.009	0.013
29	1	0.049	0.045	0.053	0.011	0.009	0.013
	2	0.053	0.048	0.057	0.010	0.008	0.012
	3	0.056	0.051	0.060	0.010	0.009	0.012
	4	0.055	0.050	0.059	0.010	0.008	0.012
	5	0.056	0.051	0.060	0.010	0.008	0.012
	6	0.057	0.053	0.062	0.012	0.010	0.014
	7	0.055	0.050	0.059	0.010	0.008	0.012
	8	0.057	0.052	0.061	0.011	0.009	0.013
	9	0.056	0.051	0.061	0.011	0.009	0.013
	10	0.057	0.052	0.061	0.011	0.009	0.013
30	1	0.050	0.046	0.054	0.009	0.007	0.011
	2	0.054	0.049	0.058	0.011	0.009	0.013
	3	0.056	0.052	0.061	0.012	0.010	0.015
	4	0.054	0.049	0.058	0.010	0.008	0.012
	5	0.058	0.053	0.063	0.012	0.010	0.014
	6	0.062	0.058	0.067	0.012	0.010	0.014
	7	0.056	0.052	0.061	0.012	0.010	0.014
	8	0.059	0.054	0.064	0.011	0.009	0.013
	9	0.056	0.052	0.061	0.010	0.009	0.012

Table 1g. Observed Type I Error Levels based on 10,000 Simulations, $n =$ 31 to 35.

		Assumed	$\alpha=0.05$		Assumed	$\alpha=0.01$
$n$	$k$	$\hat{\alpha}$	95% LCL	95% UCL	$\hat{\alpha}$	95% LCL	95% UCL
31	1	0.051	0.047	0.056	0.009	0.007	0.011
	2	0.054	0.050	0.059	0.010	0.009	0.012
	3	0.053	0.049	0.058	0.010	0.008	0.012
	4	0.055	0.050	0.059	0.010	0.008	0.012
	5	0.053	0.049	0.057	0.011	0.009	0.013
	6	0.055	0.050	0.059	0.010	0.008	0.012
	7	0.055	0.050	0.059	0.012	0.010	0.014
	8	0.056	0.051	0.060	0.010	0.008	0.012
	9	0.057	0.053	0.062	0.011	0.009	0.013
	10	0.058	0.053	0.062	0.011	0.009	0.013
32	1	0.054	0.049	0.058	0.010	0.008	0.012
	2	0.054	0.050	0.059	0.010	0.008	0.012
	3	0.052	0.047	0.056	0.009	0.007	0.011
	4	0.056	0.051	0.060	0.011	0.009	0.013
	5	0.056	0.052	0.061	0.011	0.009	0.013
	6	0.055	0.051	0.060	0.011	0.009	0.013
	7	0.055	0.051	0.060	0.010	0.008	0.012
	8	0.055	0.051	0.060	0.010	0.008	0.012
	9	0.057	0.053	0.062	0.012	0.010	0.014
	10	0.054	0.050	0.059	0.010	0.008	0.012
33	1	0.051	0.046	0.055	0.011	0.009	0.013
	2	0.055	0.051	0.060	0.011	0.009	0.013
	3	0.056	0.052	0.061	0.010	0.008	0.012
	4	0.052	0.048	0.057	0.010	0.008	0.012
	5	0.055	0.050	0.059	0.010	0.008	0.012
	6	0.058	0.053	0.062	0.011	0.009	0.013
	7	0.057	0.052	0.061	0.010	0.008	0.012
	8	0.058	0.054	0.063	0.011	0.009	0.013
	9	0.057	0.053	0.062	0.012	0.010	0.014
	10	0.055	0.051	0.060	0.011	0.009	0.013
34	1	0.052	0.048	0.056	0.009	0.007	0.011
	2	0.053	0.049	0.058	0.011	0.009	0.013
	3	0.055	0.050	0.059	0.012	0.010	0.014
	4	0.056	0.052	0.061	0.010	0.008	0.012
	5	0.053	0.048	0.057	0.009	0.007	0.011
	6	0.055	0.050	0.059	0.010	0.008	0.012
	7	0.052	0.048	0.057	0.012	0.010	0.014
	8	0.055	0.050	0.059	0.009	0.008	0.011
	9	0.055	0.051	0.060	0.011	0.009	0.013
	10	0.054	0.049	0.058	0.010	0.008	0.012
35	1	0.051	0.046	0.055	0.010	0.009	0.012
	2	0.054	0.049	0.058	0.010	0.009	0.012
	3	0.055	0.050	0.059	0.010	0.009	0.012
	4	0.053	0.048	0.057	0.011	0.009	0.013
	5	0.056	0.051	0.061	0.011	0.009	0.013
	6	0.055	0.051	0.059	0.012	0.010	0.014
	7	0.054	0.050	0.059	0.011	0.009	0.013
	8	0.054	0.049	0.058	0.011	0.009	0.013
	9	0.061	0.056	0.066	0.012	0.010	0.014

Table 1h. Observed Type I Error Levels based on 10,000 Simulations, $n =$ 36 to 40.

		Assumed	$\alpha=0.05$		Assumed	$\alpha=0.01$
$n$	$k$	$\hat{\alpha}$	95% LCL	95% UCL	$\hat{\alpha}$	95% LCL	95% UCL
36	1	0.047	0.043	0.051	0.010	0.008	0.012
	2	0.058	0.053	0.062	0.012	0.010	0.015
	3	0.052	0.047	0.056	0.009	0.007	0.011
	4	0.052	0.048	0.056	0.012	0.010	0.014
	5	0.052	0.048	0.057	0.010	0.008	0.012
	6	0.055	0.051	0.059	0.012	0.010	0.014
	7	0.053	0.048	0.057	0.011	0.009	0.013
	8	0.056	0.051	0.060	0.012	0.010	0.014
	9	0.056	0.051	0.060	0.011	0.009	0.013
	10	0.056	0.051	0.060	0.011	0.009	0.013
37	1	0.050	0.046	0.055	0.010	0.008	0.012
	2	0.054	0.049	0.058	0.011	0.009	0.013
	3	0.054	0.049	0.058	0.011	0.009	0.013
	4	0.054	0.050	0.058	0.010	0.008	0.012
	5	0.054	0.049	0.058	0.010	0.008	0.012
	6	0.054	0.050	0.058	0.011

References

Barnett, V., and T. Lewis. (1995). Outliers in Statistical Data. Third Edition. John Wiley & Sons, Chichester, UK, pp. 235--236.

Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, NY, pp. 188--191.

McBean, E.A., and F.A. Rovers. (1992). Estimation of the Probability of Exceedance of Contaminant Concentrations. Ground Water Monitoring Review Winter, pp. 115--119.

McNutt, M. (2014). Raising the Bar. Science 345(6192), p. 9.

Rosner, B. (1975). On the Detection of Many Outliers. Technometrics 17, 221--227.

Rosner, B. (1983). Percentage Points for a Generalized ESD Many-Outlier Procedure. Technometrics 25, 165--172.

USEPA. (2006). Data Quality Assessment: A Reviewer's Guide. EPA QA/G-9R. EPA/240/B-06/002, February 2006. Office of Environmental Information, U.S. Environmental Protection Agency, Washington, D.C.

USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C., pp. 12-10 to 12-14.

USEPA. (2013a). ProUCL Version 5.0.00 Technical Guide. EPA/600/R-07/041, September 2013. Office of Research and Development. U.S. Environmental Protection Agency, Washington, D.C., pp. 190--195.

USEPA. (2013b). ProUCL Version 5.0.00 User Guide. EPA/600/R-07/041, September 2013. Office of Research and Development. U.S. Environmental Protection Agency, Washington, D.C., pp. 190--195.

Examples

# NOT RUN {
  # Load the package that provides rosnerTest(), qqPlot(), longToWide(),
  # and the EPA example data sets used below.
  library(EnvStats)

  # Combine 30 observations from a normal distribution with mean 3 and 
  # standard deviation 2, with 3 observations from a normal distribution 
  # with mean 10 and standard deviation 1, then run Rosner's Test on these 
  # data, specifying k=4 potential outliers based on looking at the 
  # normal Q-Q plot. 
  # (Note: the call to set.seed simply allows you to reproduce 
  # this example.)

  set.seed(250) 

  dat <- c(rnorm(30, mean = 3, sd = 2), rnorm(3, mean = 10, sd = 1)) 

  dev.new()
  qqPlot(dat)

  rosnerTest(dat, k = 4)

  #Results of Outlier Test
  #-------------------------
  #
  #Test Method:                     Rosner's Test for Outliers
  #
  #Hypothesized Distribution:       Normal
  #
  #Data:                            dat
  #
  #Sample Size:                     33
  #
  #Test Statistics:                 R.1 = 2.848514
  #                                 R.2 = 3.086875
  #                                 R.3 = 3.033044
  #                                 R.4 = 2.380235
  #
  #Test Statistic Parameter:        k = 4
  #
  #Alternative Hypothesis:          Up to 4 observations are not
  #                                 from the same Distribution.
  #
  #Type I Error:                    5%
  #
  #Number of Outliers Detected:     3
  #
  #  i   Mean.i     SD.i      Value Obs.Num    R.i+1 lambda.i+1 Outlier
  #1 0 3.549744 2.531011 10.7593656      33 2.848514   2.951949    TRUE
  #2 1 3.324444 2.209872 10.1460427      31 3.086875   2.938048    TRUE
  #3 2 3.104392 1.856109  8.7340527      32 3.033044   2.923571    TRUE
  #4 3 2.916737 1.560335 -0.7972275      25 2.380235   2.908473   FALSE

  #----------
  # Clean up

  rm(dat)
  graphics.off()

  #--------------------------------------------------------------------

  # Example 12-4 of USEPA (2009, page 12-12) gives an example of 
  # using Rosner's test to test for outliers in naphthalene measurements (ppb)
  # taken at 5 background wells over 5 quarters.  The data for this example 
  # are stored in EPA.09.Ex.12.4.naphthalene.df.

  EPA.09.Ex.12.4.naphthalene.df
  #   Quarter Well Naphthalene.ppb
  #1        1 BW.1            3.34
  #2        2 BW.1            5.39
  #3        3 BW.1            5.74
  # ...
  #23       3 BW.5            5.53
  #24       4 BW.5            4.42
  #25       5 BW.5           35.45

  longToWide(EPA.09.Ex.12.4.naphthalene.df, "Naphthalene.ppb", "Quarter", "Well", 
    paste.row.name = TRUE)
  #          BW.1 BW.2  BW.3 BW.4  BW.5
  #Quarter.1 3.34 5.59  1.91 6.12  8.64
  #Quarter.2 5.39 5.96  1.74 6.05  5.34
  #Quarter.3 5.74 1.47 23.23 5.18  5.53
  #Quarter.4 6.88 2.57  1.82 4.43  4.42
  #Quarter.5 5.85 5.39  2.02 1.00 35.45


  # Look at Q-Q plots for both the raw and log-transformed data
  #------------------------------------------------------------

  dev.new()
  with(EPA.09.Ex.12.4.naphthalene.df, 
    qqPlot(Naphthalene.ppb, add.line = TRUE, 
      main = "Figure 12-6.  Naphthalene Probability Plot"))

  dev.new()
  with(EPA.09.Ex.12.4.naphthalene.df, 
    qqPlot(Naphthalene.ppb, dist = "lnorm", add.line = TRUE, 
      main = "Figure 12-7.  Log Naphthalene Probability Plot"))


  # Test for 2 potential outliers on the original scale:
  #-----------------------------------------------------

  with(EPA.09.Ex.12.4.naphthalene.df, rosnerTest(Naphthalene.ppb, k = 2))

  #Results of Outlier Test
  #-------------------------
  #
  #Test Method:                     Rosner's Test for Outliers
  #
  #Hypothesized Distribution:       Normal
  #
  #Data:                            Naphthalene.ppb
  #
  #Sample Size:                     25
  #
  #Test Statistics:                 R.1 = 3.930957
  #                                 R.2 = 4.160223
  #
  #Test Statistic Parameter:        k = 2
  #
  #Alternative Hypothesis:          Up to 2 observations are not
  #                                 from the same Distribution.
  #
  #Type I Error:                    5%
  #
  #Number of Outliers Detected:     2
  #
  #  i  Mean.i     SD.i Value Obs.Num    R.i+1 lambda.i+1 Outlier
  #1 0 6.44240 7.379271 35.45      25 3.930957   2.821681    TRUE
  #2 1 5.23375 4.325790 23.23      13 4.160223   2.801551    TRUE

  #----------
  # Clean up

  graphics.off()
# }
