EnvStats (version 2.1.0)

rosnerTest: Rosner's Test for Outliers

Description

Perform Rosner's generalized extreme Studentized deviate test for up to $k$ potential outliers in a dataset, assuming the data without any outliers come from a normal (Gaussian) distribution.

Usage

rosnerTest(x, k = 3, alpha = 0.05, warn = TRUE)

Arguments

x
numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. There must be at least 10 non-missing, finite observations in x.
k
positive integer indicating the number of suspected outliers. The argument k must be between 1 and $n-2$, where $n$ denotes the number of non-missing, finite values in the argument x. The default value is k=3.
alpha
numeric scalar between 0 and 1 indicating the Type I error associated with the test of hypothesis. The default value is alpha=0.05.
warn
logical scalar indicating whether to issue a warning (warn=TRUE; the default) when the number of non-missing, finite values in x and the value of k are such that the assumed Type I error level might not be maintained. See the Details section below.

Value

  • A list of class "gofOutlier" containing the results of the hypothesis test. See the help file for gofOutlier.object for details.

Details

Let $x_1, x_2, \ldots, x_n$ denote the $n$ observations. We assume that $n-k$ of these observations come from the same normal (Gaussian) distribution, and that the $k$ most extreme observations may or may not represent observations from a different distribution. Let $x^{*}_1, x^{*}_2, \ldots, x^{*}_{n-i}$ denote the $n-i$ observations left after omitting the $i$ most extreme observations, where $i = 0, 1, \ldots, k-1$. Let $\bar{x}^{(i)}$ and $s^{(i)}$ denote the mean and standard deviation, respectively, of the $n-i$ observations that remain after removing the $i$ most extreme observations. Thus, $\bar{x}^{(0)}$ and $s^{(0)}$ denote the mean and standard deviation of the full sample, and in general

$$\bar{x}^{(i)} = \frac{1}{n-i}\sum_{j=1}^{n-i} x^{*}_j \;\;\;\;\;\; (1)$$

$$s^{(i)} = \sqrt{\frac{1}{n-i-1} \sum_{j=1}^{n-i} (x^{*}_j - \bar{x}^{(i)})^2} \;\;\;\;\;\; (2)$$

For a specified value of $i$, the most extreme observation $x^{(i)}$ is the one that is the greatest distance from the mean of that reduced data set, i.e.,

$$x^{(i)} = \max_{j=1,2,\ldots,n-i} |x^{*}_j - \bar{x}^{(i)}| \;\;\;\;\;\; (3)$$

Thus, an extreme observation may be either the smallest or the largest one in that data set.

Rosner's test is based on the $k$ statistics $R_1, R_2, \ldots, R_k$, which represent the extreme Studentized deviates computed from successively reduced samples of size $n, n-1, \ldots, n-k+1$:

$$R_{i+1} = \frac{|x^{(i)} - \bar{x}^{(i)}|}{s^{(i)}} \;\;\;\;\;\; (4)$$

Critical values for $R_{i+1}$ are denoted $\lambda_{i+1}$ and are computed as:

$$\lambda_{i+1} = \frac{t_{p, n-i-2} (n-i-1)}{\sqrt{(n-i-2 + t^2_{p, n-i-2}) (n-i)}} \;\;\;\;\;\; (5)$$

where $t_{p, \nu}$ denotes the $p$'th quantile of Student's t-distribution with $\nu$ degrees of freedom, and in this case

$$p = 1 - \frac{\alpha/2}{n - i} \;\;\;\;\;\; (6)$$

where $\alpha$ denotes the Type I error level. The algorithm for determining the number of outliers is as follows:
  1. Compare $R_k$ with $\lambda_k$. If $R_k > \lambda_k$ then conclude the $k$ most extreme values are outliers.
  2. If $R_k \le \lambda_k$ then compare $R_{k-1}$ with $\lambda_{k-1}$. If $R_{k-1} > \lambda_{k-1}$ then conclude the $k-1$ most extreme values are outliers.
  3. Continue in this fashion until a certain number of outliers have been identified or Rosner's test finds no outliers at all (see the illustrative sketch following this list).
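
The following minimal R sketch illustrates this algorithm (for exposition only; it is not the EnvStats implementation, and the helper name rosner_stats is made up for this help file). It computes the quantities in Equations (1)-(6) and applies the decision rule; rosnerTest reports the corresponding values in the table shown in its printed output (see gofOutlier.object).

  # Minimal sketch of Rosner's algorithm (illustration only, not the
  # EnvStats implementation; the helper name rosner_stats is made up).
  rosner_stats <- function(x, k = 3, alpha = 0.05) {
    x <- x[is.finite(x)]
    n <- length(x)
    R <- lambda <- value <- numeric(k)
    y <- x
    for (i in 0:(k - 1)) {
      xbar <- mean(y)                            # Equation (1)
      s    <- sd(y)                              # Equation (2)
      j    <- which.max(abs(y - xbar))           # most extreme value, Equation (3)
      value[i + 1] <- y[j]
      R[i + 1] <- abs(y[j] - xbar) / s           # Equation (4)
      p  <- 1 - (alpha / 2) / (n - i)            # Equation (6)
      tp <- qt(p, df = n - i - 2)
      lambda[i + 1] <- tp * (n - i - 1) /
        sqrt((n - i - 2 + tp^2) * (n - i))       # Equation (5)
      y <- y[-j]                                 # drop the extreme value and repeat
    }
    # Decision rule: the number of outliers is the largest i+1 with R > lambda
    n.outliers <- if (any(R > lambda)) max(which(R > lambda)) else 0
    list(value = value, R = R, lambda = lambda, n.outliers = n.outliers)
  }

For example, applied to the dat object created in the first example below, rosner_stats(dat, k = 4) should reproduce (up to rounding) the R.i+1 and lambda.i+1 columns shown there.
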
Based on a study using N=1,000 simulations, Rosner's (1983) Table 1 shows the estimated true Type I error of declaring at least one outlier when none exists for various sample sizes $n$ ranging from 10 to 100, and the declared maximum number of outliers $k$ ranging from 1 to 10. Based on that table, Rosner (1983) concluded that for an assumed Type I error level of 0.05, the estimated $\alpha$ levels are quite close to 0.05 as long as $n \ge 25$, and that similar results were obtained assuming a Type I error level of 0.01. However, the tables below are an expanded version of Rosner's (1983) Table 1 and show results based on N=10,000 simulations. You can see that for an assumed Type I error of 0.05, the test maintains the Type I error fairly well for sample sizes as small as $n = 3$ as long as $k = 1$, and for $n \ge 15$ as long as $k \le 2$. Also, for an assumed Type I error of 0.01, the test maintains the Type I error fairly well for sample sizes as small as $n = 15$ as long as $k \le 7$. Based on these results, when warn=TRUE, a warning is issued in the following cases (mirrored in the sketch after this list) to indicate that the assumed Type I error level may not be correct:
  • alpha is greater than 0.01, the sample size is less than 15, and k is greater than 1.
  • alpha is greater than 0.01, the sample size is at least 15 and less than 25, and k is greater than 2.
  • alpha is less than or equal to 0.01, the sample size is less than 15, and k is greater than 1.
  • k is greater than 10, or greater than the floor of half of the sample size (i.e., greater than the greatest integer less than or equal to half of the sample size). A warning is given in this case because no simulations have been done for it.
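
For illustration, the conditions above can be written as a simple predicate. The helper below is hypothetical (not part of EnvStats; rosnerTest performs the corresponding checks internally) and simply mirrors the list:

  # Hypothetical helper mirroring the warning conditions listed above
  # (illustration only; rosnerTest() performs its own checks).
  rosner_warning_expected <- function(n, k, alpha) {
    (alpha >  0.01 && n <  15           && k > 1) ||
    (alpha >  0.01 && n >= 15 && n < 25 && k > 2) ||
    (alpha <= 0.01 && n <  15           && k > 1) ||
    (k > 10 || k > floor(n / 2))
  }
  rosner_warning_expected(n = 12, k = 3, alpha = 0.05)   # TRUE:  warning expected
  rosner_warning_expected(n = 33, k = 4, alpha = 0.05)   # FALSE: no warning expected
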
Table 1a. Observed Type I Error Levels based on 10,000 Simulations, $n$ = 3 to 5.

              Assumed alpha = 0.05          Assumed alpha = 0.01
   n   k   alpha.hat  95% LCL  95% UCL   alpha.hat  95% LCL  95% UCL
   3   1      0.047    0.043    0.051      0.009    0.007    0.010
   4   1      0.049    0.045    0.053      0.010    0.008    0.012
   4   2      0.107    0.101    0.113      0.021    0.018    0.024
   5   1      0.048    0.044    0.053      0.008    0.006    0.009
   5   2      0.095    0.090    0.101      0.020    0.018    0.023

Table 1b. Observed Type I Error Levels based on 10,000 Simulations, $n$ = 6 to 10.

              Assumed alpha = 0.05          Assumed alpha = 0.01
   n   k   alpha.hat  95% LCL  95% UCL   alpha.hat  95% LCL  95% UCL
   6   1      0.048    0.044    0.053      0.010    0.009    0.012
   6   2      0.085    0.080    0.091      0.017    0.015    0.020
   6   3      0.141    0.134    0.148      0.028    0.025    0.031
   7   1      0.048    0.044    0.053      0.013    0.011    0.015
   7   2      0.080    0.075    0.086      0.017    0.015    0.020
   7   3      0.112    0.106    0.118      0.022    0.019    0.025
   8   1      0.048    0.044    0.053      0.011    0.009    0.013
   8   2      0.080    0.074    0.085      0.017    0.014    0.019
   8   3      0.102    0.096    0.108      0.020    0.017    0.023
   8   4      0.143    0.136    0.150      0.028    0.025    0.031
   9   1      0.052    0.048    0.057      0.010    0.008    0.012
   9   2      0.069    0.064    0.074      0.014    0.012    0.016
   9   3      0.097    0.091    0.103      0.018    0.015    0.021
   9   4      0.120    0.114    0.126      0.024    0.021    0.027
  10   1      0.051    0.047    0.056      0.010    0.008    0.012
  10   2      0.068    0.063    0.073      0.012    0.010    0.014
  10   3      0.085    0.080    0.091      0.015    0.013    0.017
  10   4      0.106    0.100    0.112      0.021    0.018    0.024
  10   5      0.135    0.128    0.142      0.025    0.022    0.028

Table 1c. Observed Type I Error Levels based on 10,000 Simulations, $n$ = 11 to 15.

              Assumed alpha = 0.05          Assumed alpha = 0.01
   n   k   alpha.hat  95% LCL  95% UCL   alpha.hat  95% LCL  95% UCL
  11   1      0.052    0.048    0.056      0.012    0.010    0.014
  11   2      0.070    0.065    0.075      0.014    0.012    0.017
  11   3      0.082    0.077    0.088      0.014    0.011    0.016
  11   4      0.101    0.095    0.107      0.019    0.016    0.021
  11   5      0.116    0.110    0.123      0.022    0.019    0.024
  12   1      0.052    0.047    0.056      0.011    0.009    0.013
  12   2      0.067    0.062    0.072      0.011    0.009    0.013
  12   3      0.074    0.069    0.080      0.016    0.013    0.018
  12   4      0.088    0.082    0.093      0.016    0.014    0.019
  12   5      0.099    0.093    0.105      0.016    0.013    0.018
  12   6      0.117    0.111    0.123      0.021    0.018    0.023
  13   1      0.048    0.044    0.052      0.010    0.008    0.012
  13   2      0.064    0.059    0.069      0.014    0.012    0.016
  13   3      0.070    0.065    0.075      0.013    0.011    0.015
  13   4      0.079    0.074    0.084      0.014    0.012    0.017
  13   5      0.088    0.083    0.094      0.015    0.013    0.018
  13   6      0.109    0.103    0.115      0.020    0.017    0.022
  14   1      0.046    0.042    0.051      0.009    0.007    0.011
  14   2      0.062    0.057    0.066      0.012    0.010    0.014
  14   3      0.069    0.064    0.074      0.012    0.010    0.014
  14   4      0.077    0.072    0.082      0.015    0.013    0.018
  14   5      0.084    0.079    0.090      0.016    0.013    0.018
  14   6      0.091    0.085    0.097      0.017    0.014    0.019
  14   7      0.107    0.101    0.113      0.018    0.016    0.021
  15   1      0.054    0.050    0.059      0.010    0.008    0.012
  15   2      0.057    0.053    0.062      0.010    0.008    0.012
  15   3      0.065    0.060    0.069      0.013    0.011    0.016
  15   4      0.073    0.068    0.078      0.014    0.011    0.016
  15   5      0.074    0.069    0.079      0.012    0.010    0.014
  15   6      0.086    0.081    0.092      0.015    0.013    0.017
  15   7      0.099    0.094    0.105      0.018    0.015    0.020

Table 1d. Observed Type I Error Levels based on 10,000 Simulations, $n$ = 16 to 20.

              Assumed alpha = 0.05          Assumed alpha = 0.01
   n   k   alpha.hat  95% LCL  95% UCL   alpha.hat  95% LCL  95% UCL
  16   1      0.052    0.048    0.057      0.010    0.008    0.012
  16   2      0.055    0.051    0.059      0.011    0.009    0.013
  16   3      0.068    0.063    0.073      0.011    0.009    0.013
  16   4      0.074    0.069    0.079      0.015    0.013    0.017
  16   5      0.077    0.072    0.082      0.015    0.013    0.018
  16   6      0.075    0.070    0.080      0.013    0.011    0.016
  16   7      0.087    0.082    0.093      0.017    0.014    0.020
  16   8      0.096    0.090    0.101      0.016    0.014    0.019
  17   1      0.047    0.043    0.051      0.008    0.007    0.010
  17   2      0.059    0.054    0.063      0.011    0.009    0.013
  17   3      0.062    0.057    0.067      0.012    0.010    0.014
  17   4      0.070    0.065    0.075      0.012    0.009    0.014
  17   5      0.069    0.064    0.074      0.012    0.010    0.015
  17   6      0.071    0.066    0.076      0.015    0.012    0.017
  17   7      0.081    0.076    0.087      0.014    0.012    0.016
  17   8      0.083    0.078    0.088      0.015    0.013    0.017
  18   1      0.051    0.047    0.055      0.010    0.008    0.012
  18   2      0.056    0.052    0.061      0.012    0.010    0.014
  18   3      0.065    0.060    0.070      0.012    0.010    0.015
  18   4      0.065    0.060    0.070      0.013    0.011    0.015
  18   5      0.069    0.064    0.074      0.012    0.010    0.014
  18   6      0.068    0.063    0.073      0.014    0.011    0.016
  18   7      0.072    0.067    0.077      0.014    0.011    0.016
  18   8      0.076    0.071    0.081      0.012    0.010    0.014
  18   9      0.081    0.076    0.086      0.012    0.010    0.014
  19   1      0.051    0.046    0.055      0.008    0.006    0.010
  19   2      0.059    0.055    0.064      0.012    0.010    0.014
  19   3      0.059    0.054    0.064      0.011    0.009    0.013
  19   4      0.061    0.057    0.066      0.012    0.010    0.014
  19   5      0.067    0.062    0.072      0.013    0.010    0.015
  19   6      0.066    0.061    0.071      0.011    0.009    0.013
  19   7      0.069    0.064    0.074      0.013    0.011    0.015
  19   8      0.074    0.069    0.079      0.012    0.010    0.014
  19   9      0.082    0.077    0.087      0.015    0.013    0.018
  20   1      0.053    0.048    0.057      0.011    0.009    0.013
  20   2      0.056    0.052    0.061      0.010    0.008    0.012
  20   3      0.060    0.056    0.065      0.009    0.007    0.011
  20   4      0.063    0.058    0.068      0.012    0.010    0.014
  20   5      0.063    0.059    0.068      0.014    0.011    0.016
  20   6      0.063    0.058    0.067      0.011    0.009    0.013
  20   7      0.065    0.061    0.070      0.011    0.009    0.013
  20   8      0.070    0.065    0.076      0.012    0.010    0.014
  20   9      0.076    0.070    0.081      0.013    0.011    0.015
  20  10      0.081    0.076    0.087      0.012    0.010    0.014

Table 1e. Observed Type I Error Levels based on 10,000 Simulations, $n$ = 21 to 25.

              Assumed alpha = 0.05          Assumed alpha = 0.01
   n   k   alpha.hat  95% LCL  95% UCL   alpha.hat  95% LCL  95% UCL
  21   1      0.054    0.049    0.058      0.013    0.011    0.015
  21   2      0.054    0.049    0.058      0.012    0.010    0.014
  21   3      0.058    0.054    0.063      0.012    0.010    0.014
  21   4      0.058    0.054    0.063      0.011    0.009    0.013
  21   5      0.064    0.059    0.069      0.013    0.011    0.016
  21   6      0.066    0.061    0.071      0.012    0.010    0.015
  21   7      0.063    0.058    0.068      0.013    0.011    0.015
  21   8      0.066    0.061    0.071      0.010    0.008    0.012
  21   9      0.073    0.068    0.078      0.013    0.011    0.015
  21  10      0.071    0.066    0.076      0.012    0.010    0.014
  22   1      0.047    0.042    0.051      0.010    0.008    0.012
  22   2      0.058    0.053    0.062      0.012    0.010    0.015
  22   3      0.056    0.052    0.061      0.010    0.008    0.012
  22   4      0.059    0.055    0.064      0.012    0.010    0.014
  22   5      0.061    0.057    0.066      0.009    0.008    0.011
  22   6      0.063    0.058    0.068      0.013    0.010    0.015
  22   7      0.065    0.060    0.070      0.013    0.010    0.015
  22   8      0.065    0.060    0.070      0.014    0.012    0.016
  22   9      0.065    0.060    0.070      0.012    0.010    0.014
  22  10      0.067    0.062    0.072      0.012    0.009    0.014
  23   1      0.051    0.047    0.056      0.008    0.007    0.010
  23   2      0.056    0.052    0.061      0.010    0.009    0.012
  23   3      0.056    0.052    0.061      0.011    0.009    0.013
  23   4      0.062    0.057    0.066      0.011    0.009    0.013
  23   5      0.061    0.056    0.065      0.010    0.009    0.012
  23   6      0.060    0.055    0.064      0.012    0.010    0.014
  23   7      0.062    0.057    0.066      0.011    0.009    0.013
  23   8      0.063    0.058    0.068      0.012    0.010    0.014
  23   9      0.066    0.061    0.071      0.012    0.010    0.014
  23  10      0.068    0.063    0.073      0.014    0.012    0.017
  24   1      0.051    0.046    0.055      0.010    0.008    0.012
  24   2      0.056    0.051    0.060      0.011    0.009    0.013
  24   3      0.058    0.053    0.062      0.010    0.008    0.012
  24   4      0.060    0.056    0.065      0.013    0.011    0.015
  24   5      0.057    0.053    0.062      0.012    0.010    0.014
  24   6      0.065    0.060    0.069      0.011    0.009    0.013
  24   7      0.062    0.057    0.066      0.012    0.010    0.014
  24   8      0.060    0.055    0.065      0.012    0.010    0.014
  24   9      0.066    0.061    0.071      0.012    0.010    0.014
  24  10      0.064    0.059    0.068      0.012    0.010    0.015
  25   1      0.054    0.050    0.059      0.012    0.009    0.014
  25   2      0.055    0.051    0.060      0.010    0.008    0.012
  25   3      0.057    0.052    0.062      0.011    0.009    0.013
  25   4      0.055    0.051    0.060      0.011    0.009    0.013
  25   5      0.060    0.055    0.065      0.012    0.010    0.014
  25   6      0.060    0.055    0.064      0.011    0.009    0.013
  25   7      0.057    0.052    0.061      0.011    0.009    0.013
  25   8      0.062    0.058    0.067      0.011    0.009    0.013
  25   9      0.058    0.053    0.062      0.012    0.010    0.014
  25  10      0.061    0.057    0.066      0.010    0.008    0.012

Table 1f. Observed Type I Error Levels based on 10,000 Simulations, $n$ = 26 to 30.

              Assumed alpha = 0.05          Assumed alpha = 0.01
   n   k   alpha.hat  95% LCL  95% UCL   alpha.hat  95% LCL  95% UCL
  26   1      0.051    0.047    0.055      0.012    0.010    0.014
  26   2      0.057    0.053    0.062      0.013    0.011    0.015
  26   3      0.055    0.050    0.059      0.012    0.010    0.014
  26   4      0.055    0.051    0.060      0.010    0.008    0.012
  26   5      0.058    0.054    0.063      0.011    0.009    0.013
  26   6      0.061    0.056    0.066      0.012    0.010    0.014
  26   7      0.059    0.054    0.064      0.011    0.009    0.013
  26   8      0.060    0.056    0.065      0.010    0.008    0.012
  26   9      0.060    0.056    0.065      0.011    0.009    0.013
  26  10      0.061    0.056    0.065      0.011    0.009    0.013
  27   1      0.050    0.046    0.054      0.009    0.007    0.011
  27   2      0.054    0.050    0.059      0.011    0.009    0.013
  27   3      0.062    0.057    0.066      0.012    0.010    0.014
  27   4      0.063    0.058    0.068      0.011    0.009    0.013
  27   5      0.051    0.047    0.055      0.010    0.008    0.012
  27   6      0.058    0.053    0.062      0.011    0.009    0.013
  27   7      0.060    0.056    0.065      0.010    0.008    0.012
  27   8      0.056    0.052    0.061      0.010    0.008    0.012
  27   9      0.061    0.056    0.066      0.012    0.010    0.014
  27  10      0.055    0.051    0.060      0.008    0.006    0.010
  28   1      0.049    0.045    0.053      0.010    0.008    0.011
  28   2      0.057    0.052    0.061      0.011    0.009    0.013
  28   3      0.056    0.052    0.061      0.012    0.009    0.014
  28   4      0.057    0.053    0.062      0.011    0.009    0.013
  28   5      0.057    0.053    0.062      0.010    0.008    0.012
  28   6      0.056    0.051    0.060      0.010    0.008    0.012
  28   7      0.057    0.052    0.061      0.010    0.008    0.012
  28   8      0.058    0.054    0.063      0.011    0.009    0.013
  28   9      0.054    0.050    0.058      0.011    0.009    0.013
  28  10      0.062    0.057    0.067      0.011    0.009    0.013
  29   1      0.049    0.045    0.053      0.011    0.009    0.013
  29   2      0.053    0.048    0.057      0.010    0.008    0.012
  29   3      0.056    0.051    0.060      0.010    0.009    0.012
  29   4      0.055    0.050    0.059      0.010    0.008    0.012
  29   5      0.056    0.051    0.060      0.010    0.008    0.012
  29   6      0.057    0.053    0.062      0.012    0.010    0.014
  29   7      0.055    0.050    0.059      0.010    0.008    0.012
  29   8      0.057    0.052    0.061      0.011    0.009    0.013
  29   9      0.056    0.051    0.061      0.011    0.009    0.013
  29  10      0.057    0.052    0.061      0.011    0.009    0.013
  30   1      0.050    0.046    0.054      0.009    0.007    0.011
  30   2      0.054    0.049    0.058      0.011    0.009    0.013
  30   3      0.056    0.052    0.061      0.012    0.010    0.015
  30   4      0.054    0.049    0.058      0.010    0.008    0.012
  30   5      0.058    0.053    0.063      0.012    0.010    0.014
  30   6      0.062    0.058    0.067      0.012    0.010    0.014
  30   7      0.056    0.052    0.061      0.012    0.010    0.014
  30   8      0.059    0.054    0.064      0.011    0.009    0.013
  30   9      0.056    0.052    0.061      0.010    0.009    0.012
  30  10      0.058    0.053    0.062      0.012    0.010    0.015

Table 1g. Observed Type I Error Levels based on 10,000 Simulations, $n$ = 31 to 35.

              Assumed alpha = 0.05          Assumed alpha = 0.01
   n   k   alpha.hat  95% LCL  95% UCL   alpha.hat  95% LCL  95% UCL
  31   1      0.051    0.047    0.056      0.009    0.007    0.011
  31   2      0.054    0.050    0.059      0.010    0.009    0.012
  31   3      0.053    0.049    0.058      0.010    0.008    0.012
  31   4      0.055    0.050    0.059      0.010    0.008    0.012
  31   5      0.053    0.049    0.057      0.011    0.009    0.013
  31   6      0.055    0.050    0.059      0.010    0.008    0.012
  31   7      0.055    0.050    0.059      0.012    0.010    0.014
  31   8      0.056    0.051    0.060      0.010    0.008    0.012
  31   9      0.057    0.053    0.062      0.011    0.009    0.013
  31  10      0.058    0.053    0.062      0.011    0.009    0.013
  32   1      0.054    0.049    0.058      0.010    0.008    0.012
  32   2      0.054    0.050    0.059      0.010    0.008    0.012
  32   3      0.052    0.047    0.056      0.009    0.007    0.011
  32   4      0.056    0.051    0.060      0.011    0.009    0.013
  32   5      0.056    0.052    0.061      0.011    0.009    0.013
  32   6      0.055    0.051    0.060      0.011    0.009    0.013
  32   7      0.055    0.051    0.060      0.010    0.008    0.012
  32   8      0.055    0.051    0.060      0.010    0.008    0.012
  32   9      0.057    0.053    0.062      0.012    0.010    0.014
  32  10      0.054    0.050    0.059      0.010    0.008    0.012
  33   1      0.051    0.046    0.055      0.011    0.009    0.013
  33   2      0.055    0.051    0.060      0.011    0.009    0.013
  33   3      0.056    0.052    0.061      0.010    0.008    0.012
  33   4      0.052    0.048    0.057      0.010    0.008    0.012
  33   5      0.055    0.050    0.059      0.010    0.008    0.012
  33   6      0.058    0.053    0.062      0.011    0.009    0.013
  33   7      0.057    0.052    0.061      0.010    0.008    0.012
  33   8      0.058    0.054    0.063      0.011    0.009    0.013
  33   9      0.057    0.053    0.062      0.012    0.010    0.014
  33  10      0.055    0.051    0.060      0.011    0.009    0.013
  34   1      0.052    0.048    0.056      0.009    0.007    0.011
  34   2      0.053    0.049    0.058      0.011    0.009    0.013
  34   3      0.055    0.050    0.059      0.012    0.010    0.014
  34   4      0.056    0.052    0.061      0.010    0.008    0.012
  34   5      0.053    0.048    0.057      0.009    0.007    0.011
  34   6      0.055    0.050    0.059      0.010    0.008    0.012
  34   7      0.052    0.048    0.057      0.012    0.010    0.014
  34   8      0.055    0.050    0.059      0.009    0.008    0.011
  34   9      0.055    0.051    0.060      0.011    0.009    0.013
  34  10      0.054    0.049    0.058      0.010    0.008    0.012
  35   1      0.051    0.046    0.055      0.010    0.009    0.012
  35   2      0.054    0.049    0.058      0.010    0.009    0.012
  35   3      0.055    0.050    0.059      0.010    0.009    0.012
  35   4      0.053    0.048    0.057      0.011    0.009    0.013
  35   5      0.056    0.051    0.061      0.011    0.009    0.013
  35   6      0.055    0.051    0.059      0.012    0.010    0.014
  35   7      0.054    0.050    0.059      0.011    0.009    0.013
  35   8      0.054    0.049    0.058      0.011    0.009    0.013
  35   9      0.061    0.056    0.066      0.012    0.010    0.014
  35  10      0.053    0.048    0.057      0.011    0.009    0.013

Table 1h. Observed Type I Error Levels based on 10,000 Simulations, $n$ = 36 to 40.

              Assumed alpha = 0.05          Assumed alpha = 0.01
   n   k   alpha.hat  95% LCL  95% UCL   alpha.hat  95% LCL  95% UCL
  36   1      0.047    0.043    0.051      0.010    0.008    0.012
  36   2      0.058    0.053    0.062      0.012    0.010    0.015
  36   3      0.052    0.047    0.056      0.009    0.007    0.011
  36   4      0.052    0.048    0.056      0.012    0.010    0.014
  36   5      0.052    0.048    0.057      0.010    0.008    0.012
  36   6      0.055    0.051    0.059      0.012    0.010    0.014
  36   7      0.053    0.048    0.057      0.011    0.009    0.013
  36   8      0.056    0.051    0.060      0.012    0.010    0.014
  36   9      0.056    0.051    0.060      0.011    0.009    0.013
  36  10      0.056    0.051    0.060      0.011    0.009    0.013
  37   1      0.050    0.046    0.055      0.010    0.008    0.012
  37   2      0.054    0.049    0.058      0.011    0.009    0.013
  37   3      0.054    0.049    0.058      0.011    0.009    0.013
  37   4      0.054    0.050    0.058      0.010    0.008    0.012
  37   5      0.054    0.049    0.058      0.010    0.008    0.012
  37   6      0.054    0.050    0.058      0.011    0.009    0.013
  37   7      0.055    0.051    0.060      0.010    0.008    0.012
  37   8      0.055    0.050    0.059      0.011    0.009    0.013
  37   9      0.053    0.049    0.058      0.011    0.009    0.013
  37  10      0.049    0.045    0.054      0.009    0.007    0.011
  38   1      0.049    0.045    0.053      0.009    0.007    0.011
  38   2      0.052    0.047    0.056      0.008    0.007    0.010
  38   3      0.054    0.050    0.059      0.011    0.009    0.013
  38   4      0.055    0.050    0.059      0.011    0.009    0.013
  38   5      0.056    0.052    0.061      0.012    0.010    0.014
  38   6      0.055    0.050    0.059      0.011    0.009    0.013
  38   7      0.049    0.045    0.053      0.009    0.007    0.011
  38   8      0.052    0.048    0.057      0.010    0.008    0.012
  38   9      0.054    0.050    0.059      0.010    0.009    0.012
  38  10      0.055    0.050    0.059      0.011    0.009    0.013
  39   1      0.047    0.043    0.051      0.010    0.008    0.012
  39   2      0.055    0.051    0.059      0.010    0.008    0.012
  39   3      0.053    0.049    0.057      0.010    0.008    0.012
  39   4      0.053    0.049    0.058      0.010    0.009    0.012
  39   5      0.052    0.048    0.057      0.010    0.008    0.012
  39   6      0.053    0.049    0.058      0.010    0.008    0.012
  39   7      0.057    0.052    0.061      0.011    0.009    0.013
  39   8      0.057    0.053    0.062      0.012    0.010    0.014
  39   9      0.050    0.046    0.055      0.010    0.008    0.012
  39  10      0.056    0.051    0.060      0.011    0.009    0.013
  40   1      0.049    0.045    0.054      0.010    0.008    0.012
  40   2      0.052    0.048    0.057      0.010    0.009    0.012
  40   3      0.055    0.050    0.059      0.011    0.009    0.013
  40   4      0.054    0.050    0.059      0.011    0.009    0.013
  40   5      0.054    0.050    0.059      0.010    0.008    0.012
  40   6      0.049    0.045    0.053      0.010    0.008    0.012
  40   7      0.056    0.051    0.060      0.011    0.009    0.013
  40   8      0.054    0.050    0.059      0.011    0.009    0.013
  40   9      0.047    0.043    0.052      0.010    0.008    0.011
  40  10      0.058    0.054    0.063      0.010    0.008    0.012
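
The entries in the tables above can be checked by simulation. The sketch below shows the general idea (an assumed design, not the code used to generate the tables): for one combination of n, k, and alpha, the observed Type I error is estimated as the proportion of purely normal samples in which Rosner's test declares at least one outlier. The n.outliers component is assumed here to hold the number of detected outliers (see the help file for gofOutlier.object), and the tables are based on N = 10,000 replications, so the smaller N used below gives only a rough check.

  # Sketch of the kind of simulation behind Tables 1a-1h (assumed design,
  # not the original simulation code).
  set.seed(47)
  n <- 20; k <- 3; alpha <- 0.05
  N <- 1000   # the tables above are based on N = 10,000 replications
  declared <- replicate(N, {
    x <- rnorm(n)
    # n.outliers is assumed to be the component holding the number of
    # outliers detected (see gofOutlier.object)
    rosnerTest(x, k = k, alpha = alpha, warn = FALSE)$n.outliers > 0
  })
  mean(declared)   # compare with the entry for n = 20, k = 3 in Table 1d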

References

Barnett, V., and T. Lewis. (1995). Outliers in Statistical Data. Third Edition. John Wiley & Sons, Chichester, UK, pp. 235--236.

Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, NY, pp. 188--191.

McBean, E.A., and F.A. Rovers. (1992). Estimation of the Probability of Exceedance of Contaminant Concentrations. Ground Water Monitoring Review, Winter, pp. 115--119.

McNutt, M. (2014). Raising the Bar. Science 345(6192), p. 9.

Rosner, B. (1975). On the Detection of Many Outliers. Technometrics 17, 221--227.

Rosner, B. (1983). Percentage Points for a Generalized ESD Many-Outlier Procedure. Technometrics 25, 165--172.

USEPA. (2006). Data Quality Assessment: A Reviewer's Guide. EPA QA/G-9R. EPA/240/B-06/002, February 2006. Office of Environmental Information, U.S. Environmental Protection Agency, Washington, D.C.

USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery, Program Implementation and Information Division, U.S. Environmental Protection Agency, Washington, D.C., pp. 12-10 to 12-14.

USEPA. (2013a). ProUCL Version 5.0.00 Technical Guide. EPA/600/R-07/041, September 2013. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C., pp. 190--195.

USEPA. (2013b). ProUCL Version 5.0.00 User Guide. EPA/600/R-07/041, September 2013. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C., pp. 190--195.

See Also

gofTest, gofOutlier.object, print.gofOutlier, Normal, qqPlot.

Examples

# Combine 30 observations from a normal distribution with mean 3 and 
  # standard deviation 2, with 3 observations from a normal distribution 
  # with mean 10 and standard deviation 1, then run Rosner's Test on these 
  # data, specifying k=4 potential outliers based on looking at the 
  # normal Q-Q plot. 
  # (Note: the call to set.seed simply allows you to reproduce 
  # this example.)

  set.seed(250) 

  dat <- c(rnorm(30, mean = 3, sd = 2), rnorm(3, mean = 10, sd = 1)) 

  dev.new()
  qqPlot(dat)

  rosnerTest(dat, k = 4)

  #Results of Outlier Test
  #-------------------------
  #
  #Test Method:                     Rosner's Test for Outliers
  #
  #Hypothesized Distribution:       Normal
  #
  #Data:                            dat
  #
  #Sample Size:                     33
  #
  #Test Statistics:                 R.1 = 2.848514
  #                                 R.2 = 3.086875
  #                                 R.3 = 3.033044
  #                                 R.4 = 2.380235
  #
  #Test Statistic Parameter:        k = 4
  #
  #Alternative Hypothesis:          Up to 4 observations are not
  #                                 from the same Distribution.
  #
  #Type I Error:                    5%
  #
  #Number of Outliers Detected:     3
  #
  #  i   Mean.i     SD.i      Value Obs.Num    R.i+1 lambda.i+1 Outlier
  #1 0 3.549744 2.531011 10.7593656      33 2.848514   2.951949    TRUE
  #2 1 3.324444 2.209872 10.1460427      31 3.086875   2.938048    TRUE
  #3 2 3.104392 1.856109  8.7340527      32 3.033044   2.923571    TRUE
  #4 3 2.916737 1.560335 -0.7972275      25 2.380235   2.908473   FALSE

  #----------
  # Clean up

  rm(dat)
  graphics.off()

  #--------------------------------------------------------------------

  # Example 12-4 of USEPA (2009, page 12-12) gives an example of 
  # using Rosner's test to test for outliers in naphthalene measurements (ppb)
  # taken at 5 background wells over 5 quarters.  The data for this example 
  # are stored in EPA.09.Ex.12.4.naphthalene.df.

  EPA.09.Ex.12.4.naphthalene.df
  #   Quarter Well Naphthalene.ppb
  #1        1 BW.1            3.34
  #2        2 BW.1            5.39
  #3        3 BW.1            5.74
  # ...
  #23       3 BW.5            5.53
  #24       4 BW.5            4.42
  #25       5 BW.5           35.45

  longToWide(EPA.09.Ex.12.4.naphthalene.df, "Naphthalene.ppb", "Quarter", "Well", 
    paste.row.name = TRUE)
  #          BW.1 BW.2  BW.3 BW.4  BW.5
  #Quarter.1 3.34 5.59  1.91 6.12  8.64
  #Quarter.2 5.39 5.96  1.74 6.05  5.34
  #Quarter.3 5.74 1.47 23.23 5.18  5.53
  #Quarter.4 6.88 2.57  1.82 4.43  4.42
  #Quarter.5 5.85 5.39  2.02 1.00 35.45


  # Look at Q-Q plots for both the raw and log-transformed data
  #------------------------------------------------------------

  dev.new()
  with(EPA.09.Ex.12.4.naphthalene.df, 
    qqPlot(Naphthalene.ppb, add.line = TRUE, 
      main = "Figure 12-6.  Naphthalene Probability Plot"))

  dev.new()
  with(EPA.09.Ex.12.4.naphthalene.df, 
    qqPlot(Naphthalene.ppb, dist = "lnorm", add.line = TRUE, 
      main = "Figure 12-7.  Log Naphthalene Probability Plot"))


  # Test for 2 potential outliers on the original scale:
  #-----------------------------------------------------

  with(EPA.09.Ex.12.4.naphthalene.df, rosnerTest(Naphthalene.ppb, k = 2))

  #Results of Outlier Test
  #-------------------------
  #
  #Test Method:                     Rosner's Test for Outliers
  #
  #Hypothesized Distribution:       Normal
  #
  #Data:                            Naphthalene.ppb
  #
  #Sample Size:                     25
  #
  #Test Statistics:                 R.1 = 3.930957
  #                                 R.2 = 4.160223
  #
  #Test Statistic Parameter:        k = 2
  #
  #Alternative Hypothesis:          Up to 2 observations are not
  #                                 from the same Distribution.
  #
  #Type I Error:                    5%
  #
  #Number of Outliers Detected:     2
  #
  #  i  Mean.i     SD.i Value Obs.Num    R.i+1 lambda.i+1 Outlier
  #1 0 6.44240 7.379271 35.45      25 3.930957   2.821681    TRUE
  #2 1 5.23375 4.325790 23.23      13 4.160223   2.801551    TRUE

  #----------
  # Clean up

  graphics.off()
