EnvStats (version 2.3.1)

# rosnerTest: Rosner's Test for Outliers

## Description

Perform Rosner's generalized extreme Studentized deviate test for up to $$k$$ potential outliers in a dataset, assuming the data without any outliers come from a normal (Gaussian) distribution.

## Usage

```r
rosnerTest(x, k = 3, alpha = 0.05, warn = TRUE)
```

## Arguments

x

numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. There must be at least 10 non-missing, finite observations in x.

k

positive integer indicating the number of suspected outliers. The argument k must be between 1 and $$n-2$$, where $$n$$ denotes the number of non-missing, finite values in the argument x. The default value is k=3.

alpha

numeric scalar between 0 and 1 indicating the Type I error associated with the test of hypothesis. The default value is alpha=0.05.

warn

logical scalar indicating whether to issue a warning (warn=TRUE; the default) when the number of non-missing, finite values in x and the value of k are such that the assumed Type I error level might not be maintained. See the DETAILS section below.
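
The argument rules for x and k can be sketched in base R as follows. `prepare_rosner_input()` is a hypothetical helper written for illustration, not an EnvStats function, and it does not check the minimum sample size stated for x.

```r
# Hypothetical helper (not part of EnvStats) applying the argument rules
# described above: drop non-finite values, then validate k.
prepare_rosner_input <- function(x, k) {
  x <- x[is.finite(x)]  # is.finite() is FALSE for NA, NaN, Inf and -Inf
  n <- length(x)
  # k must be a single whole number between 1 and n - 2
  if (length(k) != 1 || k != trunc(k) || k < 1 || k > n - 2) {
    stop("'k' must be a whole number between 1 and n - 2")
  }
  list(x = x, n = n, k = as.integer(k))
}

res <- prepare_rosner_input(c(2.1, NA, 3.4, NaN, 5.0, Inf, -1.2, -Inf), k = 2)
res$n  # 4 finite values remain
```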

## Value

A list of class "gofOutlier" containing the results of the hypothesis test. See the help file for gofOutlier.object for details.

## Details

Let $$x_1, x_2, \ldots, x_n$$ denote the $$n$$ observations. We assume that $$n-k$$ of these observations come from the same normal (Gaussian) distribution, and that the $$k$$ most “extreme” observations may or may not represent observations from a different distribution. Let $$x^{*}_1, x^{*}_2, \ldots, x^{*}_{n-i}$$ denote the $$n-i$$ observations left after omitting the $$i$$ most extreme observations, where $$i = 0, 1, \ldots, k-1$$. Let $$\bar{x}^{(i)}$$ and $$s^{(i)}$$ denote the mean and standard deviation, respectively, of these $$n-i$$ remaining observations. Thus, $$\bar{x}^{(0)}$$ and $$s^{(0)}$$ are the mean and standard deviation of the full sample, and in general

$$\bar{x}^{(i)} = \frac{1}{n-i}\sum_{j=1}^{n-i} x^{*}_j \;\;\;\;\;\; (1)$$

$$s^{(i)} = \sqrt{\frac{1}{n-i-1} \sum_{j=1}^{n-i} \left(x^{*}_j - \bar{x}^{(i)}\right)^2} \;\;\;\;\;\; (2)$$
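
As a quick check of Equations (1) and (2), the sketch below writes both out for an illustrative reduced sample (the values are made up) and compares them with base R's `mean()` and `sd()`. Note that `sd()` uses the same $$n-i-1$$ divisor as Equation (2).

```r
# Illustrative reduced sample x*_1, ..., x*_{n-i} (made-up values)
x_star <- c(4.2, 5.1, 3.8, 6.0, 4.9, 5.5)
m <- length(x_star)                          # plays the role of n - i

xbar <- sum(x_star) / m                      # Equation (1)
s <- sqrt(sum((x_star - xbar)^2) / (m - 1))  # Equation (2)

# Base R gives the same values
c(xbar = xbar, mean = mean(x_star), s = s, sd = sd(x_star))
```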

For a given value of $$i$$, the most extreme observation $$x^{(i)}$$ is the remaining observation that lies farthest from the mean of that reduced data set, i.e.,

$$x^{(i)} = x^{*}_{m}, \quad m = \arg\max_{j=1,2,\ldots,n-i} |x^{*}_j - \bar{x}^{(i)}| \;\;\;\;\;\; (3)$$

Thus, an extreme observation may be either the smallest or the largest value in that data set.
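
A minimal sketch of Equation (3) in base R. `most_extreme()` is a hypothetical helper; ties are broken by taking the first maximizer, an assumption the text above does not address. The example shows an extreme value that is the smallest observation.

```r
# Equation (3): locate the observation farthest from the current mean
most_extreme <- function(x) {
  dev <- abs(x - mean(x))
  j <- which.max(dev)  # first maximizer if there are ties
  list(index = j, value = x[j], distance = dev[j])
}

# Mean is -2, so the smallest value (-20) is the most extreme one
most_extreme(c(1, 2, 3, 4, -20))
# index 5, value -20, distance 18
```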

Rosner's test is based on the $$k$$ statistics $$R_1, R_2, \ldots, R_k$$, the extreme Studentized deviates computed from the successively reduced samples of size $$n, n-1, \ldots, n-k+1$$:

$$R_{i+1} = \frac{|x^{(i)} - \bar{x}^{(i)}|}{s^{(i)}} \;\;\;\;\;\; (4)$$

The critical value for $$R_{i+1}$$ is denoted $$\lambda_{i+1}$$ and is computed as

$$\lambda_{i+1} = \frac{t_{p, n-i-2}\,(n-i-1)}{\sqrt{\left(n-i-2 + t^{2}_{p, n-i-2}\right)(n-i)}} \;\;\;\;\;\; (5)$$

where $$t_{p, \nu}$$ denotes the $$p$$'th quantile of Student's t-distribution with $$\nu$$ degrees of freedom, and in this case

$$p = 1 - \frac{\alpha/2}{n - i} \;\;\;\;\;\; (6)$$

where $$\alpha$$ denotes the Type I error level.
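
The following base-R sketch strings Equations (1) through (6) together. `rosner_stats()` is a hypothetical illustration, not the EnvStats implementation, and it assumes non-finite values have already been removed from `x`. It is applied to the naphthalene data from the second example in the Examples section, typed in from the printed data frame.

```r
# Hypothetical sketch of Equations (1)-(6); not the EnvStats source code.
# At each step i = 0, ..., k-1 it records R_{i+1} and lambda_{i+1},
# then removes the most extreme value.
rosner_stats <- function(x, k, alpha = 0.05) {
  n <- length(x)
  obs <- seq_len(n)  # original positions of the values still in the sample
  out <- data.frame(i = 0:(k - 1), mean = NA_real_, sd = NA_real_,
                    value = NA_real_, obs.num = NA_integer_,
                    R = NA_real_, lambda = NA_real_)
  for (i in 0:(k - 1)) {
    xbar <- mean(x)                          # Equation (1)
    s <- sd(x)                               # Equation (2), divisor n-i-1
    j <- which.max(abs(x - xbar))            # Equation (3)
    p <- 1 - (alpha / 2) / (n - i)           # Equation (6)
    tq <- qt(p, df = n - i - 2)
    out$mean[i + 1]    <- xbar
    out$sd[i + 1]      <- s
    out$value[i + 1]   <- x[j]
    out$obs.num[i + 1] <- obs[j]
    out$R[i + 1]       <- abs(x[j] - xbar) / s               # Equation (4)
    out$lambda[i + 1]  <- tq * (n - i - 1) /
      sqrt((n - i - 2 + tq^2) * (n - i))                     # Equation (5)
    x <- x[-j]
    obs <- obs[-j]
  }
  out
}

# Naphthalene (ppb), in the row order of EPA.09.Ex.12.4.naphthalene.df
# (wells BW.1 to BW.5, quarters 1 to 5 within each well)
naph <- c(3.34, 5.39, 5.74, 6.88, 5.85,
          5.59, 5.96, 1.47, 2.57, 5.39,
          1.91, 1.74, 23.23, 1.82, 2.02,
          6.12, 6.05, 5.18, 4.43, 1.00,
          8.64, 5.34, 5.53, 4.42, 35.45)
rosner_stats(naph, k = 2)
```

If this sketch follows the same formulas as `rosnerTest()`, the `R` and `lambda` columns should agree with the `R.i+1` and `lambda.i+1` columns printed in that example (3.930957 and 2.821681 for $$i = 0$$), with `obs.num` equal to 25 and then 13.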

The algorithm for determining the number of outliers is as follows:

1. Compare $$R_k$$ with $$\lambda_k$$. If $$R_k > \lambda_k$$ then conclude the $$k$$ most extreme values are outliers.

2. If $$R_k \le \lambda_k$$ then compare $$R_{k-1}$$ with $$\lambda_{k-1}$$. If $$R_{k-1} > \lambda_{k-1}$$ then conclude the $$k-1$$ most extreme values are outliers.

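3. Continue in this fashion with $$R_{k-2}, R_{k-3}, \ldots, R_1$$. As soon as some $$R_j > \lambda_j$$, conclude that the $$j$$ most extreme values are outliers and stop; if $$R_j \le \lambda_j$$ for every $$j = 1, \ldots, k$$, conclude that there are no outliers.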
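
The step-down rule above amounts to taking the largest $$j$$ with $$R_j > \lambda_j$$, or 0 if there is none. The sketch below uses a hypothetical helper `count_outliers()` and the statistics printed in the first example in the Examples section. There $$R_1 \le \lambda_1$$, yet 3 outliers are declared because $$R_3 > \lambda_3$$.

```r
# Step-down decision rule: largest j with R_j > lambda_j, else 0
count_outliers <- function(R_stat, lam) {
  exceed <- which(R_stat > lam)
  if (length(exceed) == 0) 0L else max(exceed)
}

# R.i+1 and lambda.i+1 printed in the first example
R_stat <- c(2.848514, 3.086875, 3.033044, 2.380235)
lam    <- c(2.951949, 2.938048, 2.923571, 2.908473)
count_outliers(R_stat, lam)  # 3
```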

Based on a study using N=1,000 simulations, Rosner's (1983) Table 1 shows the estimated true Type I error of declaring at least one outlier when none exists, for sample sizes $$n$$ ranging from 10 to 100 and maximum numbers of outliers $$k$$ ranging from 1 to 10. Based on that table, Rosner (1983) concluded that for an assumed Type I error level of 0.05, the estimated $$\alpha$$ levels are quite close to 0.05 as long as $$n \ge 25$$, and that similar results hold for an assumed Type I error level of 0.01. The tables below expand on Rosner's (1983) Table 1 and are based on N=10,000 simulations. They show that for an assumed Type I error of 0.05, the test maintains the Type I error fairly well for sample sizes as small as $$n = 3$$ as long as $$k = 1$$, and for $$n \ge 15$$ as long as $$k \le 2$$. For an assumed Type I error of 0.01, the test maintains the Type I error fairly well for sample sizes as small as $$n = 15$$ as long as $$k \le 7$$.
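
The kind of simulation described above can be sketched in base R. This is a hypothetical illustration, not the code used to produce the tables below; it uses 2,000 rather than 10,000 replicates to keep run time short, so its estimates are noisier.

```r
# Number of outliers Rosner's procedure declares for one sample
# (Equations (1)-(6) plus the step-down rule); hypothetical helper.
rosner_n_outliers <- function(x, k, alpha = 0.05) {
  n <- length(x)
  R_stat <- lam <- numeric(k)
  for (i in 0:(k - 1)) {
    dev <- abs(x - mean(x))
    j <- which.max(dev)
    R_stat[i + 1] <- dev[j] / sd(x)
    tq <- qt(1 - (alpha / 2) / (n - i), df = n - i - 2)
    lam[i + 1] <- tq * (n - i - 1) / sqrt((n - i - 2 + tq^2) * (n - i))
    x <- x[-j]
  }
  exceed <- which(R_stat > lam)
  if (length(exceed) == 0) 0L else max(exceed)
}

# Proportion of all-normal samples in which at least one outlier is declared
estimate_type1 <- function(n, k, alpha = 0.05, n_sim = 2000) {
  rejected <- replicate(n_sim, rosner_n_outliers(rnorm(n), k, alpha) > 0)
  mean(rejected)
}

set.seed(1)
estimate_type1(n = 25, k = 1)  # should land near the nominal 0.05
```

Running it with, say, `n = 10, k = 4` should give an estimate noticeably above 0.05, in line with the inflated levels reported for small samples.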

Based on these results, when warn=TRUE, a warning is issued in the following cases to indicate that the assumed Type I error level may not be maintained:

• alpha is greater than 0.01, the sample size is less than 15, and k is greater than 1.

• alpha is greater than 0.01, the sample size is at least 15 and less than 25, and k is greater than 2.

• alpha is less than or equal to 0.01, the sample size is less than 15, and k is greater than 1.

• k is greater than 10, or greater than the floor of half of the sample size (i.e., greater than the greatest integer less than or equal to half of the sample size). A warning is given in this case because no simulations have been done for these combinations.
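
The four warning conditions can be restated as a single predicate. `rosner_warn_needed()` is a hypothetical helper that follows the bullets above; it is not taken from the EnvStats source and may differ from it in edge cases.

```r
# Hypothetical restatement of the warning conditions listed above
rosner_warn_needed <- function(n, k, alpha) {
  # Bullets 1 and 3: n < 15 and k > 1, whatever the value of alpha
  small_n <- n < 15 && k > 1
  # Bullet 2: alpha > 0.01, 15 <= n < 25 and k > 2
  medium_n <- alpha > 0.01 && n >= 15 && n < 25 && k > 2
  # Bullet 4: combinations not covered by the simulations
  untested <- k > 10 || k > floor(n / 2)
  small_n || medium_n || untested
}

rosner_warn_needed(n = 20, k = 3, alpha = 0.05)  # TRUE
rosner_warn_needed(n = 20, k = 3, alpha = 0.01)  # FALSE
```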

Table 1a. Observed Type I Error Levels based on 10,000 Simulations, $$n =$$ 3 to 5.

| $$n$$ | $$k$$ | $$\hat{\alpha}$$ (assumed $$\alpha=0.05$$) | 95% LCL | 95% UCL | $$\hat{\alpha}$$ (assumed $$\alpha=0.01$$) | 95% LCL | 95% UCL |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 3 | 1 | 0.047 | 0.043 | 0.051 | 0.009 | 0.007 | 0.010 |
| 4 | 1 | 0.049 | 0.045 | 0.053 | 0.010 | 0.008 | 0.012 |
|  | 2 | 0.107 | 0.101 | 0.113 | 0.021 | 0.018 | 0.024 |
| 5 | 1 | 0.048 | 0.044 | 0.053 | 0.008 | 0.006 | 0.009 |

Table 1b. Observed Type I Error Levels based on 10,000 Simulations, $$n =$$ 6 to 10.

| $$n$$ | $$k$$ | $$\hat{\alpha}$$ (assumed $$\alpha=0.05$$) | 95% LCL | 95% UCL | $$\hat{\alpha}$$ (assumed $$\alpha=0.01$$) | 95% LCL | 95% UCL |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 6 | 1 | 0.048 | 0.044 | 0.053 | 0.010 | 0.009 | 0.012 |
|  | 2 | 0.085 | 0.080 | 0.091 | 0.017 | 0.015 | 0.020 |
|  | 3 | 0.141 | 0.134 | 0.148 | 0.028 | 0.025 | 0.031 |
| 7 | 1 | 0.048 | 0.044 | 0.053 | 0.013 | 0.011 | 0.015 |
|  | 2 | 0.080 | 0.075 | 0.086 | 0.017 | 0.015 | 0.020 |
|  | 3 | 0.112 | 0.106 | 0.118 | 0.022 | 0.019 | 0.025 |
| 8 | 1 | 0.048 | 0.044 | 0.053 | 0.011 | 0.009 | 0.013 |
|  | 2 | 0.080 | 0.074 | 0.085 | 0.017 | 0.014 | 0.019 |
|  | 3 | 0.102 | 0.096 | 0.108 | 0.020 | 0.017 | 0.023 |
|  | 4 | 0.143 | 0.136 | 0.150 | 0.028 | 0.025 | 0.031 |
| 9 | 1 | 0.052 | 0.048 | 0.057 | 0.010 | 0.008 | 0.012 |
|  | 2 | 0.069 | 0.064 | 0.074 | 0.014 | 0.012 | 0.016 |
|  | 3 | 0.097 | 0.091 | 0.103 | 0.018 | 0.015 | 0.021 |
|  | 4 | 0.120 | 0.114 | 0.126 | 0.024 | 0.021 | 0.027 |
| 10 | 1 | 0.051 | 0.047 | 0.056 | 0.010 | 0.008 | 0.012 |
|  | 2 | 0.068 | 0.063 | 0.073 | 0.012 | 0.010 | 0.014 |
|  | 3 | 0.085 | 0.080 | 0.091 | 0.015 | 0.013 | 0.017 |
|  | 4 | 0.106 | 0.100 | 0.112 | 0.021 | 0.018 | 0.024 |

Table 1c. Observed Type I Error Levels based on 10,000 Simulations, $$n =$$ 11 to 15.

| $$n$$ | $$k$$ | $$\hat{\alpha}$$ (assumed $$\alpha=0.05$$) | 95% LCL | 95% UCL | $$\hat{\alpha}$$ (assumed $$\alpha=0.01$$) | 95% LCL | 95% UCL |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 11 | 1 | 0.052 | 0.048 | 0.056 | 0.012 | 0.010 | 0.014 |
|  | 2 | 0.070 | 0.065 | 0.075 | 0.014 | 0.012 | 0.017 |
|  | 3 | 0.082 | 0.077 | 0.088 | 0.014 | 0.011 | 0.016 |
|  | 4 | 0.101 | 0.095 | 0.107 | 0.019 | 0.016 | 0.021 |
|  | 5 | 0.116 | 0.110 | 0.123 | 0.022 | 0.019 | 0.024 |
| 12 | 1 | 0.052 | 0.047 | 0.056 | 0.011 | 0.009 | 0.013 |
|  | 2 | 0.067 | 0.062 | 0.072 | 0.011 | 0.009 | 0.013 |
|  | 3 | 0.074 | 0.069 | 0.080 | 0.016 | 0.013 | 0.018 |
|  | 4 | 0.088 | 0.082 | 0.093 | 0.016 | 0.014 | 0.019 |
|  | 5 | 0.099 | 0.093 | 0.105 | 0.016 | 0.013 | 0.018 |
|  | 6 | 0.117 | 0.111 | 0.123 | 0.021 | 0.018 | 0.023 |
| 13 | 1 | 0.048 | 0.044 | 0.052 | 0.010 | 0.008 | 0.012 |
|  | 2 | 0.064 | 0.059 | 0.069 | 0.014 | 0.012 | 0.016 |
|  | 3 | 0.070 | 0.065 | 0.075 | 0.013 | 0.011 | 0.015 |
|  | 4 | 0.079 | 0.074 | 0.084 | 0.014 | 0.012 | 0.017 |
|  | 5 | 0.088 | 0.083 | 0.094 | 0.015 | 0.013 | 0.018 |
|  | 6 | 0.109 | 0.103 | 0.115 | 0.020 | 0.017 | 0.022 |
| 14 | 1 | 0.046 | 0.042 | 0.051 | 0.009 | 0.007 | 0.011 |
|  | 2 | 0.062 | 0.057 | 0.066 | 0.012 | 0.010 | 0.014 |
|  | 3 | 0.069 | 0.064 | 0.074 | 0.012 | 0.010 | 0.014 |
|  | 4 | 0.077 | 0.072 | 0.082 | 0.015 | 0.013 | 0.018 |
|  | 5 | 0.084 | 0.079 | 0.090 | 0.016 | 0.013 | 0.018 |
|  | 6 | 0.091 | 0.085 | 0.097 | 0.017 | 0.014 | 0.019 |
|  | 7 | 0.107 | 0.101 | 0.113 | 0.018 | 0.016 | 0.021 |
| 15 | 1 | 0.054 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 |
|  | 2 | 0.057 | 0.053 | 0.062 | 0.010 | 0.008 | 0.012 |
|  | 3 | 0.065 | 0.060 | 0.069 | 0.013 | 0.011 | 0.016 |
|  | 4 | 0.073 | 0.068 | 0.078 | 0.014 | 0.011 | 0.016 |
|  | 5 | 0.074 | 0.069 | 0.079 | 0.012 | 0.010 | 0.014 |
|  | 6 | 0.086 | 0.081 | 0.092 | 0.015 | 0.013 | 0.017 |

Table 1d. Observed Type I Error Levels based on 10,000 Simulations, $$n =$$ 16 to 20.

| $$n$$ | $$k$$ | $$\hat{\alpha}$$ (assumed $$\alpha=0.05$$) | 95% LCL | 95% UCL | $$\hat{\alpha}$$ (assumed $$\alpha=0.01$$) | 95% LCL | 95% UCL |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 16 | 1 | 0.052 | 0.048 | 0.057 | 0.010 | 0.008 | 0.012 |
|  | 2 | 0.055 | 0.051 | 0.059 | 0.011 | 0.009 | 0.013 |
|  | 3 | 0.068 | 0.063 | 0.073 | 0.011 | 0.009 | 0.013 |
|  | 4 | 0.074 | 0.069 | 0.079 | 0.015 | 0.013 | 0.017 |
|  | 5 | 0.077 | 0.072 | 0.082 | 0.015 | 0.013 | 0.018 |
|  | 6 | 0.075 | 0.070 | 0.080 | 0.013 | 0.011 | 0.016 |
|  | 7 | 0.087 | 0.082 | 0.093 | 0.017 | 0.014 | 0.020 |
|  | 8 | 0.096 | 0.090 | 0.101 | 0.016 | 0.014 | 0.019 |
| 17 | 1 | 0.047 | 0.043 | 0.051 | 0.008 | 0.007 | 0.010 |
|  | 2 | 0.059 | 0.054 | 0.063 | 0.011 | 0.009 | 0.013 |
|  | 3 | 0.062 | 0.057 | 0.067 | 0.012 | 0.010 | 0.014 |
|  | 4 | 0.070 | 0.065 | 0.075 | 0.012 | 0.009 | 0.014 |
|  | 5 | 0.069 | 0.064 | 0.074 | 0.012 | 0.010 | 0.015 |
|  | 6 | 0.071 | 0.066 | 0.076 | 0.015 | 0.012 | 0.017 |
|  | 7 | 0.081 | 0.076 | 0.087 | 0.014 | 0.012 | 0.016 |
|  | 8 | 0.083 | 0.078 | 0.088 | 0.015 | 0.013 | 0.017 |
| 18 | 1 | 0.051 | 0.047 | 0.055 | 0.010 | 0.008 | 0.012 |
|  | 2 | 0.056 | 0.052 | 0.061 | 0.012 | 0.010 | 0.014 |
|  | 3 | 0.065 | 0.060 | 0.070 | 0.012 | 0.010 | 0.015 |
|  | 4 | 0.065 | 0.060 | 0.070 | 0.013 | 0.011 | 0.015 |
|  | 5 | 0.069 | 0.064 | 0.074 | 0.012 | 0.010 | 0.014 |
|  | 6 | 0.068 | 0.063 | 0.073 | 0.014 | 0.011 | 0.016 |
|  | 7 | 0.072 | 0.067 | 0.077 | 0.014 | 0.011 | 0.016 |
|  | 8 | 0.076 | 0.071 | 0.081 | 0.012 | 0.010 | 0.014 |
|  | 9 | 0.081 | 0.076 | 0.086 | 0.012 | 0.010 | 0.014 |
| 19 | 1 | 0.051 | 0.046 | 0.055 | 0.008 | 0.006 | 0.010 |
|  | 2 | 0.059 | 0.055 | 0.064 | 0.012 | 0.010 | 0.014 |
|  | 3 | 0.059 | 0.054 | 0.064 | 0.011 | 0.009 | 0.013 |
|  | 4 | 0.061 | 0.057 | 0.066 | 0.012 | 0.010 | 0.014 |
|  | 5 | 0.067 | 0.062 | 0.072 | 0.013 | 0.010 | 0.015 |
|  | 6 | 0.066 | 0.061 | 0.071 | 0.011 | 0.009 | 0.013 |
|  | 7 | 0.069 | 0.064 | 0.074 | 0.013 | 0.011 | 0.015 |
|  | 8 | 0.074 | 0.069 | 0.079 | 0.012 | 0.010 | 0.014 |
|  | 9 | 0.082 | 0.077 | 0.087 | 0.015 | 0.013 | 0.018 |
| 20 | 1 | 0.053 | 0.048 | 0.057 | 0.011 | 0.009 | 0.013 |
|  | 2 | 0.056 | 0.052 | 0.061 | 0.010 | 0.008 | 0.012 |
|  | 3 | 0.060 | 0.056 | 0.065 | 0.009 | 0.007 | 0.011 |
|  | 4 | 0.063 | 0.058 | 0.068 | 0.012 | 0.010 | 0.014 |
|  | 5 | 0.063 | 0.059 | 0.068 | 0.014 | 0.011 | 0.016 |
|  | 6 | 0.063 | 0.058 | 0.067 | 0.011 | 0.009 | 0.013 |
|  | 7 | 0.065 | 0.061 | 0.070 | 0.011 | 0.009 | 0.013 |
|  | 8 | 0.070 | 0.065 | 0.076 | 0.012 | 0.010 | 0.014 |
|  | 9 | 0.076 | 0.070 | 0.081 | 0.013 | 0.011 | 0.015 |

Table 1e. Observed Type I Error Levels based on 10,000 Simulations, $$n =$$ 21 to 25.

| $$n$$ | $$k$$ | $$\hat{\alpha}$$ (assumed $$\alpha=0.05$$) | 95% LCL | 95% UCL | $$\hat{\alpha}$$ (assumed $$\alpha=0.01$$) | 95% LCL | 95% UCL |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 21 | 1 | 0.054 | 0.049 | 0.058 | 0.013 | 0.011 | 0.015 |
|  | 2 | 0.054 | 0.049 | 0.058 | 0.012 | 0.010 | 0.014 |
|  | 3 | 0.058 | 0.054 | 0.063 | 0.012 | 0.010 | 0.014 |
|  | 4 | 0.058 | 0.054 | 0.063 | 0.011 | 0.009 | 0.013 |
|  | 5 | 0.064 | 0.059 | 0.069 | 0.013 | 0.011 | 0.016 |
|  | 6 | 0.066 | 0.061 | 0.071 | 0.012 | 0.010 | 0.015 |
|  | 7 | 0.063 | 0.058 | 0.068 | 0.013 | 0.011 | 0.015 |
|  | 8 | 0.066 | 0.061 | 0.071 | 0.010 | 0.008 | 0.012 |
|  | 9 | 0.073 | 0.068 | 0.078 | 0.013 | 0.011 | 0.015 |
|  | 10 | 0.071 | 0.066 | 0.076 | 0.012 | 0.010 | 0.014 |
| 22 | 1 | 0.047 | 0.042 | 0.051 | 0.010 | 0.008 | 0.012 |
|  | 2 | 0.058 | 0.053 | 0.062 | 0.012 | 0.010 | 0.015 |
|  | 3 | 0.056 | 0.052 | 0.061 | 0.010 | 0.008 | 0.012 |
|  | 4 | 0.059 | 0.055 | 0.064 | 0.012 | 0.010 | 0.014 |
|  | 5 | 0.061 | 0.057 | 0.066 | 0.009 | 0.008 | 0.011 |
|  | 6 | 0.063 | 0.058 | 0.068 | 0.013 | 0.010 | 0.015 |
|  | 7 | 0.065 | 0.060 | 0.070 | 0.013 | 0.010 | 0.015 |
|  | 8 | 0.065 | 0.060 | 0.070 | 0.014 | 0.012 | 0.016 |
|  | 9 | 0.065 | 0.060 | 0.070 | 0.012 | 0.010 | 0.014 |
|  | 10 | 0.067 | 0.062 | 0.072 | 0.012 | 0.009 | 0.014 |
| 23 | 1 | 0.051 | 0.047 | 0.056 | 0.008 | 0.007 | 0.010 |
|  | 2 | 0.056 | 0.052 | 0.061 | 0.010 | 0.009 | 0.012 |
|  | 3 | 0.056 | 0.052 | 0.061 | 0.011 | 0.009 | 0.013 |
|  | 4 | 0.062 | 0.057 | 0.066 | 0.011 | 0.009 | 0.013 |
|  | 5 | 0.061 | 0.056 | 0.065 | 0.010 | 0.009 | 0.012 |
|  | 6 | 0.060 | 0.055 | 0.064 | 0.012 | 0.010 | 0.014 |
|  | 7 | 0.062 | 0.057 | 0.066 | 0.011 | 0.009 | 0.013 |
|  | 8 | 0.063 | 0.058 | 0.068 | 0.012 | 0.010 | 0.014 |
|  | 9 | 0.066 | 0.061 | 0.071 | 0.012 | 0.010 | 0.014 |
|  | 10 | 0.068 | 0.063 | 0.073 | 0.014 | 0.012 | 0.017 |
| 24 | 1 | 0.051 | 0.046 | 0.055 | 0.010 | 0.008 | 0.012 |
|  | 2 | 0.056 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 |
|  | 3 | 0.058 | 0.053 | 0.062 | 0.010 | 0.008 | 0.012 |
|  | 4 | 0.060 | 0.056 | 0.065 | 0.013 | 0.011 | 0.015 |
|  | 5 | 0.057 | 0.053 | 0.062 | 0.012 | 0.010 | 0.014 |
|  | 6 | 0.065 | 0.060 | 0.069 | 0.011 | 0.009 | 0.013 |
|  | 7 | 0.062 | 0.057 | 0.066 | 0.012 | 0.010 | 0.014 |
|  | 8 | 0.060 | 0.055 | 0.065 | 0.012 | 0.010 | 0.014 |
|  | 9 | 0.066 | 0.061 | 0.071 | 0.012 | 0.010 | 0.014 |
|  | 10 | 0.064 | 0.059 | 0.068 | 0.012 | 0.010 | 0.015 |
| 25 | 1 | 0.054 | 0.050 | 0.059 | 0.012 | 0.009 | 0.014 |
|  | 2 | 0.055 | 0.051 | 0.060 | 0.010 | 0.008 | 0.012 |
|  | 3 | 0.057 | 0.052 | 0.062 | 0.011 | 0.009 | 0.013 |
|  | 4 | 0.055 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 |
|  | 5 | 0.060 | 0.055 | 0.065 | 0.012 | 0.010 | 0.014 |
|  | 6 | 0.060 | 0.055 | 0.064 | 0.011 | 0.009 | 0.013 |
|  | 7 | 0.057 | 0.052 | 0.061 | 0.011 | 0.009 | 0.013 |
|  | 8 | 0.062 | 0.058 | 0.067 | 0.011 | 0.009 | 0.013 |
|  | 9 | 0.058 | 0.053 | 0.062 | 0.012 | 0.010 | 0.014 |

Table 1f. Observed Type I Error Levels based on 10,000 Simulations, $$n =$$ 26 to 30.

| $$n$$ | $$k$$ | $$\hat{\alpha}$$ (assumed $$\alpha=0.05$$) | 95% LCL | 95% UCL | $$\hat{\alpha}$$ (assumed $$\alpha=0.01$$) | 95% LCL | 95% UCL |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 26 | 1 | 0.051 | 0.047 | 0.055 | 0.012 | 0.010 | 0.014 |
|  | 2 | 0.057 | 0.053 | 0.062 | 0.013 | 0.011 | 0.015 |
|  | 3 | 0.055 | 0.050 | 0.059 | 0.012 | 0.010 | 0.014 |
|  | 4 | 0.055 | 0.051 | 0.060 | 0.010 | 0.008 | 0.012 |
|  | 5 | 0.058 | 0.054 | 0.063 | 0.011 | 0.009 | 0.013 |
|  | 6 | 0.061 | 0.056 | 0.066 | 0.012 | 0.010 | 0.014 |
|  | 7 | 0.059 | 0.054 | 0.064 | 0.011 | 0.009 | 0.013 |
|  | 8 | 0.060 | 0.056 | 0.065 | 0.010 | 0.008 | 0.012 |
|  | 9 | 0.060 | 0.056 | 0.065 | 0.011 | 0.009 | 0.013 |
|  | 10 | 0.061 | 0.056 | 0.065 | 0.011 | 0.009 | 0.013 |
| 27 | 1 | 0.050 | 0.046 | 0.054 | 0.009 | 0.007 | 0.011 |
|  | 2 | 0.054 | 0.050 | 0.059 | 0.011 | 0.009 | 0.013 |
|  | 3 | 0.062 | 0.057 | 0.066 | 0.012 | 0.010 | 0.014 |
|  | 4 | 0.063 | 0.058 | 0.068 | 0.011 | 0.009 | 0.013 |
|  | 5 | 0.051 | 0.047 | 0.055 | 0.010 | 0.008 | 0.012 |
|  | 6 | 0.058 | 0.053 | 0.062 | 0.011 | 0.009 | 0.013 |
|  | 7 | 0.060 | 0.056 | 0.065 | 0.010 | 0.008 | 0.012 |
|  | 8 | 0.056 | 0.052 | 0.061 | 0.010 | 0.008 | 0.012 |
|  | 9 | 0.061 | 0.056 | 0.066 | 0.012 | 0.010 | 0.014 |
|  | 10 | 0.055 | 0.051 | 0.060 | 0.008 | 0.006 | 0.010 |
| 28 | 1 | 0.049 | 0.045 | 0.053 | 0.010 | 0.008 | 0.011 |
|  | 2 | 0.057 | 0.052 | 0.061 | 0.011 | 0.009 | 0.013 |
|  | 3 | 0.056 | 0.052 | 0.061 | 0.012 | 0.009 | 0.014 |
|  | 4 | 0.057 | 0.053 | 0.062 | 0.011 | 0.009 | 0.013 |
|  | 5 | 0.057 | 0.053 | 0.062 | 0.010 | 0.008 | 0.012 |
|  | 6 | 0.056 | 0.051 | 0.060 | 0.010 | 0.008 | 0.012 |
|  | 7 | 0.057 | 0.052 | 0.061 | 0.010 | 0.008 | 0.012 |
|  | 8 | 0.058 | 0.054 | 0.063 | 0.011 | 0.009 | 0.013 |
|  | 9 | 0.054 | 0.050 | 0.058 | 0.011 | 0.009 | 0.013 |
|  | 10 | 0.062 | 0.057 | 0.067 | 0.011 | 0.009 | 0.013 |
| 29 | 1 | 0.049 | 0.045 | 0.053 | 0.011 | 0.009 | 0.013 |
|  | 2 | 0.053 | 0.048 | 0.057 | 0.010 | 0.008 | 0.012 |
|  | 3 | 0.056 | 0.051 | 0.060 | 0.010 | 0.009 | 0.012 |
|  | 4 | 0.055 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 |
|  | 5 | 0.056 | 0.051 | 0.060 | 0.010 | 0.008 | 0.012 |
|  | 6 | 0.057 | 0.053 | 0.062 | 0.012 | 0.010 | 0.014 |
|  | 7 | 0.055 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 |
|  | 8 | 0.057 | 0.052 | 0.061 | 0.011 | 0.009 | 0.013 |
|  | 9 | 0.056 | 0.051 | 0.061 | 0.011 | 0.009 | 0.013 |
|  | 10 | 0.057 | 0.052 | 0.061 | 0.011 | 0.009 | 0.013 |
| 30 | 1 | 0.050 | 0.046 | 0.054 | 0.009 | 0.007 | 0.011 |
|  | 2 | 0.054 | 0.049 | 0.058 | 0.011 | 0.009 | 0.013 |
|  | 3 | 0.056 | 0.052 | 0.061 | 0.012 | 0.010 | 0.015 |
|  | 4 | 0.054 | 0.049 | 0.058 | 0.010 | 0.008 | 0.012 |
|  | 5 | 0.058 | 0.053 | 0.063 | 0.012 | 0.010 | 0.014 |
|  | 6 | 0.062 | 0.058 | 0.067 | 0.012 | 0.010 | 0.014 |
|  | 7 | 0.056 | 0.052 | 0.061 | 0.012 | 0.010 | 0.014 |
|  | 8 | 0.059 | 0.054 | 0.064 | 0.011 | 0.009 | 0.013 |
|  | 9 | 0.056 | 0.052 | 0.061 | 0.010 | 0.009 | 0.012 |

Table 1g. Observed Type I Error Levels based on 10,000 Simulations, $$n =$$ 31 to 35.

| $$n$$ | $$k$$ | $$\hat{\alpha}$$ (assumed $$\alpha=0.05$$) | 95% LCL | 95% UCL | $$\hat{\alpha}$$ (assumed $$\alpha=0.01$$) | 95% LCL | 95% UCL |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 31 | 1 | 0.051 | 0.047 | 0.056 | 0.009 | 0.007 | 0.011 |
|  | 2 | 0.054 | 0.050 | 0.059 | 0.010 | 0.009 | 0.012 |
|  | 3 | 0.053 | 0.049 | 0.058 | 0.010 | 0.008 | 0.012 |
|  | 4 | 0.055 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 |
|  | 5 | 0.053 | 0.049 | 0.057 | 0.011 | 0.009 | 0.013 |
|  | 6 | 0.055 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 |
|  | 7 | 0.055 | 0.050 | 0.059 | 0.012 | 0.010 | 0.014 |
|  | 8 | 0.056 | 0.051 | 0.060 | 0.010 | 0.008 | 0.012 |
|  | 9 | 0.057 | 0.053 | 0.062 | 0.011 | 0.009 | 0.013 |
|  | 10 | 0.058 | 0.053 | 0.062 | 0.011 | 0.009 | 0.013 |
| 32 | 1 | 0.054 | 0.049 | 0.058 | 0.010 | 0.008 | 0.012 |
|  | 2 | 0.054 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 |
|  | 3 | 0.052 | 0.047 | 0.056 | 0.009 | 0.007 | 0.011 |
|  | 4 | 0.056 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 |
|  | 5 | 0.056 | 0.052 | 0.061 | 0.011 | 0.009 | 0.013 |
|  | 6 | 0.055 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 |
|  | 7 | 0.055 | 0.051 | 0.060 | 0.010 | 0.008 | 0.012 |
|  | 8 | 0.055 | 0.051 | 0.060 | 0.010 | 0.008 | 0.012 |
|  | 9 | 0.057 | 0.053 | 0.062 | 0.012 | 0.010 | 0.014 |
|  | 10 | 0.054 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 |
| 33 | 1 | 0.051 | 0.046 | 0.055 | 0.011 | 0.009 | 0.013 |
|  | 2 | 0.055 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 |
|  | 3 | 0.056 | 0.052 | 0.061 | 0.010 | 0.008 | 0.012 |
|  | 4 | 0.052 | 0.048 | 0.057 | 0.010 | 0.008 | 0.012 |
|  | 5 | 0.055 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 |
|  | 6 | 0.058 | 0.053 | 0.062 | 0.011 | 0.009 | 0.013 |
|  | 7 | 0.057 | 0.052 | 0.061 | 0.010 | 0.008 | 0.012 |
|  | 8 | 0.058 | 0.054 | 0.063 | 0.011 | 0.009 | 0.013 |
|  | 9 | 0.057 | 0.053 | 0.062 | 0.012 | 0.010 | 0.014 |
|  | 10 | 0.055 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 |
| 34 | 1 | 0.052 | 0.048 | 0.056 | 0.009 | 0.007 | 0.011 |
|  | 2 | 0.053 | 0.049 | 0.058 | 0.011 | 0.009 | 0.013 |
|  | 3 | 0.055 | 0.050 | 0.059 | 0.012 | 0.010 | 0.014 |
|  | 4 | 0.056 | 0.052 | 0.061 | 0.010 | 0.008 | 0.012 |
|  | 5 | 0.053 | 0.048 | 0.057 | 0.009 | 0.007 | 0.011 |
|  | 6 | 0.055 | 0.050 | 0.059 | 0.010 | 0.008 | 0.012 |
|  | 7 | 0.052 | 0.048 | 0.057 | 0.012 | 0.010 | 0.014 |
|  | 8 | 0.055 | 0.050 | 0.059 | 0.009 | 0.008 | 0.011 |
|  | 9 | 0.055 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 |
|  | 10 | 0.054 | 0.049 | 0.058 | 0.010 | 0.008 | 0.012 |
| 35 | 1 | 0.051 | 0.046 | 0.055 | 0.010 | 0.009 | 0.012 |
|  | 2 | 0.054 | 0.049 | 0.058 | 0.010 | 0.009 | 0.012 |
|  | 3 | 0.055 | 0.050 | 0.059 | 0.010 | 0.009 | 0.012 |
|  | 4 | 0.053 | 0.048 | 0.057 | 0.011 | 0.009 | 0.013 |
|  | 5 | 0.056 | 0.051 | 0.061 | 0.011 | 0.009 | 0.013 |
|  | 6 | 0.055 | 0.051 | 0.059 | 0.012 | 0.010 | 0.014 |
|  | 7 | 0.054 | 0.050 | 0.059 | 0.011 | 0.009 | 0.013 |
|  | 8 | 0.054 | 0.049 | 0.058 | 0.011 | 0.009 | 0.013 |
|  | 9 | 0.061 | 0.056 | 0.066 | 0.012 | 0.010 | 0.014 |

Table 1h. Observed Type I Error Levels based on 10,000 Simulations, $$n =$$ 36 to 40.

| $$n$$ | $$k$$ | $$\hat{\alpha}$$ (assumed $$\alpha=0.05$$) | 95% LCL | 95% UCL | $$\hat{\alpha}$$ (assumed $$\alpha=0.01$$) | 95% LCL | 95% UCL |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 36 | 1 | 0.047 | 0.043 | 0.051 | 0.010 | 0.008 | 0.012 |
|  | 2 | 0.058 | 0.053 | 0.062 | 0.012 | 0.010 | 0.015 |
|  | 3 | 0.052 | 0.047 | 0.056 | 0.009 | 0.007 | 0.011 |
|  | 4 | 0.052 | 0.048 | 0.056 | 0.012 | 0.010 | 0.014 |
|  | 5 | 0.052 | 0.048 | 0.057 | 0.010 | 0.008 | 0.012 |
|  | 6 | 0.055 | 0.051 | 0.059 | 0.012 | 0.010 | 0.014 |
|  | 7 | 0.053 | 0.048 | 0.057 | 0.011 | 0.009 | 0.013 |
|  | 8 | 0.056 | 0.051 | 0.060 | 0.012 | 0.010 | 0.014 |
|  | 9 | 0.056 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 |
|  | 10 | 0.056 | 0.051 | 0.060 | 0.011 | 0.009 | 0.013 |
| 37 | 1 | 0.050 | 0.046 | 0.055 | 0.010 | 0.008 | 0.012 |
|  | 2 | 0.054 | 0.049 | 0.058 | 0.011 | 0.009 | 0.013 |
|  | 3 | 0.054 | 0.049 | 0.058 | 0.011 | 0.009 | 0.013 |
|  | 4 | 0.054 | 0.050 | 0.058 | 0.010 | 0.008 | 0.012 |
|  | 5 | 0.054 | 0.049 | 0.058 | 0.010 | 0.008 | 0.012 |
|  | 6 | 0.054 | 0.050 | 0.058 | 0.011 |  |  |
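
The tables do not say how the 95% confidence limits were computed. One common choice for a proportion estimated from N=10,000 simulations is the normal-approximation (Wald) interval $$\hat{\alpha} \pm z_{0.975}\sqrt{\hat{\alpha}(1-\hat{\alpha})/N}$$. The sketch below assumes that method; `wald_ci()` is a hypothetical helper, and for the two Table 1a rows checked it reproduces the printed limits after rounding to three decimals.

```r
# ASSUMPTION: Wald (normal-approximation) interval for a simulated
# rejection rate; the tables do not state which interval was used.
wald_ci <- function(p_hat, n_sim = 10000, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  half <- z * sqrt(p_hat * (1 - p_hat) / n_sim)
  c(LCL = p_hat - half, UCL = p_hat + half)
}

round(wald_ci(0.047), 3)  # n = 3, k = 1, assumed alpha = 0.05: 0.043, 0.051
round(wald_ci(0.107), 3)  # n = 4, k = 2, assumed alpha = 0.05: 0.101, 0.113
```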

## References

Barnett, V., and T. Lewis. (1995). Outliers in Statistical Data. Third Edition. John Wiley & Sons, Chichester, UK, pp. 235--236.

Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, NY, pp. 188--191.

McBean, E.A., and F.A. Rovers. (1992). Estimation of the Probability of Exceedance of Contaminant Concentrations. Ground Water Monitoring Review Winter, pp. 115--119.

McNutt, M. (2014). Raising the Bar. Science 345(6192), p. 9.

Rosner, B. (1975). On the Detection of Many Outliers. Technometrics 17, 221--227.

Rosner, B. (1983). Percentage Points for a Generalized ESD Many-Outlier Procedure. Technometrics 25, 165--172.

USEPA. (2006). Data Quality Assessment: A Reviewer's Guide. EPA QA/G-9R. EPA/240/B-06/002, February 2006. Office of Environmental Information, U.S. Environmental Protection Agency, Washington, D.C.

USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C., pp. 12-10 to 12-14.

USEPA. (2013a). ProUCL Version 5.0.00 Technical Guide. EPA/600/R-07/041, September 2013. Office of Research and Development. U.S. Environmental Protection Agency, Washington, D.C., pp. 190--195.

USEPA. (2013b). ProUCL Version 5.0.00 User Guide. EPA/600/R-07/041, September 2013. Office of Research and Development. U.S. Environmental Protection Agency, Washington, D.C., pp. 190--195.

## See Also

gofTest, gofOutlier.object, print.gofOutlier, Normal, qqPlot.

## Examples

```r
# Load EnvStats, which provides rosnerTest(), qqPlot(), longToWide()
# and the EPA.09.Ex.12.4.naphthalene.df data set used below.
library(EnvStats)

# Combine 30 observations from a normal distribution with mean 3 and
# standard deviation 2 with 3 observations from a normal distribution
# with mean 10 and standard deviation 1, then run Rosner's test on these
# data, specifying k=4 potential outliers based on looking at the
# normal Q-Q plot.
# (Note: the call to set.seed simply allows you to reproduce
# this example.)

set.seed(250)

dat <- c(rnorm(30, mean = 3, sd = 2), rnorm(3, mean = 10, sd = 1))

dev.new()
qqPlot(dat)

rosnerTest(dat, k = 4)

#Results of Outlier Test
#-------------------------
#
#Test Method:                     Rosner's Test for Outliers
#
#Hypothesized Distribution:       Normal
#
#Data:                            dat
#
#Sample Size:                     33
#
#Test Statistics:                 R.1 = 2.848514
#                                 R.2 = 3.086875
#                                 R.3 = 3.033044
#                                 R.4 = 2.380235
#
#Test Statistic Parameter:        k = 4
#
#Alternative Hypothesis:          Up to 4 observations are not
#                                 from the same Distribution.
#
#Type I Error:                    5%
#
#Number of Outliers Detected:     3
#
#  i   Mean.i     SD.i      Value Obs.Num    R.i+1 lambda.i+1 Outlier
#1 0 3.549744 2.531011 10.7593656      33 2.848514   2.951949    TRUE
#2 1 3.324444 2.209872 10.1460427      31 3.086875   2.938048    TRUE
#3 2 3.104392 1.856109  8.7340527      32 3.033044   2.923571    TRUE
#4 3 2.916737 1.560335 -0.7972275      25 2.380235   2.908473   FALSE

#----------
# Clean up

rm(dat)
graphics.off()

#--------------------------------------------------------------------

# Example 12-4 of USEPA (2009, page 12-12) gives an example of
# using Rosner's test to test for outliers in naphthalene measurements (ppb)
# taken at 5 background wells over 5 quarters.  The data for this example
# are stored in EPA.09.Ex.12.4.naphthalene.df.

EPA.09.Ex.12.4.naphthalene.df
#   Quarter Well Naphthalene.ppb
#1        1 BW.1            3.34
#2        2 BW.1            5.39
#3        3 BW.1            5.74
# ...
#23       3 BW.5            5.53
#24       4 BW.5            4.42
#25       5 BW.5           35.45

longToWide(EPA.09.Ex.12.4.naphthalene.df, "Naphthalene.ppb", "Quarter", "Well",
  paste.row.name = TRUE)
#          BW.1 BW.2  BW.3 BW.4  BW.5
#Quarter.1 3.34 5.59  1.91 6.12  8.64
#Quarter.2 5.39 5.96  1.74 6.05  5.34
#Quarter.3 5.74 1.47 23.23 5.18  5.53
#Quarter.4 6.88 2.57  1.82 4.43  4.42
#Quarter.5 5.85 5.39  2.02 1.00 35.45

# Look at Q-Q plots for both the raw and log-transformed data
#------------------------------------------------------------

dev.new()
with(EPA.09.Ex.12.4.naphthalene.df,
  qqPlot(Naphthalene.ppb, add.line = TRUE,
    main = "Figure 12-6.  Naphthalene Probability Plot"))

dev.new()
with(EPA.09.Ex.12.4.naphthalene.df,
  qqPlot(Naphthalene.ppb, dist = "lnorm", add.line = TRUE,
    main = "Figure 12-7.  Log Naphthalene Probability Plot"))

# Test for 2 potential outliers on the original scale:
#-----------------------------------------------------

with(EPA.09.Ex.12.4.naphthalene.df, rosnerTest(Naphthalene.ppb, k = 2))

#Results of Outlier Test
#-------------------------
#
#Test Method:                     Rosner's Test for Outliers
#
#Hypothesized Distribution:       Normal
#
#Data:                            Naphthalene.ppb
#
#Sample Size:                     25
#
#Test Statistics:                 R.1 = 3.930957
#                                 R.2 = 4.160223
#
#Test Statistic Parameter:        k = 2
#
#Alternative Hypothesis:          Up to 2 observations are not
#                                 from the same Distribution.
#
#Type I Error:                    5%
#
#Number of Outliers Detected:     2
#
#  i  Mean.i     SD.i Value Obs.Num    R.i+1 lambda.i+1 Outlier
#1 0 6.44240 7.379271 35.45      25 3.930957   2.821681    TRUE
#2 1 5.23375 4.325790 23.23      13 4.160223   2.801551    TRUE

#----------
# Clean up

graphics.off()
```

