MGBT (version 1.0.4)

BLlo: Barnett and Lewis Test Adjusted for Low Outliers

Description

Computes the Barnett and Lewis (1995, p. 224; \(T_{\mathrm{N}3}\)) so-labeled “N3 method” statistic with the TAC adjustment to look for low outliers. Given the order statistics \(x_{[1:n]} \le x_{[2:n]} \le \cdots \le x_{[(n-1):n]} \le x_{[n:n]}\), the statistic is $$BL_r = T_{\mathrm{N}3} = \frac{ \sum_{i=1}^r x_{[i:n]} - r \times \mathrm{mean}\{x_{[1:n]}\} } {\sqrt{\mathrm{var}\{x_{[1:n]}\}}}\mbox{,}$$ where the mean and variance are those of all \(n\) observations. Barnett and Lewis (1995, p. 218) brand this statistic as a test of the “\(k \ge 2\) upper outliers” but for the MGBT package “lower” applies in the TAC reformulation. Barnett and Lewis (1995, p. 218) show an example of a modification for two low outliers as \((2\overline{x} - x_{[2:n]} - x_{[1:n]})/s\) for the mean \(\overline{x}\) and standard deviation \(s\). The TAC reformulation thus differs by a sign.

The \(BL_r\) is a sum of internally studentized deviations from the mean, and its tail probability satisfies $$SP(t) \le {n \choose r} P\biggl(\bm{t}(n-2) > \biggl[\frac{n(n-2)t^2}{r(n-r)(n-1)-nt^2}\biggr]^{1/2}\biggr)\mbox{,}$$ where \(SP(t)\) is the probability that \(T_{\mathrm{N}3} > t\) and \(\bm{t}(df)\) is the t-distribution for \(df\) degrees of freedom. The bound holds with equality when $$t \ge \sqrt{r^2(n-1)(n-r-1)/(nr+n)}\mbox{.}$$

For reference, Barnett and Lewis (1995, p. 491) tabulate critical values for \(n=10\) and \(k \in \{2,3,4\}\) at the 5-percent significance level as \(3.18\), \(3.82\), and \(4.17\), respectively (their \(k\) corresponds to the \(r\) used here). One of these is evaluated in the Examples.
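The formula for \(BL_r\) above can be sketched in a few lines of R. This is a minimal illustration written directly from the formula, not the MGBT source code; the name bl_r_sketch and the data vector are hypothetical, and the argument n is carried only to mirror the documented signature.

```r
# Hypothetical helper (NOT the MGBT source): evaluates BL_r as written in the
# Description, i.e. (sum of the r smallest values - r * mean) / standard deviation.
bl_r_sketch <- function(x, r, n = length(x)) {
  stopifnot(r >= 1, r < n)
  xs <- sort(x)                         # order statistics, smallest first
  (sum(xs[seq_len(r)]) - r * mean(x)) / sd(x)
}

# Made-up data with two visibly low values (illustration only)
x <- c(2.1, 3.4, 0.2, 2.8, 3.0, 2.5, 0.4, 3.1, 2.9, 2.6)
bl2 <- bl_r_sketch(x, r = 2)

# Two-low-outlier form quoted from Barnett and Lewis (1995, p. 218)
xs <- sort(x)
bl_two_low <- (2 * mean(x) - xs[2] - xs[1]) / sd(x)

# The two forms should agree in magnitude and differ only in sign
print(c(BL_r = bl2, two_low = bl_two_low))
```

Under this sketch the statistic is never positive, because the sum of the r smallest values cannot exceed r times the mean; that is why a comparison against tabulated upper-outlier critical values needs abs().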

Usage

BLlo(x, r, n=length(x))

Arguments

x

The data values; note that base-10 logarithms of these are not computed internally;

r

The number of truncated observations; and

n

The number of observations.

Value

The value for \(BL_r\).

References

Barnett, Vic, and Lewis, Toby, 1995, Outliers in statistical data: Chichester, John Wiley and Sons, ISBN 0-471-93094-6.

Cohn, T.A., 2013-2016, Personal communication of original R source code: U.S. Geological Survey, Reston, Va.

See Also

MGBTcohn2011, RSlo

Examples

# NOT RUN {
# See Examples under RSlo()

# }
# NOT RUN {
 # WHA experiments with BL_r()
n <- 10; r <- 3; nsim <- 10000; alpha <- 0.05; Tcrit <- 3.82
BLs <- Ho <- RHS <- SPt <- rep(NA, nsim)
EQ <- sqrt(r^2*(n-1)*(n-r-1)/(n*r+n))
for(i in 1:nsim) { # some simulation results shown below
   BLs[i] <- abs(BLlo(rnorm(n), r)) # abs() correcting TAC sign convention
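   tq <- sqrt( (n*(n-2)*BLs[i]^2) / (r*(n-r)*(n-1)-n*BLs[i]^2) ) # t-quantile inside the bound
   RHS[i] <- choose(n,r)*pt(tq, n-2, lower.tail=FALSE)
   # the threshold EQ applies to the statistic itself, not to the transformed quantile
   SPt[i] <- if(BLs[i] >= EQ) RHS[i] else 1 # set SP(t) to unity?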
   Ho[i]  <- BLs[i] > Tcrit
}
results <- c(quantile(BLs, probs=1-alpha), sum(Ho)/nsim, sum(SPt < alpha)/nsim)
names(results) <- c("Critical_value", "Ho_rejected", "Coverage_SP(t)")
print(results) # minor differences are because of random number seeding
# Critical_value    Ho_rejected Coverage_SP(t)
#      3.817236       0.048200       0.050100 
# }
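As a deterministic companion to the simulation, the right-hand side of the bound on \(SP(t)\) from the Description can be evaluated directly at a tabulated critical value. This is only a sketch under the formulas stated above: sp_bound_sketch and t_threshold_sketch are hypothetical helper names that are not part of MGBT, and no exact output is claimed here.

```r
# Hypothetical helpers (NOT part of MGBT), written from the Description formulas.

# Right-hand side of the bound on SP(t) for statistic value t
sp_bound_sketch <- function(t, n, r) {
  denom <- r * (n - r) * (n - 1) - n * t^2
  stopifnot(denom > 0)                  # the square root must be real
  q <- sqrt(n * (n - 2) * t^2 / denom)  # quantile compared with t(n-2)
  choose(n, r) * pt(q, df = n - 2, lower.tail = FALSE)
}

# Threshold on t given in the Description
t_threshold_sketch <- function(n, r) {
  sqrt(r^2 * (n - 1) * (n - r - 1) / (n * r + n))
}

n <- 10; r <- 3; Tcrit <- 3.82          # tabulated value cited in the Description
print(c(threshold = t_threshold_sketch(n, r),
        bound     = sp_bound_sketch(Tcrit, n, r)))
```

If the printed bound sits near the nominal 0.05 and Tcrit is at or above the threshold, that is consistent with the simulated rejection rate reported in the Examples.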
