boxplot.stats
Box Plot Statistics
This function is typically called by another function to gather the statistics necessary for producing box plots, but may be invoked separately.
Keywords: dplot
Usage
boxplot.stats(x, coef = 1.5, do.conf = TRUE, do.out = TRUE)
Arguments
- x: a numeric vector for which the boxplot will be constructed (NAs and NaNs are allowed and omitted).
- coef: this determines how far the plot whiskers extend out from the box. If coef is positive, the whiskers extend to the most extreme data point which is no more than coef times the length of the box away from the box. A value of zero causes the whiskers to extend to the data extremes (and no outliers are returned).
- do.conf, do.out: logicals; if FALSE, the conf or out component, respectively, will be empty in the result.
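The coef rule can be sketched directly in R. The helper below, whisker_sketch(), is hypothetical (it is not part of grDevices) and assumes the box runs from the lower to the upper hinge as returned by fivenum(); points farther than coef box lengths beyond a hinge are treated as outliers, and the whiskers stop at the most extreme remaining points.

```r
# Hypothetical helper (not part of grDevices): whiskers and outliers
# under the coef rule, using the fivenum() hinges as the box.
whisker_sketch <- function(x, coef = 1.5) {
  stopifnot(coef >= 0)
  x <- x[!is.na(x)]                      # NA and NaN are dropped
  h <- stats::fivenum(x)[c(2, 4)]        # lower and upper hinge
  box_len <- h[2] - h[1]                 # length of the box
  if (coef == 0)                         # whiskers reach the data extremes
    return(list(whiskers = range(x), out = numeric(0)))
  is_out <- x < h[1] - coef * box_len | x > h[2] + coef * box_len
  list(whiskers = range(x[!is_out]), out = x[is_out])
}

x <- c(1:100, 1000)
whisker_sketch(x)            # whiskers 1 and 100, outlier 1000
whisker_sketch(x, coef = 0)  # whiskers 1 and 1000, no outliers
```

Comparing its whiskers with boxplot.stats(x)$stats[c(1, 5)] is a quick way to check the rule on your own data.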
Details
The two 'hinges' are versions of the first and third quartile, i.e., close to quantile(x, c(1,3)/4). The hinges equal the quartiles for odd $n$ (where n <- length(x)) and differ for even $n$. Whereas the quartiles only equal observations for n %% 4 == 1 ($n\equiv 1 \bmod 4$), the hinges do so additionally for n %% 4 == 2 ($n\equiv 2 \bmod 4$), and are in the middle of two observations otherwise.
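To see where hinges and quartiles part ways, the hinges can be computed by hand from Tukey's hinge depth, counted in from each end of the sorted data. The helper tukey_hinges() is hypothetical and written for illustration only; it assumes the depth formula floor((n + 3) / 2) / 2, which reproduces the hinges reported by fivenum() and boxplot.stats().

```r
# Hypothetical helper: Tukey's hinges from the hinge depth
# d = floor((n + 3) / 2) / 2, counted in from each end of the sorted data.
tukey_hinges <- function(x) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  d <- floor((n + 3) / 2) / 2
  # a half-integer depth averages the two neighbouring observations
  lower <- (x[floor(d)] + x[ceiling(d)]) / 2
  upper <- (x[n + 1 - floor(d)] + x[n + 1 - ceiling(d)]) / 2
  c(lower = lower, upper = upper)
}

tukey_hinges(1:6)                        # lower 2, upper 5 (observations)
quantile(1:6, c(1, 3)/4, names = FALSE)  # 2.25 4.75 (between observations)
```

Here n = 6 satisfies n %% 4 == 2, so the hinges land on observations while the quartiles do not.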
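The notches (if requested) extend to +/-1.58 IQR/sqrt(n) around the median, where IQR here is the distance between the two hinges and n is the number of non-NA observations. This seems to be based on the same calculations as the formula with 1.57 in Chambers et al (1983, p. …).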
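The notch formula is short enough to reproduce by hand. The sketch below uses a hypothetical helper, notch_sketch(), and assumes that IQR means the hinge spread from fivenum() and that n counts the non-NA values, matching the $n component described under Value.

```r
# Hypothetical helper: notch limits as
#   median +/- 1.58 * (upper hinge - lower hinge) / sqrt(n)
notch_sketch <- function(x) {
  x <- x[!is.na(x)]                  # n counts non-NA values only
  f <- stats::fivenum(x)             # min, lower hinge, median, upper hinge, max
  f[3] + c(-1.58, 1.58) * (f[4] - f[2]) / sqrt(length(x))
}

x <- c(1:100, 1000)
notch_sketch(x)
boxplot.stats(x)$conf                # should agree with the line above
```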
Value
List with named components as follows:
- stats: a vector of length 5, containing the extreme of the lower whisker, the lower 'hinge', the median, the upper 'hinge' and the extreme of the upper whisker.
- n: the number of non-NA observations in the sample.
- conf: the lower and upper extremes of the 'notch' (if(do.conf)). See the details.
- out: the values of any data points which lie beyond the extremes of the whiskers (if(do.out)).

Note that $stats and $conf are sorted in increasing order, unlike S, and that $n and $out include any +/- Inf values.
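Because the result is a plain list, its components are easy to collect across several samples. The sketch below uses made-up illustrative data (the list groups is hypothetical) and gathers $stats into a 5-row matrix, one column per group, alongside $n and $out.

```r
# Illustrative data, made up for this sketch: three small groups
groups <- list(a = c(1:10, 40),
               b = c(5, 7, 8, 9, 12),
               c = c(2, 4, NA, 6, 8, 100))

# One column per group: lower whisker, lower hinge, median, upper hinge, upper whisker
stat_mat <- sapply(groups, function(g) boxplot.stats(g)$stats)
# Non-NA counts ($n) and outliers ($out) per group
n_obs <- sapply(groups, function(g) boxplot.stats(g)$n)
outs  <- lapply(groups, function(g) boxplot.stats(g)$out)

stat_mat
n_obs   # group c has 5 non-NA values
outs    # 40 in a, none in b, 100 in c
```

Note how the NA in group c is dropped from $n, and how its upper whisker stops at 8 once 100 is flagged as an outlier.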
References
Tukey, J. W. (1977) Exploratory Data Analysis. Section 2C.
McGill, R., Tukey, J. W. and Larsen, W. A. (1978) Variations of box plots. The American Statistician 32, 12–16.
Velleman, P. F. and Hoaglin, D. C. (1981) Applications, Basics and Computing of Exploratory Data Analysis. Duxbury Press.
Emerson, J. D. and Strenio, J. (1983) Boxplots and batch comparison. Chapter 3 of Understanding Robust and Exploratory Data Analysis, eds. D. C. Hoaglin, F. Mosteller and J. W. Tukey. Wiley.
Chambers, J. M., Cleveland, W. S., Kleiner, B. and Tukey, P. A. (1983) Graphical Methods for Data Analysis. Wadsworth & Brooks/Cole.
See Also
fivenum, boxplot, bxp.
Examples
library(grDevices)
require(stats)
x <- c(1:100, 1000)
(b1 <- boxplot.stats(x))
(b2 <- boxplot.stats(x, do.conf = FALSE, do.out = FALSE))
stopifnot(b1$stats == b2$stats) # do.out = FALSE is still robust
boxplot.stats(x, coef = 3, do.conf = FALSE)
## no outlier treatment:
boxplot.stats(x, coef = 0)
boxplot.stats(c(x, NA)) # slight change: n is 101 (the NA is omitted)
(r <- boxplot.stats(c(x, -1:1/0)))
stopifnot(r$out == c(1000, -Inf, Inf))

## Difference between quartiles and hinges:
nn <- 1:17 ; n4 <- nn %% 4
hin <- sapply(sapply(nn, seq), function(x) boxplot.stats(x)$stats[c(2,4)])
q13 <- sapply(sapply(nn, seq), quantile, probs = c(1,3)/4, names = FALSE)
m <- t(rbind(q13,hin))[, c(1,3,2,4)]
dimnames(m) <- list(paste(nn), c("q1","lH", "q3","uH"))
stopifnot(m[n4 == 1, 1:2] == (nn[n4 == 1] + 3)/4, # quart. = hinge
m[n4 == 1, 3:4] == (3*nn[n4 == 1] + 1)/4,
m[,"lH"] == ( (nn+3) %/% 2) / 2,
m[,"uH"] == ((3*nn+2)%/% 2) / 2)
cm <- noquote(format(m))
cm[m[,2] == m[,1], 2] <- " = "
cm[m[,4] == m[,3], 4] <- " = "
cm