The function flags observations that fall within the tolerance interval. Observations that fall outside the interval are regarded as (potential) outliers.
within_tolerance(x, w, method = c("quartile", "modified", "boxplot"),
    constants, lambda = 0.05, info = FALSE,
    boxplot_coef = 1.5)
A vector of logicals, where TRUE indicates that an observation is within
the tolerance limits and FALSE indicates a (potential) outlier.
If info = TRUE, the function prints the tolerance interval. The
endpoints of the interval can be numbers or the symbols ‘min.’ and
‘max.’, which denote the minimum and maximum values in the data,
respectively.
x: [numeric vector] data vector.
w: [numeric vector] design weights (same length as x).
method: [character] one of the methods: "quartile",
"modified" (the modified quartile method), or "boxplot".
constants: [numeric vector] a vector of size two with
nonnegative tuning constants; it is only used by the methods
"quartile" and "modified".
lambda: [numeric] a tuning constant that takes values in the
closed unit interval; it is only used by method "modified",
default: lambda = 0.05.
info: [logical] if TRUE, the tolerance interval is
printed out.
boxplot_coef: [numeric] determines how far the whiskers of the
boxplot extend out from the box; the default is 1.5.
Three methods are available.
"quartile": For the quartile method, the tolerance interval is given by
$$[m - c_l \cdot L_l, \; m + c_u \cdot L_u]$$
with
$$L_l = m - q_1 \quad \text{and} \quad L_u = q_3 - m,$$
where \(m\) denotes the (weighted) median; \(q_1\) and
\(q_3\) are, respectively, the first and third (weighted)
quartiles. The tuning constants \(c_l\) and \(c_u\)
are combined into the vector \((c_l, c_u)\), which is
available as argument constants; both constants must be
nonnegative numbers.
The quartiles are calculated using design weights.
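The interval formula for the quartile method can be traced by hand with a minimal sketch. The helpers wquantile() and quartile_interval() are hypothetical names introduced for illustration only; in particular, the weighted quantile below (smallest value whose cumulative weight share reaches p) is an assumption and may differ from the definition the package uses internally.

```r
# Hypothetical helper: weighted quantile taken as the smallest x whose
# cumulative weight share reaches p (an assumption, not the package's code).
wquantile <- function(x, w, p) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w)
  x[which(cum_share >= p)[1]]
}

# Hypothetical helper: interval [m - c_l * L_l, m + c_u * L_u]
# with L_l = m - q1 and L_u = q3 - m.
quartile_interval <- function(x, w, constants) {
  stopifnot(length(constants) == 2, all(constants >= 0))
  q1 <- wquantile(x, w, 0.25)
  m  <- wquantile(x, w, 0.50)
  q3 <- wquantile(x, w, 0.75)
  c(lower = m - constants[1] * (m - q1),
    upper = m + constants[2] * (q3 - m))
}

# Toy data with equal weights and one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)
w <- rep(1, length(x))

limits <- quartile_interval(x, w, constants = c(3, 3))
inside <- x >= limits[["lower"]] & x <= limits[["upper"]]

print(limits)
print(x[!inside])  # values outside the interval
```

With these toy data, only the largest value falls outside the interval.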
"modified": For the modified quartile method (Lee, 1995), the tolerance
interval is given by replacing \(L_l\) and \(L_u\)
with, respectively,
$$L_l = \max\big(m - q_1, \vert \lambda \cdot m\vert\big),$$
and
$$L_u = \max\big(q_3 - m, \vert \lambda \cdot m \vert\big)$$
The tuning constant \(\lambda\) can only take values in
the closed unit interval and is available as argument lambda.
The quartiles are calculated using design weights.
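The effect of the lower bound \(\vert \lambda \cdot m \vert\) is easiest to see when the quartiles coincide with the median, so that the plain quartile method would yield an interval of length zero. The following is a minimal sketch under stated assumptions: wquantile() and modified_interval() are hypothetical helpers, and the weighted quantile definition is an assumption that may differ from the one used by the package.

```r
# Hypothetical helper: weighted quantile taken as the smallest x whose
# cumulative weight share reaches p (an assumption, not the package's code).
wquantile <- function(x, w, p) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w)
  x[which(cum_share >= p)[1]]
}

# Hypothetical helper: modified quartile interval, where L_l and L_u are
# bounded from below by |lambda * m|.
modified_interval <- function(x, w, constants, lambda = 0.05) {
  stopifnot(length(constants) == 2, all(constants >= 0))
  stopifnot(lambda >= 0, lambda <= 1)
  q1 <- wquantile(x, w, 0.25)
  m  <- wquantile(x, w, 0.50)
  q3 <- wquantile(x, w, 0.75)
  min_len <- abs(lambda * m)
  L_l <- max(m - q1, min_len)
  L_u <- max(q3 - m, min_len)
  c(lower = m - constants[1] * L_l,
    upper = m + constants[2] * L_u)
}

# Toy data: the quartiles equal the median, so m - q1 = q3 - m = 0
x <- c(rep(100, 8), 106)
w <- rep(1, length(x))

limits <- modified_interval(x, w, constants = c(2, 2), lambda = 0.05)
inside <- x >= limits[["lower"]] & x <= limits[["upper"]]

print(limits)
print(all(inside))
```

In this toy example the interval has positive length only because of the \(\vert \lambda \cdot m \vert\) term; with lambda = 0 it would collapse to the median.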
"boxplot": The tolerance interval for the boxplot method extends from the
lower whisker to the upper whisker. By default, the length of the
whiskers is set to 1.5 times the interquartile range; see argument
boxplot_coef. For more details, see boxplot.
The quartiles, and therefore the interquartile range, are calculated using design weights.
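The boxplot rule can be sketched in the same way. The helpers wquantile() and boxplot_interval() are hypothetical, and the weighted quantile definition is an assumption. The sketch uses the fences \(q_1 - k \cdot IQR\) and \(q_3 + k \cdot IQR\) (with \(k\) = boxplot_coef) as limits, whereas the whiskers of a boxplot end at the most extreme data point inside the fences; both choices classify the same observations as inside or outside.

```r
# Hypothetical helper: weighted quantile taken as the smallest x whose
# cumulative weight share reaches p (an assumption, not the package's code).
wquantile <- function(x, w, p) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w)
  x[which(cum_share >= p)[1]]
}

# Hypothetical helper: fences at boxplot_coef times the (weighted)
# interquartile range below q1 and above q3.
boxplot_interval <- function(x, w, boxplot_coef = 1.5) {
  stopifnot(boxplot_coef >= 0)
  q1 <- wquantile(x, w, 0.25)
  q3 <- wquantile(x, w, 0.75)
  iqr <- q3 - q1
  c(lower = q1 - boxplot_coef * iqr,
    upper = q3 + boxplot_coef * iqr)
}

# Toy data with equal weights and one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)
w <- rep(1, length(x))

limits <- boxplot_interval(x, w)
inside <- x >= limits[["lower"]] & x <= limits[["upper"]]

print(limits)
print(x[!inside])  # values outside the fences
```

Here the lower fence lies below the smallest observation, so no value can be flagged at the lower end.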
Lee, H. (1995). Outliers in Business Surveys, in: Cox, B. G. et al. (eds.), Business Survey Methods, pp. 503-526. New York: John Wiley and Sons.
Overview (of all implemented functions)
head(workplace)
attach(workplace)
# Show the tolerance limits
within_tolerance(payroll, weight, method = "boxplot", info = TRUE)
# Observations that fall outside the tolerance limits are (potential) outliers
outlier <- !within_tolerance(payroll, weight, method = "boxplot")
outlier[1:10]