robsurvey (version 0.7-3)

within_tolerance: Tolerance Interval

Description

The function flags observations that fall within the tolerance interval. Observations that fall outside the interval are regarded as (potential) outliers.

Usage

within_tolerance(x, w, method = c("quartile", "modified", "boxplot"),
                 constants, lambda = 0.05, info = FALSE,
                 boxplot_coef = 1.5)

Value

A vector of logicals, where TRUE indicates that an observation is within the tolerance limits and FALSE indicates a (potential) outlier.

If info = TRUE, the function prints the tolerance interval. The endpoints of the interval can be numbers or the symbols ‘min.’ and ‘max.’, which denote the minimum and maximum values in the data, respectively.

Arguments

x

[numeric vector] data vector.

w

[numeric vector] design weights (same length as x).

method

[character] the method to use, one of "quartile" (quartile method), "modified" (modified quartile method), or "boxplot" (boxplot method).

constants

[numeric vector] a vector of length two with nonnegative tuning constants; it is only used by the methods "quartile" and "modified".

lambda

[numeric] a tuning constant that takes values in the closed unit interval; it is only used by method "modified", default: lambda = 0.05.

info

[logical] if TRUE, the tolerance interval is printed out.

boxplot_coef

[numeric] determines how far the whiskers of the boxplot extend out from the box; the default is 1.5.

Details

Three methods are available.

Quartile method ("quartile")

For the quartile method, the tolerance interval is given by $$[m - c_l \cdot L_l, \; m + c_u \cdot L_u]$$ with $$L_l = m - q_1 \quad \text{and} \quad L_u = q_3 - m,$$ where \(m\) denotes the (weighted) median; \(q_1\) and \(q_3\) are, respectively, the first and third (weighted) quartiles. The tuning constants \(c_l\) and \(c_u\) are combined into the vector \((c_l, c_u)\), which is available as argument constants; both constants must be nonnegative numbers.

The quartiles are calculated using design weights.
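
The interval formula above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper `wq()` and the function `quartile_interval()` are hypothetical names, and `wq()` uses a simple inverse of the weighted empirical distribution function, which may differ in detail from the weighted quantile definition used by robsurvey.

```r
# Hypothetical helper (assumption): weighted quantile as the smallest x whose
# cumulative weight share reaches p. robsurvey may use another definition.
wq <- function(x, w, p) {
  o <- order(x)
  x <- x[o]
  w <- w[o]
  cw <- cumsum(w) / sum(w)
  x[which(cw >= p)[1]]
}

# Tolerance interval [m - c_l * L_l, m + c_u * L_u] of the quartile method
quartile_interval <- function(x, w, constants) {
  m  <- wq(x, w, 0.50)   # weighted median
  q1 <- wq(x, w, 0.25)   # first weighted quartile
  q3 <- wq(x, w, 0.75)   # third weighted quartile
  c(lower = m - constants[1] * (m - q1),
    upper = m + constants[2] * (q3 - m))
}

x <- c(1, 2, 3, 4, 5, 6, 7, 100)
w <- rep(1, length(x))                      # equal weights for illustration
lim <- quartile_interval(x, w, constants = c(3, 3))
inside <- x >= lim[["lower"]] & x <= lim[["upper"]]
print(lim)
print(inside)
```

With equal weights, the sketch flags only the value 100 as lying outside the interval.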

Modified quartile method ("modified")

For the modified quartile method (Lee, 1995), the tolerance interval is obtained by replacing \(L_l\) and \(L_u\) in the quartile method with, respectively, $$L_l = \max\big(m - q_1, \vert \lambda \cdot m\vert\big),$$ and $$L_u = \max\big(q_3 - m, \vert \lambda \cdot m \vert\big).$$ The tuning constant \(\lambda\) can only take values in the closed unit interval and is available as argument lambda.

The quartiles are calculated using design weights.
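
A worked example may help to show the effect of the modification. The numbers are made up for illustration: suppose \(m = 100\), \(q_1 = 99\), \(q_3 = 102\), \(\lambda = 0.05\), and \((c_l, c_u) = (3, 3)\). Then \(\vert \lambda \cdot m \vert = 5\), and $$L_l = \max(100 - 99, \; 5) = 5, \qquad L_u = \max(102 - 100, \; 5) = 5,$$ so the tolerance interval is $$[100 - 3 \cdot 5, \; 100 + 3 \cdot 5] = [85, \; 115].$$ With the same constants, the unmodified quartile method gives $$[100 - 3 \cdot 1, \; 100 + 3 \cdot 2] = [97, \; 106].$$ The modification therefore keeps the interval from becoming very narrow when the quartiles lie close to the median.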

Boxplot (box-and-whisker plot) method ("boxplot")

The tolerance interval for the boxplot method extends from the lower whisker to the upper whisker. By default, the length of the whiskers is set to 1.5 times the interquartile range; see argument boxplot_coef. For more details, see boxplot.

The quartiles, and therefore the interquartile range, are calculated using design weights.
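
For the unweighted case, the whisker endpoints can be inspected with `boxplot.stats()` from base R. This is only a sketch of the idea: `boxplot.stats()` ignores design weights and uses hinges, so its limits can differ from those that within_tolerance computes with weighted quartiles.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 100)

# Elements 1 and 5 of $stats are the ends of the lower and upper whisker
# (unweighted; coef plays the role of boxplot_coef)
whiskers <- boxplot.stats(x, coef = 1.5)$stats[c(1, 5)]

# TRUE for observations between the whiskers, FALSE for potential outliers
inside <- x >= whiskers[1] & x <= whiskers[2]
print(whiskers)
print(inside)
```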

References

Lee, H. (1995). Outliers in Business Surveys, in: Cox, B. G. et al. (eds.), Business Survey Methods, pp. 503-526. New York: John Wiley and Sons.

See Also

Overview (of all implemented functions)

Examples

# Load the package and the example data
library(robsurvey)
data(workplace)
head(workplace)
attach(workplace)

# Show the tolerance limits
within_tolerance(payroll, weight, method = "boxplot", info = TRUE)

# Observations that fall outside the tolerance limits are (potential) outliers
outlier <- !within_tolerance(payroll, weight, method = "boxplot")
outlier[1:10]