toleranceBound: Upper tolerance bounds on normal quantiles

Description

The function toleranceBound computes theoretical upper tolerance bounds on the quantiles of the standard normal distribution. These can be used to produce reliable data-driven estimates of the quantiles in any normal distribution.

Usage

toleranceBound(psi, gamma, N)

Value

Returns the value of $k_{\gamma, \psi}$ with the property that the $\psi$th quantile will be less than the estimate $X_U = \bar{X} + k_{\gamma,\psi}s$ (based on $N$ data points) at least $100 \gamma\%$ of the time.

Arguments

psi: A real number between 0 and 1 giving the desired quantile
gamma: A real number between 0 and 1 giving the desired confidence level of the tolerance bound
N: An integer giving the number of observations used to estimate the quantile

Author

Kevin R. Coombes <krc@silicovore.com>

Details

Suppose that we collect $N$ observations from a normal distribution with unknown mean and variance, and wish to estimate the $95$th percentile of the distribution. A simple point estimate is given by $\tau = \bar{X} + 1.68s$. However, only the mean of the distribution is less than this value $95\%$ of the time. When $N=40$, for example, almost half of the time ($43.5\%$), fewer than $95\%$ of the observed values will be less than $\tau$. This problem is addressed by constructing a statistical tolerance interval (more precisely, a one-sided tolerance bound) that contains a given fraction, $\psi$, of the population with a given confidence level, $\gamma$ [Hahn and Meeker, 1991]. With enough samples, one can obtain distribution-free tolerance bounds [op. cit., Chapter 5]. For instance, one can use bootstrap or jackknife methods to estimate these bounds empirically.
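The shortfall of the simple point estimate can be explored with a small simulation. The following is a minimal sketch, not part of the package: it draws repeated samples of size 40 from a standard normal distribution and records how often $\bar{X} + 1.68s$ falls below the true 95th percentile. The seed, the number of replicates, and the variable names are arbitrary choices made for illustration.

```r
# Illustrative simulation (assumed setup, not taken from the package):
# how often does the point estimate fall below the true 95th percentile?
set.seed(42)
N <- 40
true_q95 <- qnorm(0.95)          # true 95th percentile of N(0, 1)

below <- replicate(10000, {
  x <- rnorm(N)
  mean(x) + 1.68 * sd(x) < true_q95
})

# fraction of samples in which the estimate undershoots the true quantile
frac_below <- mean(below)
frac_below
```

A fraction well above 5% here indicates that the point estimate alone does not bound the 95th percentile with high confidence.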

Here, however, we assume that the measurements are normally distributed. We let $\bar{X}$ denote the sample mean and let $s$ denote the sample standard deviation. The upper tolerance bound that, $100 \gamma\%$ of the time, exceeds $100 \psi\%$ of the values from a normal distribution is approximated by $X_U = \bar{X} + k_{\gamma,\psi}s$, where

$$ k_{\gamma, \psi} = \frac{z_{\psi} + \sqrt{z_{\psi}^2 - ab}}{a}, $$

$$ a = 1 - \frac{z_{1-\gamma}^2}{2N-2}, $$

$$ b = z_{\psi}^2 - \frac{z_{1-\gamma}^2}{N}, $$

and, for any $\pi$, $z_\pi$ is the $\pi$th quantile of the standard normal distribution, that is, the value exceeded with probability $1-\pi$ [Natrella, 1963]. Because $z_{1-\gamma}$ enters only through its square, its sign does not affect the result.
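The formula above can be written out directly in R. The function below is a hypothetical re-implementation for illustration only (the name `toleranceBoundSketch` is invented here and it is not claimed to be the package's source code). It assumes $z_\psi$ is taken as `qnorm(psi)`, so that the factor is positive for upper quantiles. A short Monte Carlo check then estimates how often the resulting bound exceeds the true 95th percentile.

```r
# Hypothetical sketch of the approximation described above;
# not the package's own implementation.
toleranceBoundSketch <- function(psi, gamma, N) {
  z_psi <- qnorm(psi)        # standard normal quantile for the covered fraction
  z_gam <- qnorm(1 - gamma)  # only its square enters the formula
  a <- 1 - z_gam^2 / (2 * N - 2)
  b <- z_psi^2 - z_gam^2 / N
  (z_psi + sqrt(z_psi^2 - a * b)) / a
}

# tolerance factor for the 95th percentile, 90% confidence, N = 50
k <- toleranceBoundSketch(0.95, 0.90, 50)

# Monte Carlo check: fraction of samples whose upper bound
# exceeds the true 95th percentile of N(0, 1)
set.seed(1)
covered <- replicate(5000, {
  x <- rnorm(50)
  mean(x) + k * sd(x) > qnorm(0.95)
})
coverage <- mean(covered)
coverage
```

If the approximation behaves as intended, the estimated coverage should land near the requested confidence level of 0.90. Since the sketch uses only vectorized arithmetic, it also accepts a vector for `N` or `gamma`.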

References

Natrella, M.G. (1963) Experimental Statistics. NBS Handbook 91, National Bureau of Standards, Washington DC.

Hahn, G.J. and Meeker, W.Q. (1991) Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, Inc., New York.

Examples


N <- 50
x <- rnorm(N)
tolerance <- 0.90
quant <- 0.95
tolerance.factor <- toleranceBound(quant, tolerance, N)

# upper 90% tolerance bound for 95th percentile
tau <- mean(x) + sd(x)*tolerance.factor

# lower 90% tolerance bound for 5th percentile
rho <- mean(x) - sd(x)*tolerance.factor

# behavior of the tolerance bound as N increases
nn <- 10:100
plot(nn, toleranceBound(quant, tolerance, nn))

# behavior of the bound as the tolerance varies
xx <- seq(0.5, 0.99, by=0.01)
plot(xx, toleranceBound(quant, xx, N))
