Suppose that we collect \(N\) observations from a normal distribution
with unknown mean and variance, and wish to estimate the \(95\)th
percentile of the distribution. A simple point estimate is given by
\(\tau = \bar{X} + 1.68s\). However, only the mean of the distribution is
less than this value \(95\%\) of the time. When \(N=40\), for example,
almost half of the time (\(43.5\%\)), fewer than \(95\%\) of the
observed values will be less than \(\tau\). This problem is addressed by
constructing a statistical tolerance interval (more precisely, a one-sided
tolerance bound) that contains a given fraction, \(\psi\), of the
population with a given confidence level, \(\gamma\) [Hahn and Meeker,
1991]. With enough samples, one can obtain distribution-free tolerance
bounds [op.\ cit., Chapter 5]. For instance, one can use bootstrap or
jackknife methods to estimate these bounds empirically.
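
To give a sense of what "enough samples" means, the following minimal sketch uses the standard order-statistic argument rather than the bootstrap or jackknife: for a continuous distribution, the sample maximum exceeds the \(\psi\) quantile with probability \(1-\psi^N\), so it serves as a one-sided upper tolerance bound at confidence \(\gamma\) once \(1-\psi^N \ge \gamma\). The function name `min_sample_size` is our own and is used only for illustration.

```python
import math


def min_sample_size(psi: float, gamma: float) -> int:
    """Smallest N for which the sample maximum is a distribution-free
    upper tolerance bound covering a fraction psi of the population
    with confidence gamma, i.e. the smallest N with 1 - psi**N >= gamma.
    """
    if not (0.0 < psi < 1.0 and 0.0 < gamma < 1.0):
        raise ValueError("psi and gamma must lie strictly between 0 and 1")
    # Solve psi**N <= 1 - gamma for N and round up to an integer.
    return math.ceil(math.log(1.0 - gamma) / math.log(psi))


if __name__ == "__main__":
    print(min_sample_size(0.95, 0.95))
```

Under these assumptions, \(\psi = \gamma = 0.95\) requires \(N = 59\) observations, which is more than the \(N = 40\) of the example above.
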
Here, however, we assume that the measurements are normally distributed. We
let \(\bar{X}\) denote the sample mean and let \(s\) denote the sample
standard deviation. The upper tolerance bound that, \(100 \gamma\%\) of
the time, exceeds \(100 \psi\%\) of \(G\) values from a normal
distribution is approximated by \(X_U = \bar{X} + k_{\gamma,\psi}s\),
where
$$
k_{\gamma, \psi} = {z_{1-\psi} + \sqrt{z_{1-\psi}^2 - ab} \over a},
$$
$$
a = 1-{z_{1-\gamma}^2\over 2N-2},
$$
$$
b = z_{1-\psi}^2 - {z_{1-\gamma}^2\over N},
$$
and, for any \(\pi\), \(z_\pi\) is the critical value of the normal
distribution that is exceeded with probability \(\pi\) [Natrella, 1963].
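
As an illustration, the approximation can be evaluated as in the following minimal sketch. It assumes that the critical value for the population fraction is the upper \(\psi\) quantile of the standard normal distribution (positive when \(\psi > 0.5\)), and likewise for \(\gamma\). The function names `tolerance_factor` and `upper_tolerance_bound` are hypothetical and are not taken from any of the cited references.

```python
import math
from statistics import NormalDist, fmean, stdev


def tolerance_factor(n: int, psi: float = 0.95, gamma: float = 0.95) -> float:
    """Approximate one-sided normal tolerance factor k_{gamma, psi}.

    Assumption: z_psi and z_gamma below are the standard normal quantiles
    at psi and gamma, i.e. the values exceeded with probability 1 - psi
    and 1 - gamma respectively.
    """
    if n < 2:
        raise ValueError("at least two observations are required")
    z_psi = NormalDist().inv_cdf(psi)
    z_gamma = NormalDist().inv_cdf(gamma)
    a = 1.0 - z_gamma**2 / (2 * n - 2)
    b = z_psi**2 - z_gamma**2 / n
    return (z_psi + math.sqrt(z_psi**2 - a * b)) / a


def upper_tolerance_bound(sample, psi: float = 0.95, gamma: float = 0.95) -> float:
    """Return X_U = mean + k * s for a sequence of observations."""
    k = tolerance_factor(len(sample), psi, gamma)
    # stdev() is the sample standard deviation (N - 1 in the denominator).
    return fmean(sample) + k * stdev(sample)


if __name__ == "__main__":
    print(f"k for N=40, psi=gamma=0.95: {tolerance_factor(40):.3f}")
```

For \(N = 40\) and \(\psi = \gamma = 0.95\), this gives a factor of roughly \(2.12\), noticeably larger than the multiplier used in the point estimate \(\tau\), and the factor shrinks toward the normal quantile as \(N\) grows.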