
predIntNormSimultaneousK

Compute the value of $K$ (the multiplier of the estimated standard deviation) used to construct a simultaneous prediction interval based on data from a normal distribution. The function predIntNormSimultaneousK is called by predIntNormSimultaneous.

Usage

predIntNormSimultaneousK(n, df = n - 1, n.mean = 1, k = 1, m = 2, r = 1,
  rule = "k.of.m", delta.over.sigma = 0, pi.type = "upper", conf.level = 0.95,
  K.tol = .Machine$double.eps^0.5, integrate.args.list = NULL)
Arguments

n: positive integer specifying the sample size.

df: degrees of freedom associated with the prediction interval. The default value is df=n-1.

n.mean: positive integer specifying the sample size associated with the future averages. The default value is n.mean=1 (i.e., individual observations). Note that all future averages must be based on the same sample size.

k: for the $k$-of-$m$ rule (rule="k.of.m"), a positive integer specifying the minimum number of observations (or averages) out of $m$ observations (or averages) (all obtained on one future sampling occasion) that the prediction interval should contain. The default value is k=1.

m: positive integer specifying the maximum number of future observations (or averages) on one future sampling occasion. The default value is m=2, except when rule="Modified.CA", in which case this argument is ignored and $m$ is automatically set to 4.

r: positive integer specifying the number of future sampling occasions. The default value is r=1.

rule: character string specifying which rule to use. The possible values are "k.of.m" ($k$-of-$m$ rule; the default), "CA" (California rule), and "Modified.CA" (modified California rule). See the DETAILS section below for more information.

delta.over.sigma: numeric scalar indicating the ratio $\Delta/\sigma$, where $\Delta$ denotes the difference between the mean of the population that was sampled to construct the prediction interval and the mean of the population that will be sampled to produce the future observations, and $\sigma$ denotes the population standard deviation of both populations (see the DETAILS section). The default value is delta.over.sigma=0.

pi.type: character string indicating what kind of prediction interval to compute. The possible values are pi.type="upper" (the default) and pi.type="lower".

conf.level: scalar between 0 and 1 indicating the confidence level of the prediction interval. The default value is conf.level=0.95.

K.tol: numeric scalar indicating the tolerance to use in the nonlinear search algorithm for computing $K$. The default value is K.tol=.Machine$double.eps^(1/2). For many applications, the value of $K$ needs to be known only to the second decimal place, in which case a larger value of K.tol will speed up the computation.

integrate.args.list: a list of arguments to supply to the integrate function. The default value is integrate.args.list=NULL, which means that the default values of integrate are used.
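A short sketch (not from the original documentation; the tolerance values are arbitrary) of how these numerical-control arguments might be used:

# Default numerical settings
predIntNormSimultaneousK(n = 8, k = 1, m = 3)

# Same computation, with a tighter search tolerance for K and stricter
# tolerances passed through to integrate()
predIntNormSimultaneousK(n = 8, k = 1, m = 3, K.tol = 1e-10,
  integrate.args.list = list(rel.tol = 1e-10, abs.tol = 1e-10))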
Details

The function predIntNorm computes a standard prediction interval based on a sample from a normal distribution. The function predIntNormSimultaneous computes a simultaneous prediction interval that will contain a certain number of future observations with probability $(1-\alpha)100%$ for each of $r$ future sampling occasions, based on one of three possible rules:
rule="k.of.m"
), at least$k$of
the next$m$future observations will fall in the prediction
interval with probability$(1-\alpha)100%$on each of the$r$future
sampling occasions. If obserations are being taken sequentially, for a particular
sampling occasion, up to$m$observations may be taken, but once$k$of the observations fall within the prediction interval, sampling can stop.
Note: When$k=m$and$r=1$, the results ofpredIntNormSimultaneous
are equivalent to the results ofpredIntNorm
.rule="CA"
), with probability$(1-\alpha)100%$, for each of the$r$future sampling occasions, either
the first observation will fall in the prediction interval, or else all of the next$m-1$observations will fall in the prediction interval. That is, if the first
observation falls in the prediction interval then sampling can stop. Otherwise,$m-1$more observations must be taken.rule="Modified.CA"
), with probability$(1-\alpha)100%$, for each of the$r$future sampling occasions, either the
first observation will fall in the prediction interval, or else at least 2 out of
the next 3 observations will fall in the prediction interval. That is, if the first
observation falls in the prediction interval then sampling can stop. Otherwise, up
to 3 more observations must be taken.predIntNormSimultaneous
and predIntNormSimultaneousK
,
the argument n.mean
corresponds to $w$.
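To illustrate the note above, here is a minimal check (not part of the original examples; it assumes the EnvStats package is attached and uses the simplest case $k=m=1$): when $k=m$ and $r=1$, the simultaneous $K$ should agree with the $K$ for a standard one-sided prediction interval for the next $k$ observations.

# K for a 1-of-1 "simultaneous" upper prediction interval with r = 1
predIntNormSimultaneousK(n = 8, k = 1, m = 1, r = 1)
# K for a standard upper prediction interval for the next single observation
predIntNormK(n = 8, k = 1, pi.type = "upper")
# The two values should agree (up to numerical tolerance).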
The Form of a Prediction Interval
Let $\underline{x} = x_1, x_2, \ldots, x_n$ denote a vector of $n$
observations from a normal distribution with parameters
mean=
$\mu$ and sd=
$\sigma$. Also, let $w$ denote the
sample size associated with the future averages (i.e., n.mean=
$w$).
When $w=1$, each average is really just a single observation, so in the rest of
this help file the term "averages" will sometimes replace the term "observations".

For a normal distribution, the form of a two-sided $(1-\alpha)100%$ prediction interval is:
$[\bar{x} - Ks, \; \bar{x} + Ks] \;\;\;\;\;\; (3)$
where $\bar{x}$ denotes the sample mean, $s$ denotes the sample standard deviation, and $K$ denotes a constant that depends on the sample size $n$, the confidence level, the number of future sampling occasions $r$, and the sample size associated with the future averages $w$. Do not confuse the constant $K$ (uppercase K) with the number of future observations $k$ (lowercase k) in the $k$-of-$m$ rule. The symbol $K$ is used here to be consistent with the notation used for tolerance intervals (see tolIntNorm).

Similarly, the form of a one-sided lower prediction interval is:
$[\bar{x} - Ks, \; \infty] \;\;\;\;\;\; (4)$
and the form of a one-sided upper prediction interval is:
$[-\infty, \; \bar{x} + Ks] \;\;\;\;\;\; (5)$
For simultaneous prediction intervals, only lower (pi.type="lower") and upper (pi.type="upper") prediction intervals are available.

The derivation of the constant $K$ is explained below.
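As a concrete illustration of this form (a sketch, not from the original help file; it assumes EnvStats is attached and uses simulated data), the upper simultaneous prediction limit is simply $\bar{x} + Ks$ with $K$ supplied by predIntNormSimultaneousK:

set.seed(47)
x <- rnorm(8, mean = 10, sd = 2)   # simulated background sample
K <- predIntNormSimultaneousK(n = length(x), k = 1, m = 3)
mean(x) + K * sd(x)                # upper simultaneous prediction limit
# This should match the upper limit reported by
# predIntNormSimultaneous(x, k = 1, m = 3)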
The Derivation of K for Future Observations
First we will show the derivation based on future observations (i.e.,
$w=1$, n.mean=1
), and then extend the formulas to future averages.
The Derivation of K for the k-of-m Rule (rule="k.of.m")
For the $k$-of-$m$ rule (rule="k.of.m"
) with $w=1$
(i.e., n.mean=1
), at least $k$ of the next $m$ future
observations will fall in the prediction interval
with probability $(1-\alpha)100%$ on each of the $r$ future sampling
occasions. If observations are being taken sequentially, for a particular
sampling occasion, up to $m$ observations may be taken, but once $k$
of the observations fall within the prediction interval, sampling can stop.
Note: When $k=m$ and $r=1$, this kind of simultaneous prediction
interval becomes the same as a standard prediction interval for the next
$k$ observations (see predIntNorm
).
For the case when $r=1$ future sampling occasion, both Hall and Prairie (1973)
and Fertig and Mann (1977) discuss the derivation of $K$. Davis and McNichols
(1987) extend the derivation to the case where $r$ is a positive integer. They
show that for a one-sided upper prediction interval (pi.type="upper"
), the
probability $p$ that at least $k$ of the next $m$ future observations
will be contained in the interval given in Equation (5) above, for each of $r$
future sampling occasions, is given by Equation (6), where $T(x; \nu, \delta)$ denotes the cdf
of the non-central Student's t-distribution with parameters df=$\nu$ and ncp=$\delta$
evaluated at $x$;
$\Phi(x)$ denotes the cdf of the standard normal distribution
evaluated at $x$; $I(x; \nu, \omega)$ denotes the cdf of the
beta distribution with parameters shape1=
$\nu$ and
shape2=
$\omega$; and $B(\nu, \omega)$ denotes the value of the
beta function with parameters a=
$\nu$ and
b=
$\omega$.
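In R terms (a correspondence added here only to fix the notation; it is not part of the original help file), these quantities map onto base R functions as follows:

# T(x; nu, delta)  ->  pt(x, df = nu, ncp = delta)
# Phi(x)           ->  pnorm(x)
# I(x; nu, omega)  ->  pbeta(x, shape1 = nu, shape2 = omega)
# B(nu, omega)     ->  beta(nu, omega)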
The quantity $\Delta$ (upper case delta) denotes the difference between the
mean of the population that was sampled to construct the prediction interval, and
the mean of the population that will be sampled to produce the future observations.
The quantity $\sigma$ (sigma) denotes the population standard deviation of both
of these populations. Usually you assume $\Delta=0$ unless you are interested
in computing the power of the rule to detect a change in means between the
populations (see predIntNormSimultaneousTestPower
).
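For example, a hedged sketch of such a power calculation for the 1-of-3 rule (the argument names of predIntNormSimultaneousTestPower are assumed here to parallel those of predIntNormSimultaneousK):

# Power of the 1-of-3 rule (n = 8 background observations) to detect an
# upward shift of 2 standard deviations in the compliance population mean
predIntNormSimultaneousTestPower(n = 8, k = 1, m = 3, r = 1,
  rule = "k.of.m", delta.over.sigma = 2, pi.type = "upper",
  conf.level = 0.95)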
For given values of the confidence level ($p$), sample size ($n$),
minimum number of future observations to be contained in the interval per
sampling occasion ($k$), number of future observations per sampling occasion
($m$), and number of future sampling occasions ($r$), Equation (6) can
be solved for $K$. The function predIntNormSimultaneousK
uses the
R function nlminb
to solve Equation (6) for $K$.
When pi.type="lower"
, the same value of $K$ is used as when
pi.type="upper"
, but Equation (4) is used to construct the prediction
interval.
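A quick check of this statement (not part of the original examples):

# The value of K is the same for lower and upper one-sided simultaneous
# prediction intervals; only the form of the interval (Equation (4) vs.
# Equation (5)) differs
predIntNormSimultaneousK(n = 8, k = 1, m = 3, pi.type = "upper")
predIntNormSimultaneousK(n = 8, k = 1, m = 3, pi.type = "lower")
# Both calls should return the same value.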
The Derivation of K for the California Rule (rule="CA")
For the California rule (rule="CA"
), with probability $(1-\alpha)100%$,
for each of the $r$ future sampling occasions, either the first observation will
fall in the prediction interval, or else all of the next $m-1$ observations will
fall in the prediction interval. That is, if the first observation falls in the
prediction interval then sampling can stop. Otherwise, $m-1$ more observations
must be taken.
The formula for $K$ is the same as for the $k$-of-$m$ rule, except that
Equation (6) is replaced by Equation (7) (Davis, 1998b).

The Derivation of K for the Modified California Rule (rule="Modified.CA")
For the Modified California rule (rule="Modified.CA"
), with probability
$(1-\alpha)100%$, for each of the $r$ future sampling occasions, either the
first observation will fall in the prediction interval, or else at least 2 out of
the next 3 observations will fall in the prediction interval. That is, if the first
observation falls in the prediction interval then sampling can stop. Otherwise, up
to 3 more observations must be taken.
The formula for $K$ is the same as for the $k$-of-$m$ rule, except that
Equation (6) is replaced by Equation (8) (Davis, 1998b).
The Derivation of K for Future Averages

When the future observations are averages based on a sample size of $w$ (i.e., n.mean=$w \ge 1$), the first term in the integral in Equations (6)-(8) that involves the cdf of the non-central Student's t-distribution is modified to reflect the fact that each future value is an average of $w$ observations rather than a single observation.
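As a sketch of the future-averages case (not part of the original examples; the averaging sample size of 2 is arbitrary):

# K when each future value is a single observation (w = 1)
predIntNormSimultaneousK(n = 25, k = 1, m = 3, r = 2)
# K when each future value is an average of w = 2 observations
predIntNormSimultaneousK(n = 25, k = 1, m = 3, r = 2, n.mean = 2)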
See Also

predIntNormSimultaneous, predIntNormSimultaneousTestPower, predIntNorm, predIntNormK, predIntLnormSimultaneous, tolIntNorm, Normal, estimate.object, enorm.

Examples
# Compute the value of K for an upper 95% simultaneous prediction
# interval to contain at least 1 out of the next 3 observations
# given a background sample size of n=8.
predIntNormSimultaneousK(n = 8, k = 1, m = 3)
#[1] 0.5123091
#----------
# Compare the value of K for a 95% 1-of-3 upper prediction interval to
# the value for the California and Modified California rules.
# Note that the value of K for the Modified California rule is between
# the value of K for the 1-of-3 rule and the California rule.
predIntNormSimultaneousK(n = 8, k = 1, m = 3)
#[1] 0.5123091
predIntNormSimultaneousK(n = 8, m = 3, rule = "CA")
#[1] 1.252077
predIntNormSimultaneousK(n = 8, rule = "Modified.CA")
#[1] 0.8380233
#----------
# Show how the value of K for an upper 95% simultaneous prediction
# limit increases as the number of future sampling occasions r increases.
# Here, we'll use the 1-of-3 rule.
predIntNormSimultaneousK(n = 8, k = 1, m = 3)
#[1] 0.5123091
predIntNormSimultaneousK(n = 8, k = 1, m = 3, r = 10)
#[1] 1.363002
#==========
# Example 19-1 of USEPA (2009, p. 19-17) shows how to compute an
# upper simultaneous prediction limit for the 1-of-3 rule for
# r = 2 future sampling occasions. The data for this example are
# stored in EPA.09.Ex.19.1.sulfate.df.
# We will pool data from 4 background wells that were sampled on
# a number of different occasions, giving us a sample size of
# n = 25 to use to construct the prediction limit.
# There are 50 compliance wells and we will monitor 10 different
# constituents at each well at each of the r=2 future sampling
# occasions. To determine the confidence level we require for
# the simultaneous prediction interval, USEPA (2009) recommends
# setting the individual Type I Error level at each well to
# 1 - (1 - SWFPR)^(1 / (Number of Constituents * Number of Wells))
# which translates to setting the confidence limit to
# (1 - SWFPR)^(1 / (Number of Constituents * Number of Wells))
# where SWFPR = site-wide false positive rate. For this example, we
# will set SWFPR = 0.1. Thus, the confidence level is given by:
nc <- 10
nw <- 50
SWFPR <- 0.1
conf.level <- (1 - SWFPR)^(1 / (nc * nw))
conf.level
#[1] 0.9997893
#----------
# Compute the value of K for the upper simultaneous prediction
# limit for the 1-of-3 plan.
predIntNormSimultaneousK(n = 25, k = 1, m = 3, r = 2,
rule = "k.of.m", pi.type = "upper", conf.level = conf.level)
#[1] 2.014365
#==========
# Cleanup
#--------
rm(nc, nw, SWFPR, conf.level)