
predIntPois: Prediction Interval for a Poisson Distribution

Estimate the mean of a Poisson distribution, and construct a prediction interval for the next $k$ observations or next set of $k$ sums.

Usage

predIntPois(x, k = 1, n.sum = 1, method = "conditional",
  pi.type = "two-sided", conf.level = 0.95, round.limits = TRUE)
Arguments

x: numeric vector of observations, or an object resulting from a call to an estimating function that assumes a Poisson distribution (i.e., epois or epoisCensored). If x is a numeric vector, missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.

k: positive integer specifying the number of future observations or sums the prediction interval should contain with confidence level conf.level. The default value is k=1.

n.sum: positive integer specifying the sample size associated with the k future sums. The default value is n.sum=1 (i.e., individual observations). Note that all future sums must be based on the same sample size.

method: character string specifying the method to use. The possible values are "conditional" (based on a conditional distribution; the default), "conditional.approx.normal" (method based on approximating a conditional distribution with a normal distribution), "conditional.approx.t" (method based on approximating a conditional distribution with Student's t-distribution), and "normal.approx" (method based on a normal approximation).

pi.type: character string indicating what kind of prediction interval to compute. The possible values are pi.type="two-sided" (the default), pi.type="lower", and pi.type="upper".

conf.level: a scalar between 0 and 1 indicating the confidence level of the prediction interval. The default value is conf.level=0.95.

round.limits: logical scalar indicating whether to round the computed prediction limits to the nearest integer. The default value is round.limits=TRUE.
Value

If x is a numeric vector, predIntPois
returns a list of class
"estimate"
containing the estimated parameter, the prediction interval,
and other information. See the help file for
estimate.object
for details.
If x
is the result of calling an estimation function,
predIntPois
returns a list whose class is the same as x.
The list contains the same components as x
, as well as a component called
interval
containing the prediction interval information.
If x
already has a component called interval
, this component is
replaced with the prediction interval information.

Details

A prediction interval is a random interval: its bounds are computed from the baseline sample. If an experiment is repeated $N$ times, and each time a $(1-\alpha)100%$ prediction interval for $k=1$ future observation is computed and one new observation is generated and compared to the interval, then the number of prediction intervals that contain their future observation is a binomial random variable with parameters size=$N$ and prob=$(1-\alpha)100%$ (see Binomial).
If, on the other hand, only one baseline sample is taken and only one prediction
interval for $k=1$ future observation is computed, then the number of
future observations out of a total of $N$ future observations that will be
contained in that one prediction interval is a binomial random variable with
parameters size=
$N$ and prob=
$(1-\alpha^*)100%$, where
$\alpha^*$ depends on the true population parameters and the computed
bounds of the prediction interval.
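To illustrate the first scenario, here is a small simulation sketch (predIntPois is from the EnvStats package; the choice of lambda=2 and a baseline sample size of 20 are arbitrary): each iteration draws a new baseline sample, computes a one-sided upper 95% prediction limit for the next single observation, and checks whether a newly generated observation falls at or below that limit.

library(EnvStats)
set.seed(123)
N <- 1000
contained <- replicate(N, {
  x <- rpois(20, lambda = 2)                # new baseline sample each time
  upl <- predIntPois(x, pi.type = "upper")$interval$limits["UPL"]
  rpois(1, lambda = 2) <= upl               # is the next observation contained?
})
mean(contained)  # close to, but (see below) not exactly, 0.95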
Because of the discrete nature of the Poisson distribution,
even if the true mean of the distribution $\lambda$ were known exactly, the
actual confidence level associated with a prediction limit will usually not be exactly equal to
$(1-\alpha)100%$. For example, for the Poisson distribution with parameter
lambda=2
, the interval [0, 4] contains 94.7% of this distribution and
the interval [0, 5] contains 98.3% of this distribution. Thus, no interval can
contain exactly 95% of this distribution, so it is impossible to construct an
exact 95% prediction interval for the next $k=1$ observation for a
Poisson distribution with parameter lambda=2.
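These coverage values are easy to verify from the Poisson cumulative distribution function:

ppois(4, lambda = 2)
#[1] 0.947347
ppois(5, lambda = 2)
#[1] 0.9834364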
The Form of a Poisson Prediction Interval
Let $\underline{x} = x_1, x_2, \ldots, x_n$ denote a vector of $n$
observations from a Poisson distribution with parameter
lambda=
$\lambda$. Also, let $X$ denote the sum of these
$n$ random variables, i.e.,
$X = \sum_{i=1}^n x_i$    (1)

Finally, let $m$ denote the sample size associated with the $k$ future sums (i.e., n.sum=$m$). When $m=1$, each sum is really just a single observation, so in the rest of this help file the term "sums" replaces the term "observations".

Let $\underline{y} = y_1, y_2, \ldots, y_m$ denote a vector of $m$ future observations from a Poisson distribution with parameter lambda=$\lambda^{*}$, and set $Y$ equal to the sum of these $m$ random variables, i.e.,

$Y = \sum_{i=1}^m y_i$    (2)

Then $Y$ has a Poisson distribution with parameter lambda=$m\lambda^{*}$ (Johnson et al., 1992, p.160). We are interested in constructing a prediction limit for the next value of $Y$, or else the next $k$ sums of $m$ Poisson random variables, based on the observed value of $X$ and assuming $\lambda^{*} = \lambda$.

For a Poisson distribution, the form of a two-sided prediction interval is:

$[m\bar{x} - K, \; m\bar{x} + K]$    (3)

where

$\bar{x} = \hat{\lambda} = \frac{X}{n}$    (4)

and $K$ is a constant that depends on the sample size $n$, the confidence level $(1-\alpha)100%$, the number of future sums $k$, and the sample size $m$ associated with the future sums. Do not confuse the constant $K$ (uppercase) with the number of future sums $k$ (lowercase). The symbol $K$ is used here to be consistent with the notation used for prediction intervals for the normal distribution (see predIntNorm).

Similarly, the form of a one-sided lower prediction interval is:

$[m\bar{x} - K, \; \infty]$    (5)

and the form of a one-sided upper prediction interval is:

$[0, \; m\bar{x} + K]$    (6)

The derivation of the constant $K$ is explained below for each value of method.

Conditional Distribution (method="conditional")
Nelson (1970) derives a prediction interval for the case $k=1$ based on the
conditional distribution of $Y$ given $X+Y$. He notes that the conditional
distribution of $Y$ given the quantity $X+Y=w$ is
binomial with parameters
size=
$w$ and prob=
$[m\lambda^{*} / (m\lambda^{*} + n\lambda)]$
(Johnson et al., 1992, p.161). When $k=1$, the prediction limits are computed
as those most extreme values of $Y$ that still yield a non-significant test of
the hypothesis $H_0: \lambda^{*} = \lambda$, which for the conditional
distribution of $Y$ is equivalent to the hypothesis

$H_0: prob = \frac{m}{m + n}$    (7)
Using the relationship between the binomial and
F-distribution (see the explanation of exact confidence
intervals in the help file for ebinom
), Nelson (1982, p. 203) states
that exact two-sided $(1-\alpha)100%$ prediction limits [LPL, UPL] are the
closest integer solutions to the following equations:

$\frac{m}{LPL + 1} = \frac{n}{X} F(2 LPL + 2, \; 2X, \; 1 - \alpha/2)$    (8)

$\frac{UPL}{X + 1} = \frac{m}{n} F(2X + 2, \; 2 UPL, \; 1 - \alpha/2)$    (9)

where $F(\nu_1, \nu_2, p)$ denotes the $p$'th quantile of the F-distribution with $\nu_1$ and $\nu_2$ degrees of freedom.

If pi.type="lower", $\alpha/2$ is replaced with $\alpha$ in Equation (8) above for $LPL$, and $UPL$ is set to $\infty$.

If pi.type="upper", $\alpha/2$ is replaced with $\alpha$ in Equation (9) above for $UPL$, and $LPL$ is set to 0.
NOTE: This method is not extended to the case $k > 1$.
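The calculation behind Equation (9) can be sketched in a few lines of R for a one-sided upper limit (the helper name and search interval below are ours, not part of EnvStats):

# Solve Equation (9), with alpha in place of alpha/2 (one-sided upper limit),
# for y and take the closest integer. X is the sum of the n baseline observations.
upl.conditional <- function(X, n, m = 1, alpha = 0.05) {
  g <- function(y) y / (X + 1) - (m / n) * qf(1 - alpha, df1 = 2 * X + 2, df2 = 2 * y)
  round(uniroot(g, lower = 0.01, upper = 1000)$root)
}
upl.conditional(X = 36, n = 20)
# 5, matching the UPL from the first call to predIntPois in the examples below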
Conditional Distribution Approximation Based on Normal Distribution (method="conditional.approx.normal")
Cox and Hinkley (1974, p.245) derive an approximate prediction interval for the case
$k=1$. Like Nelson (1970), they note that the conditional distribution of
$Y$ given the quantity $X+Y=w$ is binomial with
parameters size=
$w$ and
prob=
$[m\lambda^{*} / (m\lambda^{*} + n\lambda)]$, and that the
hypothesis $H_0: \lambda^{*} = \lambda$ is equivalent to the hypothesis
$H_0$: prob=[m /(m + n)].
Cox and Hinkley (1974, p.245) suggest using the normal approximation to the
binomial distribution (in this case, without the continuity correction;
see Zar, 2010, pp.534-536 for information on the continuity correction associated
with the normal approximation to the binomial distribution). Under the null
hypothesis $H_0: \lambda^{*} = \lambda$, the quantity

$\frac{Y - w p_0}{\sqrt{w p_0 (1 - p_0)}}$    (10)

where $w = X + Y$ and $p_0 = m / (m + n)$, is approximately distributed as a standard normal random variable.

When pi.type="two-sided", the prediction limits are computed by setting the square of the quantity in Equation (10) equal to $t^2$, where $t = z_{1-\alpha/2}$ and $z_p$ denotes the $p$'th quantile of the standard normal distribution, and solving for $Y$. The two roots,

$Y = m\bar{x} + \frac{t^2 m}{2n} \pm t \sqrt{\frac{m(m+n)}{n}\bar{x} + \frac{t^2 m^2}{4n^2}}$    (11)

yield the lower and upper prediction limits. When pi.type="lower" or pi.type="upper", $K$ is computed exactly as above, except $t$ is set to $t = z_{1-\alpha}$.
The Case When k > 1
When $k > 1$, Gibbons (1987b) suggests using the Bonferroni inequality.
That is, the value of $K$ is computed exactly as for the case $k=1$
described above, except that the Bonferroni value of $t$ is used in place of the
usual value of $t$:
When pi.type="two-side"
, $t = z_{1 - (\alpha/k)/2}$.
When pi.type="lower"
or pi.type="upper"
, $t = z_{1 - \alpha/k}$.
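As a concrete sketch of Equation (11) combined with the Bonferroni adjustment (the helper name below is ours, not part of EnvStats):

# One-sided upper limit from Equation (11), with the Bonferroni z quantile
# for k future sums.
upl.cond.approx <- function(X, n, k = 1, m = 1, alpha = 0.05) {
  x.bar <- X / n
  t <- qnorm(1 - alpha / k)
  m * x.bar + t^2 * m / (2 * n) +
    t * sqrt(m * (m + n) / n * x.bar + t^2 * m^2 / (4 * n^2))
}
upl.cond.approx(X = 36, n = 20, k = 10)
# about 5.51, which rounds to the UPL of 6 shown in the k=10 example below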
Conditional Distribution Approximation Based on Student's t-Distribution (method="conditional.approx.t")
When method="conditional.approx.t"
, the exact same procedure is used as when
method="conditional.approx.normal"
, except that the quantity in
Equation (10) is assumed to follow a Student's t-distribution with $n-1$
degrees of freedom. Thus, all occurrences of $z_p$ are replaced with
$t_{n-1, p}$, where $t_{\nu, p}$ denotes the $p$'th quantile of
Student's t-distribution with $\nu$ degrees of freedom.
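Substituting the Bonferroni Student's t quantile into Equation (11) reproduces the unrounded upper limit from the Gibbons et al. (2009) example at the end of this file ($X = 5$ detections in $n = 16$ baseline samples, $k = 20$ future observations):

n <- 16; X <- 5; k <- 20; m <- 1; alpha <- 0.05
x.bar <- X / n
t <- qt(1 - alpha / k, df = n - 1)   # Bonferroni Student's t quantile
m * x.bar + t^2 * m / (2 * n) +
  t * sqrt(m * (m + n) / n * x.bar + t^2 * m^2 / (4 * n^2))
#[1] 2.573258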
Normal Approximation (method="normal.approx")

The normal approximation for Poisson prediction limits was given by Nelson (1970; 1982, p.203) and is based on the fact that the mean and variance of a Poisson distribution are the same (Johnson et al., 1992, p.157), and that for "large" values of $\lambda$ the Poisson distribution is approximately normal. Nelson (1982, p.203) notes that the quantity

$\frac{Y - m\bar{x}}{\sqrt{m\bar{x} (1 + \frac{m}{n})}}$    (12)

is approximately distributed as a standard normal random variable. The function predIntPois, however, assumes this quantity is distributed as approximately a Student's t-distribution with $n-1$ degrees of freedom.
Thus, following the idea of prediction intervals for a normal distribution
(see predIntNorm
), when pi.type="two-sided"
, the constant
$K$ for a $(1-\alpha)100%$ prediction interval for the next $k=1$ sum
of $m$ observations is computed as:
pi.type="lower"
or pi.type="upper"
, the constant
$K$ is computed as:
pi.type="two-sided"
,
pi.type="lower"
or pi.type="upper"
,
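For the data in the first example below ($n = 20$, $X = 36$, so $\bar{x} = 1.8$), Equation (14) gives the following (a two-line check in our notation, not the EnvStats code):

n <- 20; X <- 36; m <- 1
K <- qt(0.95, df = n - 1) * sqrt(m * (X / n) * (1 + m / n))
m * (X / n) + K
# about 4.18, which rounds to the UPL of 4 shown in the examples below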
See Also

Poisson, epois, estimate.object, Prediction Intervals, tolIntPois, Estimating Distribution Parameters.

Examples

# Generate 20 observations from a Poisson distribution with parameter
# lambda=2. The interval [0, 4] contains 94.7% of this distribution and
# the interval [0,5] contains 98.3% of this distribution. Thus, because
# of the discrete nature of the Poisson distribution, no interval contains
# exactly 95% of this distribution. Use predIntPois to estimate the mean
# parameter of the true distribution, and construct a one-sided upper
# 95% prediction interval for the next single observation from this distribution.
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(250)
dat <- rpois(20, lambda = 2)
predIntPois(dat, pi.type = "upper")
#Results of Distribution Parameter Estimation
#--------------------------------------------
#
#Assumed Distribution: Poisson
#
#Estimated Parameter(s): lambda = 1.8
#
#Estimation Method: mle/mme/mvue
#
#Data: dat
#
#Sample Size: 20
#
#Prediction Interval Method: conditional
#
#Prediction Interval Type: upper
#
#Confidence Level: 95%
#
#Number of Future Observations: 1
#
#Prediction Interval: LPL = 0
# UPL = 5
#----------
# Compare results above with the other approximation methods:
predIntPois(dat, method = "conditional.approx.normal",
pi.type = "upper")$interval$limits
#LPL UPL
# 0 4
predIntPois(dat, method = "conditional.approx.t",
pi.type = "upper")$interval$limits
#LPL UPL
# 0 4
predIntPois(dat, method = "normal.approx",
pi.type = "upper")$interval$limits
#LPL UPL
# 0 4
#Warning message:
#In predIntPois(dat, method = "normal.approx", pi.type = "upper") :
# Estimated value of 'lambda' and/or number of future observations
# is/are probably too small for the normal approximation to work well.
#==========
# Using the same data as in the previous example, compute a one-sided
# upper 95% prediction limit for k=10 future observations.
# Using conditional approximation method based on the normal distribution.
predIntPois(dat, k = 10, method = "conditional.approx.normal",
pi.type = "upper")
#Results of Distribution Parameter Estimation
#--------------------------------------------
#
#Assumed Distribution: Poisson
#
#Estimated Parameter(s): lambda = 1.8
#
#Estimation Method: mle/mme/mvue
#
#Data: dat
#
#Sample Size: 20
#
#Prediction Interval Method: conditional.approx.normal
#
#Prediction Interval Type: upper
#
#Confidence Level: 95%
#
#Number of Future Observations: 10
#
#Prediction Interval: LPL = 0
# UPL = 6
# Using method based on approximating conditional distribution with
# Student's t-distribution
predIntPois(dat, k = 10, method = "conditional.approx.t",
pi.type = "upper")$interval$limits
#LPL UPL
# 0 6
#==========
# Repeat the above example, but set k=5 and n.sum=3. Thus, we want a
# 95% upper prediction limit for the next 5 sets of sums of 3 observations.
predIntPois(dat, k = 5, n.sum = 3, method = "conditional.approx.t",
pi.type = "upper")$interval$limits
#LPL UPL
# 0 12
#==========
# Reproduce Example 3.6 in Gibbons et al. (2009, p. 75)
# A 32-constituent VOC scan was performed for n=16 upgradient
# samples and there were 5 detections out of these 16. We
# want to construct a one-sided upper 95% prediction limit
# for 20 monitoring wells (so k=20 future observations) based
# on these data.
# First we need to create a data set that will yield a mean
# of 5/16 based on a sample size of 16. Any number of data
# sets will do. Here are two possible ones:
dat <- c(rep(1, 5), rep(0, 11))
dat <- c(2, rep(1, 3), rep(0, 12))
# Now call predIntPois. Don't round the limits so we can
# compare to the example in Gibbons et al. (2009).
predIntPois(dat, k = 20, method = "conditional.approx.t",
pi.type = "upper", round.limits = FALSE)
#Results of Distribution Parameter Estimation
#--------------------------------------------
#
#Assumed Distribution: Poisson
#
#Estimated Parameter(s): lambda = 0.3125
#
#Estimation Method: mle/mme/mvue
#
#Data: dat
#
#Sample Size: 16
#
#Prediction Interval Method: conditional.approx.t
#
#Prediction Interval Type: upper
#
#Confidence Level: 95%
#
#Number of Future Observations: 20
#
#Prediction Interval: LPL = 0.000000
# UPL = 2.573258
#==========
# Cleanup
#--------
rm(dat)