Calculate the expectation-based zero-inflated Poisson scan statistic by
supplying a data.table
of observed counts and pre-computed expected
values and structural zero probabilities for each location and time. A
p-value for the observed scan statistic can be obtained by Monte Carlo
simulation.
scan_zip(table, zones, n_mcsim = 0, ...)
A data.table
with columns
location, duration, count, mu, p
. The location
column
should consist of integers that are unique to each location. The
duration
column should also consist of integers, starting at 1 for
the most recent time period and increasing in reverse chronological order.
The column mu
should contain the estimated Poisson expected value
parameters, and the column p
the estimated structural zero
probabilities.
A set
of zones, each zone itself a set containing one or
more locations of those found in table
.
A non-negative integer; the number of replicate scan statistics to generate in order to calculate a p-value.
Arguments passed to internal functions. Arguments that can be
passed here are d
, the initial value for the excess zero indicators
(default is 0.5), and tol
, the threshold for the absolute
convergence criterion (default is 0.01).
An object of class scanstatistics
. It has the following
fields:
A data.table
containing the value of the
statistic calculated for each zone-duration combination,
for the observed data. The scan statistic is the maximum
value of these calculated statistics.
A numeric vector of length n_mcsim
containing
the values of the scanstatistics calculated by Monte
Carlo simulation.
A data.table
containing the zone, duration, and
scanstatistic.
The p-value calculated from Monte Carlo replications.
The assumed distribution of the data; "zero-inflated Poisson" in this case.
The type of scan statistic; "Expectation-based" in this case.
The set of zones that was passed to the function as input.
The number of locations in the data.
The number of zones.
The maximum anomaly duration considered.
For the expectation-based zero-inflated Poisson scan statistic
(Kjellson 2015), the null hypothesis of no anomaly holds that the count
observed at each location \(i\) and duration \(t\) (the number of time
periods before present) has a zero-inflated Poisson distribution with
expected value parameter \(\mu_{it}\) and structural zero probability
\(p_{it}\):
$$
H_0 : Y_{it} \sim \textrm{ZIP}(\mu_{it}, p_{it}).
$$
This holds for all locations \(i = 1, \ldots, m\) and all durations
\(t = 1, \ldots,T\), with \(T\) being the maximum duration considered.
Under the alternative hypothesis, there is a space-time window \(W\)
consisting of a spatial zone \(Z \subset \{1, \ldots, m\}\) and a time
window \(D \subseteq \{1, \ldots, T\}\) such that the counts in that
window have their Poisson expected value parameters inflated by a factor
\(q_W > 1\) compared to the null hypothesis:
$$
H_1 : Y_{it} \sim \textrm{ZIP}(q_W \mu_{it}, p_{it}), ~~(i,t) \in W.
$$
For locations and durations outside of this window, counts are assumed to
be distributed as under the null hypothesis. The sets \(Z\) considered
are those specified in the argument zones
, while the maximum
duration \(T\) is taken as the maximum value in the column
duration
of the input table
.
For each space-time window \(W\) considered, (the log of) a likelihood
ratio is computed using the distributions under the alternative and null
hypotheses, and the expectation-based Poisson scan statistic is calculated
as the maximum of these quantities over all space-time windows. The
expectation-maximization (EM) algorithm is used to obtain maximum
likelihood estimates. Point estimates of the parameters \(\mu_{it}\)
must be specified in the column mu
of the argument table
before this function is called.
Kjellson, B. (2015), Spatiotemporal Outbreak Detection: A Scan Statistic Based on the Zero-Inflated Poisson Distribution, (Master Thesis, Stockholm University), Link to PDF.
# NOT RUN {
# Simple example
set.seed(1)
table <- scanstatistics:::create_table(list(location = 1:4, duration = 1:4),
keys = c("location", "duration"))
table[, mu := 3 * location]
table[, p := runif(.N, 0, 0.3)]
table[, count := gamlss.dist::rZIP(.N, mu = mu, sigma = p)]
table[location %in% c(1, 4) & duration < 3,
count := gamlss.dist::rZIP(.N, mu = 2 * mu, sigma = p)]
zones <- scanstatistics:::powerset_zones(4)
result <- scan_poisson(table, zones, 100)
result
# }
Run the code above in your browser using DataLab