Calculate the expectation-based Poisson scan statistic by supplying a
data.table
of observed counts and pre-computed expected value
parameters for each location and time. A p-value for the observed scan
statistic can be obtained by Monte Carlo simulation.
scan_poisson(table, zones, n_mcsim = 0)
A data.table
with columns
location, duration, count, mu
. The location
column should
consist of integers that are unique to each location. The
duration
column should also consist of integers, starting at 1 for
the most recent time period and increasing in reverse chronological order.
The column mu
should contain the estimated Poisson expected value
parameter.
A set
of zones, each zone itself a
set containing one or more locations of those found in table
.
A non-negative integer; the number of replicate scan statistics to generate in order to calculate a p-value.
An object of class scanstatistics
. It has the following
fields:
A data.table
containing the value of the
statistic calculated for each zone-duration combination,
for the observed data. The scan statistic is the maximum
value of these calculated statistics.
A numeric vector of length n_mcsim
containing
the values of the scanstatistics calculated by Monte
Carlo simulation.
A data.table
containing the zone, duration, and
scanstatistic.
The p-value calculated from Monte Carlo replications.
The assumed distribution of the data; "Poisson" in this case.
The type of scan statistic; "Expectation-based" in this case.
The set of zones that was passed to the function as input.
The number of locations in the data.
The number of zones.
The maximum anomaly duration considered.
For the expectation-based Poisson scan statistic, the null
hypothesis of no anomaly holds that the count observed at each location
\(i\) and duration \(t\) (the number of time periods before present)
is Poisson-distributed with expected value \(\mu_{it}\):
$$
H_0 : Y_{it} \sim \textrm{Poisson}(\mu_{it}),
$$
for all locations \(i = 1, \ldots, m\) and all durations \(t = 1,
\ldots,T\), with \(T\) being the maximum duration considered.
Under the alternative hypothesis, there is a space-time window \(W\)
consisting of a spatial zone \(Z \subset \{1, \ldots, m\}\) and a time
window \(D \subseteq \{1, \ldots, T\}\) such that the counts in that
window have their expected values inflated by a factor \(q_W > 1\)
compared to the null hypothesis:
$$
H_1 : Y_{it} \sim \textrm{Poisson}(q_W \mu_{it}), ~~(i,t) \in W.
$$
For locations and durations outside of this window, counts are assumed to
be distributed as under the null hypothesis. The sets \(Z\) considered
are those specified in the argument zones
, while the maximum
duration \(T\) is taken as the maximum value in the column
duration
of the input table
. For each space-time window
\(W\) considered, (the log of) a likelihood ratio is computed using the
distributions under the alternative and null hypotheses, and the
expectation-based Poisson scan statistic is calculated as the maximum of
these quantities over all space-time windows.
Point estimates of the parameters \(\mu_{it}\) must be specified in the
column mu
of the argument table
before this function is
called.
# NOT RUN {
# Simple example
set.seed(1)
table <- scanstatistics:::create_table(list(location = 1:4, duration = 1:4),
keys = c("location", "duration"))
table[, mu := 3 * location]
table[, count := rpois(.N, mu)]
table[location %in% c(1, 4) & duration < 3, count := rpois(.N, 2 * mu)]
zones <- scanstatistics:::powerset_zones(4)
result <- scan_poisson(table, zones, 100)
result
# }
Run the code above in your browser using DataLab