scan_poisson: Calculate the Poisson scan statistic.

Description

Calculate the expectation-based Poisson scan statistic by supplying a data.table of observed counts and pre-computed expected value parameters for each location and time. A p-value for the observed scan statistic can be obtained by Monte Carlo simulation.

Usage

scan_poisson(table, zones, n_mcsim = 0)

Arguments

table

A data.table with columns location, duration, count, mu. The location column should contain an integer identifier unique to each location. The duration column should also contain integers, starting at 1 for the most recent time period and increasing in reverse chronological order. The count column should contain the observed counts, and the mu column the estimated Poisson expected value parameter for each location and duration.
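To make the expected long format concrete, here is a minimal base-R sketch that builds one row per (location, duration) pair. The helper make_input_table and its mu_fun argument are hypothetical, not part of the package. The result is a plain data.frame, so it would still need converting (for example with data.table::as.data.table) before being passed to scan_poisson.

```r
# Hypothetical helper (not part of scanstatistics): builds a long-format table
# with one row per (location, duration) pair.
# duration = 1 is the most recent time period; larger values go further back.
make_input_table <- function(n_locations, max_duration, mu_fun) {
  tab <- expand.grid(location = seq_len(n_locations),
                     duration = seq_len(max_duration))
  # Expected value parameter, assumed to have been estimated beforehand
  tab$mu <- mu_fun(tab$location, tab$duration)
  # Observed counts simulated under the null hypothesis: Y_it ~ Poisson(mu_it)
  tab$count <- rpois(nrow(tab), tab$mu)
  tab[, c("location", "duration", "count", "mu")]
}

set.seed(1)
tab <- make_input_table(4, 4, function(loc, dur) 3 * loc)
head(tab)
```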

zones

A set of zones, each zone itself a set containing one or more of the locations found in table.
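As an illustration, a minimal base-R sketch that enumerates every non-empty subset of m locations as a zone, which is presumably the idea behind the internal powerset_zones() helper used in the example. Here each zone is assumed to be an integer vector of location IDs stored in a plain list; the package's own zone representation may differ (it may use set objects), and all_subset_zones is a hypothetical name.

```r
# Hypothetical helper: every non-empty subset of locations 1..n as a zone.
# Each zone is an integer vector of location IDs; zones are stored in a list.
all_subset_zones <- function(n) {
  zones <- list()
  for (k in seq_len(n)) {
    # With simplify = FALSE, combn() returns a list with one element per
    # k-element subset of seq_len(n)
    zones <- c(zones, combn(n, k, simplify = FALSE))
  }
  zones
}

zones <- all_subset_zones(4)
length(zones)  # 2^4 - 1 = 15 non-empty subsets
```

Note that the number of zones grows as $2^m - 1$, so enumerating all subsets is only practical for a handful of locations.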

n_mcsim

A non-negative integer; the number of replicate scan statistics to generate in order to calculate a p-value.
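A Monte Carlo p-value is commonly computed by counting the observed statistic as one of the draws, as in the sketch below. This is one common convention, assumed here for illustration; whether scan_poisson uses `>=` or `>` in the comparison, and how it handles n_mcsim = 0, is not stated on this page. The function name mc_pvalue is hypothetical.

```r
# Monte Carlo p-value: the share of replicate statistics at least as large as
# the observed one, counting the observed value as one of the draws.
mc_pvalue <- function(observed, replicates) {
  (1 + sum(replicates >= observed)) / (1 + length(replicates))
}

mc_pvalue(5.2, c(1.0, 2.5, 6.1, 0.3))  # (1 + 1) / (1 + 4) = 0.4
```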

Value

An object of class scanstatistics. It has the following fields:

observed: A data.table containing, for the observed data, the value of the statistic calculated for each zone-duration combination. The scan statistic is the maximum of these values.
replicated: A numeric vector of length n_mcsim containing the values of the scan statistics calculated by Monte Carlo simulation.
mlc: The most likely cluster; a data.table containing its zone, duration, and scan statistic.
pvalue: The p-value calculated from Monte Carlo replications.
distribution: The assumed distribution of the data; "Poisson" in this case.
type: The type of scan statistic; "Expectation-based" in this case.
zones: The set of zones that was passed to the function as input.
n_locations: The number of locations in the data.
n_zones: The number of zones.
max_duration: The maximum anomaly duration considered.
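The relationship between the observed, mlc, and scan statistic fields can be sketched as follows. The data.frame and its column names (zone, duration, statistic) are hypothetical stand-ins; the columns of the real observed table may be named differently.

```r
# Hypothetical 'observed' table: one row per zone-duration combination.
observed <- data.frame(zone      = c(1, 1, 2, 2, 3, 3),
                       duration  = c(1, 2, 1, 2, 1, 2),
                       statistic = c(0.0, 1.3, 4.7, 2.2, 0.0, 3.9))

# The scan statistic is the maximum over all zone-duration combinations ...
scan_stat <- max(observed$statistic)
# ... and the most likely cluster is the row attaining that maximum.
mlc <- observed[which.max(observed$statistic), ]
mlc  # zone 2, duration 1, statistic 4.7
```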

Details

For the expectation-based Poisson scan statistic, the null hypothesis of no anomaly holds that the count observed at each location $i$ and duration $t$ (the number of time periods before present) is Poisson-distributed with expected value $\mu_{it}$:
$$ H_0 : Y_{it} \sim \textrm{Poisson}(\mu_{it}), $$
for all locations $i = 1, \ldots, m$ and all durations $t = 1, \ldots, T$, with $T$ being the maximum duration considered.

Under the alternative hypothesis, there is a space-time window $W$ consisting of a spatial zone $Z \subset \{1, \ldots, m\}$ and a time window $D \subseteq \{1, \ldots, T\}$ such that the counts in that window have their expected values inflated by a factor $q_W > 1$ compared to the null hypothesis:
$$ H_1 : Y_{it} \sim \textrm{Poisson}(q_W \mu_{it}), ~~(i,t) \in W. $$
For locations and durations outside of this window, counts are assumed to be distributed as under the null hypothesis.

The sets $Z$ considered are those specified in the argument zones, while the maximum duration $T$ is taken as the maximum value in the column duration of the input table. For each space-time window $W$ considered, (the log of) a likelihood ratio is computed using the distributions under the alternative and null hypotheses, and the expectation-based Poisson scan statistic is calculated as the maximum of these quantities over all space-time windows. Point estimates of the parameters $\mu_{it}$ must be specified in the column mu of the argument table before this function is called.
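The per-window quantity can be made explicit. For a window $W$, write $C = \sum_{(i,t) \in W} Y_{it}$ and $B = \sum_{(i,t) \in W} \mu_{it}$. From the hypotheses above, the log likelihood ratio for a fixed factor $q$ is $C \log q - (q - 1) B$, which is maximized over $q \geq 1$ at $\hat{q} = \max(1, C/B)$. Substituting gives $C \log(C/B) - C + B$ when $C > B$, and $0$ otherwise. The sketch below implements this closed form; it is derived from the stated hypotheses rather than taken from the package source, and the name poisson_llr is hypothetical.

```r
# Log likelihood ratio for one space-time window, given the window's total
# observed count C and total expected count B (the sum of mu over the window).
# q_W is replaced by its maximum likelihood estimate max(1, C / B).
poisson_llr <- function(C, B) {
  stopifnot(B > 0)
  if (C > B) C * log(C / B) - C + B else 0
}

poisson_llr(20, 10)  # 20 * log(2) - 10, about 3.863
poisson_llr(8, 10)   # 0: fewer counts than expected give no evidence for q > 1
```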

Examples


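# Simple example
library(scanstatistics)
library(data.table)  # provides the := operator used below
set.seed(1)
table <- scanstatistics:::create_table(list(location = 1:4, duration = 1:4), 
                                        keys = c("location", "duration"))
table[, mu := 3 * location]
table[, count := rpois(.N, mu)]
# Redraw counts with doubled expected values at locations 1 and 4
# in the two most recent time periods (an injected anomaly)
table[location %in% c(1, 4) & duration < 3, count := rpois(.N, 2 * mu)]
zones <- scanstatistics:::powerset_zones(4)
result <- scan_poisson(table, zones, n_mcsim = 100)
result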
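For intuition about what the example computes, the following base-R sketch scans every zone and duration by brute force on similar simulated data. It assumes that a window of duration d covers the d most recent time periods (durations 1 to d) for the locations in the zone, and it uses the closed-form log likelihood ratio for the expectation-based Poisson statistic. The names scan_poisson_sketch and poisson_llr are hypothetical, and because the rows are ordered differently from the package's table, the random counts (and hence the results) will not match the example output.

```r
# Closed-form log likelihood ratio for one window (C = total count, B = total mu)
poisson_llr <- function(C, B) if (C > B) C * log(C / B) - C + B else 0

# Hypothetical brute-force scan: one statistic per zone-duration combination.
scan_poisson_sketch <- function(tab, zones) {
  max_d <- max(tab$duration)
  rows <- list()
  for (z in seq_along(zones)) {
    for (d in seq_len(max_d)) {
      # Window: the zone's locations over the d most recent time periods
      in_w <- tab$location %in% zones[[z]] & tab$duration <= d
      rows[[length(rows) + 1]] <- data.frame(
        zone = z, duration = d,
        statistic = poisson_llr(sum(tab$count[in_w]), sum(tab$mu[in_w])))
    }
  }
  observed <- do.call(rbind, rows)
  list(observed = observed,
       mlc = observed[which.max(observed$statistic), ])
}

# Simulated data in the spirit of the example above
set.seed(1)
tab <- expand.grid(location = 1:4, duration = 1:4)
tab$mu <- 3 * tab$location
tab$count <- rpois(nrow(tab), tab$mu)
hot <- tab$location %in% c(1, 4) & tab$duration < 3
tab$count[hot] <- rpois(sum(hot), 2 * tab$mu[hot])
zones <- unlist(lapply(1:4, function(k) combn(4, k, simplify = FALSE)),
                recursive = FALSE)
res <- scan_poisson_sketch(tab, zones)
res$mlc
```

Brute force evaluates every zone for every duration, which is fine for this toy size but grows quickly with the number of zones.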
