
scanstatistics (version 0.1.0)

scan_poisson: Calculate the Poisson scan statistic.

Description

Calculate the expectation-based Poisson scan statistic by supplying a data.table of observed counts and pre-computed expected value parameters for each location and time. A p-value for the observed scan statistic can be obtained by Monte Carlo simulation.

Usage

scan_poisson(table, zones, n_mcsim = 0)

Arguments

table

A data.table with columns location, duration, count and mu. The location column should contain integers, one unique value per location. The duration column should also contain integers, starting at 1 for the most recent time period and increasing going back in time (reverse chronological order). The count column holds the observed counts, and the mu column should contain the estimated Poisson expected value parameter for each location and duration.

zones

A set of zones, each of which is itself a set of one or more of the locations found in table.

n_mcsim

A non-negative integer; the number of replicate scan statistics to generate in order to calculate a p-value.
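The layout required of table can be checked before it is passed to scan_poisson. The helper below is a hypothetical sketch, not part of scanstatistics. The requirements that mu be positive and that count hold non-negative integers are assumptions added for illustration; the duration check follows the description above (1, 2, ..., T with 1 the most recent period).

```r
# Hypothetical helper, not part of scanstatistics: checks that a table has
# the layout described for the `table` argument of scan_poisson.
# Assumptions: mu must be positive, count must be non-negative integers.
check_scan_table <- function(tab) {
  required <- c("location", "duration", "count", "mu")
  missing_cols <- setdiff(required, names(tab))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  is_whole <- function(x) is.numeric(x) && all(x == round(x))
  if (!is_whole(tab$location)) stop("location must contain integers")
  # Durations should run 1, 2, ..., T, with 1 the most recent period
  durs <- sort(unique(tab$duration))
  if (!is_whole(durs) || length(durs) == 0 || any(durs != seq_along(durs))) {
    stop("duration must contain the integers 1, 2, ..., T")
  }
  if (!is_whole(tab$count) || any(tab$count < 0)) {
    stop("count must contain non-negative integers")
  }
  if (!is.numeric(tab$mu) || any(tab$mu <= 0)) stop("mu must be positive")
  invisible(TRUE)
}

# Usage: a 4-location, 4-period table built with base R
set.seed(1)
tab <- expand.grid(location = 1:4, duration = 1:4)
tab$mu <- 3 * tab$location
tab$count <- rpois(nrow(tab), tab$mu)
check_scan_table(tab)
```

Because a data.table is also a data.frame, the same checks apply to a data.table input.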

Value

An object of class scanstatistics. It has the following fields:

observed

A data.table containing the value of the statistic calculated for each zone-duration combination, for the observed data. The scan statistic is the maximum value of these calculated statistics.

replicated

A numeric vector of length n_mcsim containing the values of the scan statistic replicated by Monte Carlo simulation under the null hypothesis.

mlc

The most likely cluster: a data.table containing the zone, duration and value of the scan statistic for the space-time window that attains the maximum.

pvalue

The p-value calculated from Monte Carlo replications.

distribution

The assumed distribution of the data; "Poisson" in this case.

type

The type of scan statistic; "Expectation-based" in this case.

zones

The set of zones that was passed to the function as input.

n_locations

The number of locations in the data.

n_zones

The number of zones.

max_duration

The maximum anomaly duration considered.
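The pvalue field compares the observed scan statistic with the n_mcsim values in replicated. A minimal sketch, assuming the common Monte Carlo convention p = (1 + number of replicates at least as large as the observed value) / (1 + n_mcsim); the function name is hypothetical and scanstatistics may compute the p-value differently.

```r
# Hypothetical helper: Monte Carlo p-value from an observed scan statistic
# and scan statistics replicated under the null hypothesis.
# The (1 + exceedances) / (1 + n_mcsim) convention is an assumption.
mc_pvalue <- function(observed, replicated) {
  stopifnot(length(observed) == 1, length(replicated) >= 1)
  (1 + sum(replicated >= observed)) / (1 + length(replicated))
}

# One of four replicates (6) is at least as large as 5: (1 + 1) / (1 + 4)
mc_pvalue(5, c(1, 2, 6, 3))  # 0.4
```

Adding 1 to both numerator and denominator counts the observed statistic as one of the draws, so the p-value is never exactly 0 and its smallest attainable value is 1 / (1 + n_mcsim).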

Details

For the expectation-based Poisson scan statistic, the null hypothesis of no anomaly holds that the count observed at each location \(i\) and duration \(t\) (the number of time periods before the present) is Poisson-distributed with expected value \(\mu_{it}\): $$ H_0 : Y_{it} \sim \textrm{Poisson}(\mu_{it}), $$ for all locations \(i = 1, \ldots, m\) and all durations \(t = 1, \ldots, T\), where \(T\) is the maximum duration considered.

Under the alternative hypothesis, there is a space-time window \(W\) consisting of a spatial zone \(Z \subset \{1, \ldots, m\}\) and a time window \(D \subseteq \{1, \ldots, T\}\) such that the expected values of the counts in that window are inflated by a factor \(q_W > 1\) relative to the null hypothesis: $$ H_1 : Y_{it} \sim \textrm{Poisson}(q_W \mu_{it}), ~~(i,t) \in W. $$ Counts at locations and durations outside this window are assumed to be distributed as under the null hypothesis.

The zones \(Z\) considered are those specified in the argument zones, while the maximum duration \(T\) is taken as the maximum value in the duration column of the input table. For each space-time window \(W\) considered, the logarithm of a likelihood ratio is computed from the distributions under the alternative and null hypotheses, and the expectation-based Poisson scan statistic is the maximum of these quantities over all space-time windows. Point estimates of the parameters \(\mu_{it}\) must be supplied in the mu column of table before this function is called.
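The log-likelihood ratio for a window has a closed form once \(q_W\) is replaced by its maximum likelihood estimate. Writing \(C_W = \sum_{(i,t) \in W} Y_{it}\) and \(B_W = \sum_{(i,t) \in W} \mu_{it}\), the log ratio for a given \(q\) is \(C_W \log q - (q - 1) B_W\), which over \(q \ge 1\) is maximized at \(\hat{q}_W = \max(1, C_W / B_W)\), giving \(C_W \log(C_W / B_W) + B_W - C_W\) when \(C_W > B_W\) and 0 otherwise. The base-R sketch below evaluates this by brute force for every zone-duration pair. It is not the scanstatistics implementation; it assumes, hypothetically, that a window of duration \(d\) covers the \(d\) most recent periods (\(D = \{1, \ldots, d\}\)) and that zones are given as a plain list of location vectors, and the function names are illustrative.

```r
# Sketch (not the scanstatistics implementation): brute-force
# expectation-based Poisson scan statistic in base R.
# Assumptions: a window of duration d covers durations 1..d, and zones are
# a plain list of integer location vectors.

# Log-likelihood ratio maximized over q >= 1: q_hat = C / B when C > B,
# otherwise q_hat = 1 and the ratio is 0.
eb_poisson_llr <- function(C, B) {
  ifelse(C > B, C * log(C / B) + B - C, 0)
}

# One row per zone-duration combination with its log-likelihood ratio
scan_eb_poisson_sketch <- function(tab, zones) {
  grid <- expand.grid(zone = seq_along(zones),
                      duration = seq_len(max(tab$duration)))
  grid$llr <- mapply(function(z, d) {
    in_window <- tab$location %in% zones[[z]] & tab$duration <= d
    eb_poisson_llr(sum(tab$count[in_window]), sum(tab$mu[in_window]))
  }, grid$zone, grid$duration)
  grid
}

# Usage, mirroring the example below with base R data structures
set.seed(1)
tab <- expand.grid(location = 1:4, duration = 1:4)
tab$mu <- 3 * tab$location
tab$count <- rpois(nrow(tab), tab$mu)
# Inject an anomaly at locations 1 and 4 in the two most recent periods
anom <- tab$location %in% c(1, 4) & tab$duration < 3
tab$count[anom] <- rpois(sum(anom), 2 * tab$mu[anom])
zones <- list(1, 2, 3, 4, c(1, 4), c(2, 3))  # illustrative zones only
obs <- scan_eb_poisson_sketch(tab, zones)
obs[which.max(obs$llr), ]  # window attaining the maximum
max(obs$llr)               # the scan statistic
```

Each window costs one pass over the table here; this keeps the sketch readable but scales poorly, so it is meant only to make the definition above concrete.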

Examples

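# Simple example
library(scanstatistics)
library(data.table)

set.seed(1)
# One row per location-duration pair, keyed by location and duration
table <- scanstatistics:::create_table(list(location = 1:4, duration = 1:4),
                                        keys = c("location", "duration"))
# Expected counts under the null hypothesis
table[, mu := 3 * location]
# Observed counts drawn from the null distribution
table[, count := rpois(.N, mu)]
# Inject an anomaly: double the expected count at locations 1 and 4
# in the two most recent time periods
table[location %in% c(1, 4) & duration < 3, count := rpois(.N, 2 * mu)]
# Zones built from the power set of the 4 locations
zones <- scanstatistics:::powerset_zones(4)
# Scan statistic with 100 Monte Carlo replicates for the p-value
result <- scan_poisson(table, zones, 100)
result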
