Learn R Programming

scanstatistics (version 0.1.0)

scan_zip: Calculate the ZIP scan statistic.

Description

Calculate the expectation-based zero-inflated Poisson scan statistic by supplying a data.table of observed counts and pre-computed expected values and structural zero probabilities for each location and time. A p-value for the observed scan statistic can be obtained by Monte Carlo simulation.

Usage

scan_zip(table, zones, n_mcsim = 0, ...)

Arguments

table

A data.table with columns location, duration, count, mu, p. The location column should consist of integers that are unique to each location. The duration column should also consist of integers, starting at 1 for the most recent time period and increasing in reverse chronological order. The column mu should contain the estimated Poisson expected value parameters, and the column p the estimated structural zero probabilities.

zones

A set of zones, each zone itself a set containing one or more locations of those found in table.

n_mcsim

A non-negative integer; the number of replicate scan statistics to generate in order to calculate a p-value.

...

Arguments passed to internal functions. Arguments that can be passed here are d, the initial value for the excess zero indicators (default is 0.5), and tol, the threshold for the absolute convergence criterion (default is 0.01).

Value

An object of class scanstatistics. It has the following fields:

observed

A data.table containing the value of the statistic calculated for each zone-duration combination, for the observed data. The scan statistic is the maximum value of these calculated statistics.

replicated

A numeric vector of length n_mcsim containing the values of the scanstatistics calculated by Monte Carlo simulation.

mlc

A data.table containing the zone, duration, and scanstatistic.

pvalue

The p-value calculated from Monte Carlo replications.

distribution

The assumed distribution of the data; "zero-inflated Poisson" in this case.

type

The type of scan statistic; "Expectation-based" in this case.

zones

The set of zones that was passed to the function as input.

n_locations

The number of locations in the data.

n_zones

The number of zones.

max_duration

The maximum anomaly duration considered.

Details

For the expectation-based zero-inflated Poisson scan statistic (Kjellson 2015), the null hypothesis of no anomaly holds that the count observed at each location \(i\) and duration \(t\) (the number of time periods before present) has a zero-inflated Poisson distribution with expected value parameter \(\mu_{it}\) and structural zero probability \(p_{it}\): $$ H_0 : Y_{it} \sim \textrm{ZIP}(\mu_{it}, p_{it}). $$ This holds for all locations \(i = 1, \ldots, m\) and all durations \(t = 1, \ldots,T\), with \(T\) being the maximum duration considered. Under the alternative hypothesis, there is a space-time window \(W\) consisting of a spatial zone \(Z \subset \{1, \ldots, m\}\) and a time window \(D \subseteq \{1, \ldots, T\}\) such that the counts in that window have their Poisson expected value parameters inflated by a factor \(q_W > 1\) compared to the null hypothesis: $$ H_1 : Y_{it} \sim \textrm{ZIP}(q_W \mu_{it}, p_{it}), ~~(i,t) \in W. $$ For locations and durations outside of this window, counts are assumed to be distributed as under the null hypothesis. The sets \(Z\) considered are those specified in the argument zones, while the maximum duration \(T\) is taken as the maximum value in the column duration of the input table.

For each space-time window \(W\) considered, (the log of) a likelihood ratio is computed using the distributions under the alternative and null hypotheses, and the expectation-based Poisson scan statistic is calculated as the maximum of these quantities over all space-time windows. The expectation-maximization (EM) algorithm is used to obtain maximum likelihood estimates. Point estimates of the parameters \(\mu_{it}\) must be specified in the column mu of the argument table before this function is called.

References

Kjellson, B. (2015), Spatiotemporal Outbreak Detection: A Scan Statistic Based on the Zero-Inflated Poisson Distribution, (Master Thesis, Stockholm University), Link to PDF.

Examples

Run this code
# NOT RUN {
# Simple example
set.seed(1)
table <- scanstatistics:::create_table(list(location = 1:4, duration = 1:4),
                                        keys = c("location", "duration"))
table[, mu := 3 * location]
table[, p := runif(.N, 0, 0.3)]
table[, count := gamlss.dist::rZIP(.N, mu = mu, sigma = p)]
table[location %in% c(1, 4) & duration < 3, 
      count := gamlss.dist::rZIP(.N, mu = 2 * mu, sigma = p)]
zones <- scanstatistics:::powerset_zones(4)
result <- scan_poisson(table, zones, 100)
result
# }

Run the code above in your browser using DataLab