besag.newell: Besag-Newell Cluster Detection Method

Description

Besag-Newell cluster detection method. Our implementation differs from the original paper in the following ways:

We base our analysis on $k$ cases, rather than $k$ other cases as prescribed in the paper.
We do not subtract 1 from the accumulated numbers of other cases and the accumulated numbers of others at risk, as the paper prescribes to discount selection bias.
$M$ is the total number of areas included, not the number of additional areas included; i.e., $M$ starts at 1, not 0.
$p$-values are not based on the original value of $k$, but on the actual number of cases observed once $k$ or more cases have been reached. For example, if $k = 10$ and, as we add neighbors, the cumulative case count goes 1, 2, 9, then 12, we base our $p$-values on $k = 12$.
We do not provide a Monte Carlo simulated $R$: the number of tests that attain significance at a fixed level $\alpha$.

The first two differences and the last one arise because we perform the testing area by area, rather than case by case.
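The area-by-area procedure behind these differences can be sketched in base R. The function bn_sketch below is hypothetical and is not SpatialEpi's internal code; it assumes Euclidean distances between the (projected) centroids in geo, and a Poisson model in which an area's $p$-value is the probability of observing at least the realized number of cases in its first $M$ nearest areas, given their summed expected counts.

```r
# Hypothetical helper, NOT part of SpatialEpi: a sketch of the area-by-area
# Besag-Newell scan under the assumptions stated above.
bn_sketch <- function(geo, cases, expected.cases, k) {
  geo <- as.matrix(geo)
  n <- nrow(geo)
  d <- as.matrix(dist(geo))           # assumed: Euclidean distances between centroids
  m.values <- integer(n)
  observed.k.values <- numeric(n)
  p.values <- numeric(n)
  for (i in seq_len(n)) {
    nn <- order(d[i, ])               # area i first (distance 0), then nearest neighbors
    cum.cases <- cumsum(cases[nn])
    cum.expected <- cumsum(expected.cases[nn])
    m <- which(cum.cases >= k)[1]     # M counts area i itself, so M starts at 1
    if (is.na(m)) m <- n              # assumption: fewer than k cases overall, use all areas
    m.values[i] <- m
    observed.k.values[i] <- cum.cases[m]   # actual count used instead of k
    # P(at least observed.k cases in the first M areas) under a Poisson model
    p.values[i] <- ppois(cum.cases[m] - 1, cum.expected[m], lower.tail = FALSE)
  }
  list(p.values = p.values, m.values = m.values,
       observed.k.values = observed.k.values)
}
```

For example, bn_sketch(cbind(x = 0:3, y = 0), cases = c(5, 3, 4, 1), expected.cases = rep(2, 4), k = 7) grows each cluster until it holds at least 7 cases: the third area stops after $M = 2$ areas with exactly 7 cases, while the fourth needs $M = 3$ areas and ends with 8 cases, so its $p$-value is computed with 8 rather than 7.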

Usage

besag.newell(geo, population, cases, expected.cases=NULL, k, alpha.level)

Arguments

geo

an n x 2 table of the (x,y)-coordinates of the area centroids

population

aggregated population counts for all n areas

cases

aggregated case counts for all n areas

expected.cases

expected number of cases for all n areas; if NULL (the default), strata are not controlled for

k

number of cases to consider

alpha.level

$\alpha$-level threshold used to declare significance

Value

A list containing:

clusters

information on all clusters that are $\alpha$-level significant, in decreasing order of the $p$-value

p.values

for each of the $n$ areas, the $p$-value of the cluster grown from that area until it contains at least $k$ cases

m.values

for each of the $n$ areas, the number of areas needed to observe at least $k$ cases

observed.k.values

based on m.values, the actual number of cases used to compute the $p$-values

Details

For the population and cases tables, the rows are grouped by area first; within each area, the counts for each stratum are listed. The tables must be balanced: the strata appear in the same order for every area, and the count for each area/stratum combination appears exactly once (even if it is zero).
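The balance requirement can be checked before any expected counts are computed. The helper is_balanced below is hypothetical and not part of SpatialEpi; it assumes the area and stratum labels of the stratified table are available as two vectors of equal length, and returns TRUE only if the rows are grouped by area, every area lists the same strata in the same order, and no stratum is repeated within an area.

```r
# Hypothetical helper, NOT part of SpatialEpi: checks the "balanced table"
# requirement for stratified population/case data.
is_balanced <- function(area, stratum) {
  area <- as.character(area)
  stratum <- as.character(stratum)
  # Rows must be grouped by area: each area forms exactly one contiguous run
  if (anyDuplicated(rle(area)$values)) return(FALSE)
  by.area <- split(stratum, factor(area, levels = unique(area)))
  reference <- by.area[[1]]
  # Every area must list the same strata, in the same order, each exactly once
  all(vapply(by.area, identical, logical(1), reference)) &&
    !anyDuplicated(reference)
}
```

For instance, is_balanced(rep(c("a", "b"), each = 2), rep(c("s1", "s2"), 2)) is TRUE, but swapping the stratum order for area "b", dropping one of its rows, or interleaving the rows of the two areas makes it FALSE.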

References

Besag, J. and Newell, J. (1991). The Detection of Clusters in Rare Diseases. Journal of the Royal Statistical Society, Series A (Statistics in Society), 154, 143-155.

Examples


library(SpatialEpi)

## Load Pennsylvania lung cancer data
data(pennLC)
data <- pennLC$data

## Process geographical information and convert to grid
geo <- pennLC$geo[,2:3]
geo <- latlong2grid(geo)

## Get aggregated counts of population and cases for each county
population <- tapply(data$population, data$county, sum)
cases <- tapply(data$cases, data$county, sum)

## Based on the 16 strata levels, compute the expected number of cases for each county
n.strata <- 16
expected.cases <- expected(data$population, data$cases, n.strata)

## Set Parameters
k <- 1250
alpha.level <- 0.05

# not controlling for strata
results <- besag.newell(geo, population, cases, expected.cases=NULL, k, alpha.level)

# controlling for strata
results <- besag.newell(geo, population, cases, expected.cases, k, alpha.level)
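The expected() call in the example derives expected counts from the 16 strata. A common way to do this is indirect standardization: pool each stratum's rate over all areas, then apply those rates to each area's stratum populations. The sketch below, expected_sketch, is a hypothetical illustration of that technique under the row ordering described in Details (area by area, n.strata rows per area); it is not claimed to be SpatialEpi's implementation of expected().

```r
# Hypothetical sketch, NOT SpatialEpi's expected(): indirect standardization
# assuming rows are ordered area by area with n.strata rows per area.
expected_sketch <- function(population, cases, n.strata) {
  n.areas <- length(population) / n.strata
  stopifnot(n.areas == round(n.areas))         # table must be balanced
  pop <- matrix(population, nrow = n.strata)   # column j holds the strata of area j
  cas <- matrix(cases, nrow = n.strata)
  rates <- rowSums(cas) / rowSums(pop)         # pooled rate for each stratum
  colSums(pop * rates)                         # expected cases for each area
}
```

With two areas and two strata, expected_sketch(c(100, 200, 300, 400), c(1, 4, 3, 8), n.strata = 2) gives stratum rates 0.01 and 0.02 and expected counts 5 and 11, which sum to the 16 observed cases, as indirect standardization guarantees.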
