uls.test: Upper Level Set Spatial Scan Test

Description

uls.test performs the Upper Level Set (ULS) spatial scan test of Patil and Taillie (2004). The test is a spatial scan test conditional on a fixed total number of cases, with candidate windows based on the upper level sets proposed by Patil and Taillie (2004). The clusters returned are non-overlapping and ordered from most significant to least significant, so the first cluster is the most likely cluster. If no significant clusters are found, then the most likely cluster is returned (along with a warning).

Usage

uls.test(
  coords,
  cases,
  pop,
  w,
  ex = sum(cases)/sum(pop) * pop,
  nsim = 499,
  alpha = 0.1,
  ubpop = 0.5,
  longlat = FALSE,
  cl = NULL,
  type = "poisson",
  check.unique = FALSE
)

Value

Returns a list of length two of class scan. The first element (clusters) is a list containing the significant, non-overlapping clusters, and has the following components:

locids: The location ids of regions in a significant cluster.
pop: The total population in the cluster window.
cases: The observed number of cases in the cluster window.
expected: The expected number of cases in the cluster window.
smr: Standardized mortality ratio (observed/expected) in the cluster window.
rr: Relative risk in the cluster window.
loglikrat: The log-likelihood ratio for the cluster window (i.e., the log of the test statistic).
pvalue: The p-value of the test statistic associated with the cluster window.

The second element of the list is the centroid coordinates. This is needed for plotting purposes.
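
As a rough illustration of how the smr, rr, and loglikrat components relate to one another, the sketch below computes them by hand for a single candidate window. The data are made-up toy numbers, `window` is a hypothetical set of region ids, and the formulas are the standard Poisson scan statistic quantities; this is an assumption about what the components represent, not a copy of the package's internal code.

```r
# Toy data (invented for illustration only).
cases <- c(4, 9, 2, 5)               # observed cases per region
pop   <- c(100, 120, 80, 100)        # population per region
ex    <- sum(cases) / sum(pop) * pop # expected cases under constant risk

window <- c(1, 2)                    # hypothetical candidate cluster

yin  <- sum(cases[window])           # observed cases inside the window
ein  <- sum(ex[window])              # expected cases inside the window
ty   <- sum(cases)                   # total cases (held fixed by the test)
yout <- ty - yin                     # observed cases outside the window
eout <- ty - ein                     # expected cases outside the window

# Standardized mortality ratio: observed / expected inside the window.
smr <- yin / ein

# Relative risk: rate inside the window relative to the rate outside it.
rr <- (yin / sum(pop[window])) / (yout / sum(pop[-window]))

# Poisson log-likelihood ratio; set to 0 unless the window has excess risk.
# (A window containing every case, yout = 0, would need special handling.)
llr <- if (yin / ein > yout / eout) {
  yin * log(yin / ein) + yout * log(yout / eout)
} else {
  0
}

print(c(smr = smr, rr = rr, loglikrat = llr))
```

Because the expected counts are scaled to sum to the total number of cases, an smr above 1 inside the window always implies a rate below the overall rate outside it.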

Arguments

coords: An \(n \times 2\) matrix of centroid coordinates for the regions in the form (x, y), or (longitude, latitude) if using great circle distance.
cases: The number of cases observed in each region.
pop: The population size associated with each region.
w: A binary spatial adjacency matrix for the regions.
ex: The expected number of cases for each region. The default is calculated under the constant risk hypothesis.
nsim: The number of simulations from which to compute the p-value.
alpha: The significance level used to determine whether a cluster is significant. Default is 0.10.
ubpop: The upper bound on the proportion of the total population to consider for a cluster.
longlat: The default is FALSE, which specifies that Euclidean distance should be used. If longlat is TRUE, then the great circle distance is used to calculate the intercentroid distance.
cl: A cluster object created by makeCluster, or an integer indicating the number of child processes (integer values are ignored on Windows) for parallel evaluations (see Details on performance). It can also be "future" to use a future backend (see Details). NULL (the default) refers to sequential evaluation.
type: The type of scan statistic to compute. The default is "poisson". The other choice is "binomial".
check.unique: A logical value indicating whether a check for unique values should be performed. The default is FALSE. This is unlikely to make a practical difference for most real data sets.
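
The default for ex and the role of nsim and ubpop can be illustrated with a few lines of base R. The data below are invented, `mc_pvalue` is a hypothetical helper, and the p-value formula is the usual Monte Carlo convention (the observed statistic counts as one of nsim + 1 values); treat it as an assumption about how the p-value is formed rather than as the package's exact code.

```r
# Toy data (invented for illustration only).
cases <- c(4, 9, 2, 5)
pop   <- c(100, 120, 80, 100)

# Default expected counts: overall rate times each region's population.
ex <- sum(cases) / sum(pop) * pop
print(ex)

# With ubpop = 0.5, a candidate window is only considered if it holds at
# most half of the total population.
ubpop <- 0.5
window <- c(1, 2)                       # hypothetical candidate window
allowed <- sum(pop[window]) <= ubpop * sum(pop)
print(allowed)

# Hypothetical helper: Monte Carlo p-value from simulated test statistics.
mc_pvalue <- function(tobs, tsim) {
  (1 + sum(tsim >= tobs)) / (1 + length(tsim))
}

# Nine made-up simulated maxima, i.e. nsim = 9.
tsim <- c(0.2, 1.1, 3.4, 0.7, 2.5, 0.9, 1.8, 0.4, 0.6)
print(mc_pvalue(2.0, tsim))

# Smallest p-value attainable for a given nsim under this convention.
min_p <- function(nsim) 1 / (nsim + 1)
print(c(nsim_9 = min_p(9), nsim_499 = min_p(499)))
```

Under this convention the smallest attainable p-value is 1 / (nsim + 1), so a small nsim limits how small alpha can usefully be: with nsim = 9 no p-value can fall below 0.1.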

Author

Joshua French

Details

The ULS method has a special (and time-consuming) construction when the observed rates aren't unique. This is unlikely to arise for real data, except with observed rates of 0, which are of little interest. The method can take substantially longer if this is considered.
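
To make the idea of upper level sets concrete, the following base R sketch enumerates candidate zones for a toy map: for each observed rate level, it takes the regions whose rate is at least that level and splits them into connected components using the adjacency matrix. The data, the `components` helper, and the variable names are hypothetical. The sketch admits tied regions together and does not reproduce the special construction for non-unique rates mentioned above, nor does it apply the ubpop constraint.

```r
# Toy map: five regions in a row, each adjacent to its neighbours (1-2-3-4-5).
cases <- c(8, 2, 6, 7, 1)
pop   <- rep(100, 5)
w <- matrix(0, 5, 5)
for (i in 1:4) {
  w[i, i + 1] <- 1
  w[i + 1, i] <- 1
}

rates <- cases / pop

# Hypothetical helper: connected components of the subgraph induced by `ids`.
components <- function(ids, w) {
  comps <- list()
  remaining <- ids
  while (length(remaining) > 0) {
    comp <- remaining[1]
    frontier <- remaining[1]
    while (length(frontier) > 0) {
      # Regions in `ids` adjacent to any region on the current frontier.
      nbrs <- ids[colSums(w[frontier, ids, drop = FALSE]) > 0]
      frontier <- setdiff(nbrs, comp)
      comp <- c(comp, frontier)
    }
    comps[[length(comps) + 1]] <- sort(comp)
    remaining <- setdiff(remaining, comp)
  }
  comps
}

# Walk down the distinct rate levels, collecting the connected components
# of each upper level set {regions with rate >= level}.
rate_levels <- sort(unique(rates), decreasing = TRUE)
zones <- list()
for (g in rate_levels) {
  upper_set <- which(rates >= g)
  zones <- c(zones, components(upper_set, w))
}
zones <- unique(zones)  # a component can persist across several levels

print(zones)
```

The number of distinct zones produced this way grows at most linearly with the number of regions, which is what keeps the set of candidate windows small compared with enumerating every connected subset.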

References

Patil, G.P. & Taillie, C. Upper level set scan statistic for detecting arbitrarily shaped hotspots. Environmental and Ecological Statistics (2004) 11(2):183-197. <doi:10.1023/B:EEST.0000027208.48919.7e>

Examples

library(smerc)

# New York leukemia data and the matching adjacency matrix
data(nydf)
data(nyw)
coords <- with(nydf, cbind(longitude, latitude))

# nsim is kept small so the example runs quickly
out <- uls.test(
  coords = coords, cases = floor(nydf$cases),
  pop = nydf$pop, w = nyw,
  alpha = 0.05, longlat = TRUE,
  nsim = 9, ubpop = 0.5
)

# better plotting
if (require("sf", quietly = TRUE)) {
   data(nysf)
   plot(st_geometry(nysf), col = color.clusters(out))
}
