gsClusterDetect

Description

An R package for implementing geospatial cluster identification from time series of counts, by location. Locations can be expressed as counties, zip codes, census tracts, or other user-defined geographies. Users provide:

  1. a data.frame of counts by location and date
  2. a distance object that contains the distances between each location and its neighbors
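As a minimal sketch of the first input, the data frame below uses the column names location, date, and count that appear in the example data under Getting Started. The values are illustrative only, and the shape checks are a suggestion rather than something the package is known to require.

```r
# Illustrative counts: one row per location per date.
counts <- data.frame(
  location = rep(c("39171", "39173", "39175"), each = 2),
  date     = rep(as.Date(c("2025-02-04", "2025-02-05")), times = 3),
  count    = c(1L, 0L, 6L, 7L, 2L, 0L),
  stringsAsFactors = FALSE
)

# Basic shape checks before handing the data to any detection routine.
stopifnot(
  is.character(counts$location),
  inherits(counts$date, "Date"),
  is.numeric(counts$count),
  !anyDuplicated(counts[, c("location", "date")])
)
```

Keeping location codes as character strings avoids losing leading zeros in FIPS or zip codes.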

The package provides functions to create these distance objects in either matrix or list format. These can be generated for census tracts, zip codes, or counties (FIPS codes), or can be constructed for custom locations by providing a data frame with columns for latitude and longitude (i.e., the centroid of each location).
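To make the idea of a centroid-based distance matrix concrete, here is a base R sketch that computes great-circle (haversine) distances in miles between a few made-up centroids. The column names, the coordinates, the function name haversine_miles, and the Earth radius constant are all assumptions chosen for illustration; this is not necessarily how the package computes distances internally.

```r
# Hypothetical custom locations with centroid coordinates (decimal degrees).
locs <- data.frame(
  label     = c("A", "B", "C"),
  latitude  = c(39.96, 41.50, 39.10),
  longitude = c(-83.00, -81.69, -84.51)
)

# Great-circle (haversine) distance in miles between two points.
haversine_miles <- function(lat1, lon1, lat2, lon2, radius_miles = 3958.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_miles * asin(pmin(1, sqrt(a)))
}

# Square matrix of pairwise distances, labelled by location.
n <- nrow(locs)
dist_mat <- outer(
  seq_len(n), seq_len(n),
  function(i, j) haversine_miles(
    locs$latitude[i], locs$longitude[i],
    locs$latitude[j], locs$longitude[j]
  )
)
dimnames(dist_mat) <- list(locs$label, locs$label)
round(dist_mat)
```

A dense matrix like this grows with the square of the number of locations, which is presumably why a list format that keeps only nearby neighbors is also offered.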

Installation

Install the gsClusterDetect package from CRAN as follows:

install.packages("gsClusterDetect")

Install the development version from GitHub as follows:

devtools::install_github("lmullany/gsClusterDetect")

Getting Started:

  1. Load the package and provide a data frame with location, date, and count columns.

library(gsClusterDetect)
df <- example_count_data
tail(df)

   location       date count
     <char>     <IDat> <int>
1:    39171 2025-02-04     1
2:    39171 2025-02-05     0
3:    39173 2025-02-04     6
4:    39173 2025-02-05     7
5:    39175 2025-02-04     2
6:    39175 2025-02-05     0

  2. Generate the distance matrix for these locations. In this case, the synthetic data has counts from counties (FIPS codes) in the state of Ohio, so we use county_distance_matrix() and pass the state abbreviation:

ohio_dm <- county_distance_matrix("OH")

# This is a named list of two elements
cat("Class:", class(ohio_dm), "\nNames:", names(ohio_dm))

Class: list 
Names: loc_vec distance_matrix

  3. Set the end of your target period. This is called the detect_date, and it is a parameter that must be passed to the find_clusters() function. Typically, this is the current (or last available) date.

detect_date <- max(df[, date])
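
The expression df[, date] relies on data.table syntax (the printed example output above looks like a data.table). For a plain data.frame, the same value can be obtained in base R, as sketched below. The sketch also lays out one possible test window ending on detect_date with a baseline window before it; the 7-day and 28-day lengths and the variable names are hypothetical, not package defaults.

```r
# Hypothetical plain data.frame with a Date column.
df_plain <- data.frame(
  location = c("39171", "39171", "39173", "39173"),
  date     = as.Date(c("2025-02-04", "2025-02-05", "2025-02-04", "2025-02-05")),
  count    = c(1L, 0L, 6L, 7L)
)

# Base R equivalent of max(df[, date]) for a data.table.
detect_date <- max(df_plain$date)

# Illustrative windows (lengths are assumptions, not package defaults).
test_length     <- 7
baseline_length <- 28
test_dates <- seq(detect_date - test_length + 1, detect_date, by = "day")
baseline_dates <- seq(
  min(test_dates) - baseline_length,
  min(test_dates) - 1,
  by = "day"
)
```

The baseline window ends the day before the test window starts, so the two sets of dates never overlap.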

  4. Call the find_clusters() function; see ?find_clusters for the full set of options. Below, we pass the minimum required elements (cases, distance_matrix, and detect_date) and set distance_limit (the maximum size of the clusters) to 50 (miles).

clusters <- find_clusters(
    cases = df,
    distance_matrix = ohio_dm[["distance_matrix"]],
    detect_date = detect_date,
    distance_limit = 50
)
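
To illustrate what a distance limit means in practice, the base R sketch below finds, for each location in a small made-up distance matrix, the locations that lie within 50 miles. The matrix values and the helper name nearby_within are hypothetical, and this is only an illustration of distance-based restriction, not the package's internal algorithm.

```r
# Hypothetical symmetric distance matrix in miles.
ids <- c("39171", "39173", "39175")
dm <- matrix(
  c( 0, 42, 95,
    42,  0, 61,
    95, 61,  0),
  nrow = 3, byrow = TRUE, dimnames = list(ids, ids)
)

# For each location, list the locations (itself included) within the limit.
nearby_within <- function(dist_mat, limit) {
  lapply(
    setNames(rownames(dist_mat), rownames(dist_mat)),
    function(loc) colnames(dist_mat)[dist_mat[loc, ] <= limit]
  )
}

nearby <- nearby_within(dm, limit = 50)
```

With these made-up distances, the first two locations are within 50 miles of each other, while the third has no neighbor inside the limit.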

Contacts:

Copyright 2026 The Johns Hopkins University Applied Physics Laboratory LLC.

Version

1.0.0

License

Apache License (>= 2)

Maintainer

Luke Mullany

Last Published

March 23rd, 2026

Functions in gsClusterDetect (1.0.0)

generate_ggplot_time_series

Generate ggplot of timeseries
.distance_meters_from_coords

Helper function that gets the distance in meters between pairs of coordinates. Note that coords must be a matrix or data frame, where the first column is longitude and the second column is latitude
generate_observed_expected

Generate the observed and expected information
spline_01

Spline Lookup Table - 0.01
spline_005

Spline Lookup Table - 0.005
generate_heatmap_data

Get heat map data from a set of location, date, count data
us_distance_matrix

Get distance matrix for all counties in the US
generate_summary_table

Summarize count-by-location-and-date data, given baseline and test interval lengths, and an end date for the test interval
zip_distance_matrix

Get distance matrix for zip codes within a state
get_test_dates

Generate test dates vector
generate_plotly_time_series

Generate plotly timeseries
ggplot_heatmap

Generate ggplot heatmap
get_baseline_dates

Generate baseline dates vector
generate_case_grids

Get candidate clusters and locations in baseline intervals
get_nearby_locations

Get nearby locations
gen_nearby_case_info

Return baseline and test period case grids restricting by distance
generate_time_series_data

Generate time series data
spline_05

Spline Lookup Table - 0.05
st_injects

Add data counts for parameterized injected clusters
tract_distance_matrix

Build a Tract Distance Matrix for a State
plotly_heatmap

Generate plotly heatmap
generate_time_series_plot

Generate timeseries plot data
gsClusterDetect-package

gsClusterDetect: Utilities for Geo-Spatial Cluster Detection and Significance Classification
tract_generator

Generate Census Tract Centroids for a State
spline_001

Spline Lookup Table - 0.001
reduce_clusters_to_min

Filter clusters on minimum overall count
generate_heatmap

Generate heatmap of data
zipcodes

Zipcode Location Dataset
check_vars

Check for variables in a data frame
add_location_counts

Add location counts to cluster location list
counties

County Location Dataset
add_spline_threshold

Use spline lookup to restrict `ObservedExpectedGrid` to potential clusters
create_dist_list

Generalized distance list as sparse list
create_custom_dist_list

Create a sparse distance list from custom location data
compress_clusters_fast

Fast version of compress clusters
county_distance_matrix

Get distance matrix for counties within a state
custom_distance_matrix

Build a Distance Matrix from a Custom Data Frame
compress_clusters

Compress a cluster_alert_table
find_clusters

Find clusters
.sparse_dist_list_from_locs

This is a helper function to create a named list of all the locations in locs within threshold_meters of each loc in locs.
.numeric_location_coords

Helper function reduces a data frame to only those rows where latitude and longitude are not missing
.resolve_coord_var_names

Helper function resolves coordinate variable names
example_count_data

Example Count Dataset
.distance_result_from_coords

Helper function takes a vector of locations, and a set of coords (which must be a matrix or frame with first two columns being longitude and latitude), and returns a square distance matrix for all pairs of coordinates in a given unit
.validate_custom_locations

Helper function: given a data frame, and strings for label_var, lat_var, and long_var, the df is checked for
.meters_per_unit

Returns the number of meters in a given unit (one of miles, kilometers, or meters)
.assert_tigris_available

Helper function that asserts that tigris is installed. tigris is not required to run the package in general, but it is required for some additional functionality