Returns clusters, given a frame of case counts by location and date, a distance matrix, a spline lookup table, and other parameters.
find_clusters(
  cases,
  distance_matrix,
  detect_date,
  spline_lookup = NULL,
  baseline_length = 90,
  max_test_window_days = 7,
  guard_band = 0,
  distance_limit = 15,
  baseline_adjustment = c("add_one", "add_one_global", "add_test", "none"),
  adj_constant = 1,
  min_clust_cases = 0,
  max_clust_cases = Inf,
  post_cluster_min_count = 0,
  use_fast = TRUE,
  return_interim = FALSE
)
Returns a list of two data frames.
cases: a frame of case counts by location and date
distance_matrix: a square distance matrix named on both dimensions, or a list of distance vectors, one for each location
detect_date: a date that marks the end of the test window in which to look for clusters
spline_lookup: default NULL; either a spline lookup table, which is a data frame with at least two columns, "observed" and "spl_thresh", or a string naming one of the built-in lookup tables: "001", "005", "01", or "05". If NULL, the default table is "01" (i.e., the spline_01 dataset).
baseline_length: integer (default = 90); number of days in the baseline interval
max_test_window_days: integer (default = 7); number of days in the test window
guard_band: integer (default = 0); number of buffer days between the baseline and test intervals
distance_limit: numeric (default = 15); maximum distance to consider when forming clusters. The default is expressed in miles, and distance_limit must use the same units as the values in distance_matrix.
baseline_adjustment: one of four strings: "add_one" (default), "add_one_global", "add_test", or "none". Every option except "none" ensures that log(observed/expected) is always defined (i.e., it avoids expected = 0). The default, "add_one", adds adj_constant (1 by default) to the expected value of any individual calculation whose expected value would otherwise be zero. "add_one_global" adds one to every baseline location case count. "add_test" increases each location's baseline count by the number of cases in that location during the test interval. "none" makes no adjustment.
adj_constant: numeric (default = 1.0); the constant added when baseline_adjustment is "add_one" or "add_one_global"
min_clust_cases: (default = 0); minimum number of cases a candidate cluster must have to be retained before compression
max_clust_cases: (default = Inf); maximum number of cases a candidate cluster may have to be retained before compression
post_cluster_min_count: (default = 0); a second (or alternative) way to limit clusters. Set this to a non-negative integer to require that every final cluster (after compression of the candidate rows) has at least post_cluster_min_count cases when aggregated over all locations within the cluster.
use_fast: boolean (default = TRUE); set to TRUE to use the fast version of the compress-clusters function
return_interim: boolean (default = FALSE); set to TRUE to return all interim objects of the find_clusters() function
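distance_matrix accepts either a named square matrix or a list of named distance vectors, and distance_limit caps which locations can join a cluster. As an illustration only, the sketch below builds a made-up three-location matrix (in miles), converts it to the list form, and uses a hypothetical helper, neighbors_within(), to list the locations within distance_limit of a chosen center. The inclusive comparison (<=) and the nearest-first ordering are assumptions, not documented internals of find_clusters().

```r
# Made-up symmetric distance matrix (miles), named on both dimensions
dm <- matrix(
  c( 0, 10, 40,
    10,  0, 25,
    40, 25,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("A", "B", "C"))
)

# Equivalent "list of distance vectors" form: one named vector per location
dist_list <- lapply(rownames(dm), function(loc) dm[loc, ])
names(dist_list) <- rownames(dm)

# Hypothetical helper: locations within distance_limit of `center`
# (inclusive, by assumption), ordered nearest first; the center is at 0
neighbors_within <- function(dist_list, center, distance_limit = 15) {
  d <- dist_list[[center]]
  names(sort(d[d <= distance_limit]))
}

neighbors_within(dist_list, "A", distance_limit = 15)  # c("A", "B")
neighbors_within(dist_list, "C", distance_limit = 30)  # c("C", "B")
```

Because the two forms hold the same numbers, the list form simply avoids storing the full square matrix when each location only needs its own row.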
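A custom spline_lookup table only needs the documented "observed" and "spl_thresh" columns. The sketch builds such a table with invented threshold values (they are not calibrated thresholds), checks the column requirement, and consults it with a hypothetical step rule, threshold_for(), that takes the row with the largest tabulated observed value not exceeding the count. That rule is an assumption made for illustration; this documentation does not specify how find_clusters() reads the table.

```r
# Invented thresholds for illustration only; these are NOT calibrated values
my_lookup <- data.frame(
  observed   = c(2, 5, 10, 20),
  spl_thresh = c(6.0, 5.0, 4.5, 4.0)
)

# Documented requirement: at least the "observed" and "spl_thresh" columns
has_required_cols <- function(tbl) {
  all(c("observed", "spl_thresh") %in% names(tbl))
}
stopifnot(has_required_cols(my_lookup))

# Hypothetical step rule (assumption): use the threshold of the largest
# tabulated `observed` value that does not exceed `obs`; NA below the table
threshold_for <- function(tbl, obs) {
  tbl <- tbl[order(tbl$observed), ]
  idx <- findInterval(obs, tbl$observed)
  if (idx == 0) NA_real_ else tbl$spl_thresh[idx]
}

threshold_for(my_lookup, 7)  # 5

# Passing the table (or a built-in name such as "05") to find_clusters():
# find_clusters(cases, distance_matrix, detect_date, spline_lookup = my_lookup)
```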
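baseline_length, max_test_window_days, and guard_band together place the baseline and test intervals relative to detect_date. The sketch below assumes the test window is the max_test_window_days days ending on detect_date, that guard_band days are skipped immediately before it, and that the baseline is the baseline_length days before the guard band. window_bounds() is a hypothetical helper, and the exact boundary conventions used by find_clusters() may differ.

```r
# Hypothetical helper: one plausible layout of the analysis periods,
# counted backward from detect_date (boundary conventions are assumed)
window_bounds <- function(detect_date, baseline_length = 90,
                          max_test_window_days = 7, guard_band = 0) {
  detect_date    <- as.Date(detect_date)
  test_start     <- detect_date - (max_test_window_days - 1)  # window ends on detect_date
  baseline_end   <- test_start - guard_band - 1               # skip guard_band days
  baseline_start <- baseline_end - (baseline_length - 1)      # baseline_length days long
  list(baseline_start = baseline_start, baseline_end = baseline_end,
       test_start = test_start, test_end = detect_date)
}

w <- window_bounds("2024-03-31")
w$test_start      # "2024-03-25"
w$baseline_end    # "2024-03-24"
w$baseline_start  # "2023-12-26"
```

With guard_band = 3 under the same assumptions, the baseline would instead end on 2024-03-21, leaving 22 to 24 March as the buffer.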
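The baseline_adjustment options differ in where the correction is applied. The sketch uses two hypothetical helpers: adjust_baseline() applies the options that change per-location baseline counts, and log_obs_exp() applies "add_one" to a single expected value only when it is zero. It assumes adj_constant is the amount added for both "add_one" and "add_one_global", and it does not show how expected counts are derived from the baseline.

```r
# Hypothetical helper mirroring the documented options; `baseline` and `test`
# are per-location case counts in the same location order
adjust_baseline <- function(baseline, test,
                            method = c("add_one", "add_one_global", "add_test", "none"),
                            adj_constant = 1) {
  method <- match.arg(method)
  switch(method,
    add_one_global = baseline + adj_constant,  # every baseline count increased
    add_test       = baseline + test,          # add each location's test-interval cases
    baseline                                   # "add_one" and "none": counts unchanged
  )
}

# Hypothetical helper: "add_one" acts on an individual expected value,
# and only when that value would otherwise be zero
log_obs_exp <- function(observed, expected, method = "add_one", adj_constant = 1) {
  if (method == "add_one" && expected == 0) expected <- expected + adj_constant
  log(observed / expected)
}

baseline <- c(0, 2, 5)
test     <- c(1, 0, 2)
adjust_baseline(baseline, test, "add_one_global")  # c(1, 3, 6)
adjust_baseline(baseline, test, "add_test")        # c(1, 2, 7)
log_obs_exp(3, 0, "add_one")  # log(3), finite
log_obs_exp(3, 0, "none")     # Inf: undefined without an adjustment
```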
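min_clust_cases and max_clust_cases filter candidate clusters before compression, while post_cluster_min_count filters the final clusters by their total case count across locations. The sketch shows the two stages on invented data frames; the column names (center, clust_cases, cluster_id, cases), the inclusive bounds, and the omission of the compression step itself are all assumptions.

```r
# Invented candidate clusters: one row per candidate with its case count
# (column names are hypothetical)
candidates <- data.frame(
  center      = c("A", "B", "C", "D"),
  clust_cases = c(3, 8, 12, 1)
)
min_clust_cases        <- 2
max_clust_cases        <- 10
post_cluster_min_count <- 10

# Stage 1, before compression: keep candidates within the case-count bounds
kept <- candidates[candidates$clust_cases >= min_clust_cases &
                   candidates$clust_cases <= max_clust_cases, ]

# (Compressing candidate rows into final clusters is not shown.)

# Stage 2, after compression: invented final clusters, one row per location;
# keep clusters whose cases, summed over their locations, meet the minimum
final <- data.frame(
  cluster_id = c(1, 1, 2),
  location   = c("A", "B", "C"),
  cases      = c(4, 7, 3)
)
totals     <- aggregate(cases ~ cluster_id, data = final, FUN = sum)
keep_ids   <- totals$cluster_id[totals$cases >= post_cluster_min_count]
final_kept <- final[final$cluster_id %in% keep_ids, ]
```

Here cluster 1 totals 11 cases and is kept, while cluster 2 totals 3 and is dropped, even though no single location in cluster 1 reaches 10 on its own.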
find_clusters(
  cases = example_count_data,
  distance_matrix = county_distance_matrix("OH")[["distance_matrix"]],
  detect_date = example_count_data[, max(date)],
  distance_limit = 50
)