find_clusters: Find clusters

Description

Returns clusters, given a data frame of case counts by location and date, a distance matrix, a spline lookup table, and other parameters.

Usage

find_clusters(
  cases,
  distance_matrix,
  detect_date,
  spline_lookup = NULL,
  baseline_length = 90,
  max_test_window_days = 7,
  guard_band = 0,
  distance_limit = 15,
  baseline_adjustment = c("add_one", "add_one_global", "add_test", "none"),
  adj_constant = 1,
  min_clust_cases = 0,
  max_clust_cases = Inf,
  post_cluster_min_count = 0,
  use_fast = TRUE,
  return_interim = FALSE
)

Value

Returns a list of two data frames.

Arguments

cases: a data frame of case counts by location and date
distance_matrix: a square distance matrix, named on both dimensions, or a list of distance vectors, one for each location
detect_date: a date that indicates the end of the test window in which we are looking for clusters
spline_lookup: default NULL; either a spline lookup table (a data frame with at least two columns, "observed" and "spl_thresh") or a string naming one of the built-in lookup tables: "001", "005", "01", or "05". If NULL, the "01" table (i.e. the spline_01 dataset) is used
baseline_length: integer (default = 90) number of days in the baseline interval
max_test_window_days: integer (default = 7) number of days for the test window
guard_band: integer (default = 0) buffer days between baseline and test interval
distance_limit: numeric (default = 15) maximum distance to consider for cluster size. The value must be in the same units as the distance matrix; the default assumes miles
baseline_adjustment: one of four string options: "add_one" (default), "add_one_global", "add_test", or "none". All methods except "none" ensure that log(obs/expected) is always defined (i.e. they avoid expected = 0). "add_one" adds 1 to the expected value for any individual calculation in which expected would otherwise be zero. "add_one_global" adds 1 to all baseline location case counts. "add_test" increases each location's baseline count by the number of cases in that location during the test interval. If "none", no adjustment is made.
adj_constant: numeric (default = 1.0); the constant to be added if baseline_adjustment == 'add_one' or baseline_adjustment == 'add_one_global'
min_clust_cases: (default = 0); minimum number of cluster cases to retain before compression
max_clust_cases: (default = Inf); maximum number of cluster cases to retain before compression
post_cluster_min_count: (default = 0); a second (or alternative) way to limit clusters. Set this to a non-negative integer to require that any final cluster (after compression from candidate rows) has at least post_cluster_min_count cases when aggregated over all locations within the identified cluster
use_fast: boolean (default = TRUE) - set to TRUE to use the fast version of the compress clusters function
return_interim: boolean (default = FALSE) - set to TRUE to return all interim objects of the find_clusters() function
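
The effect of the default "add_one" baseline adjustment can be sketched in base R. This is an illustration of the behaviour described for baseline_adjustment and adj_constant, not the package's internal code; adjust_expected() is a hypothetical helper and the observed and expected values are made up.

```r
# Illustrative sketch only; NOT the package's internal implementation.
# adjust_expected() is a hypothetical helper mimicking "add_one":
# adj_constant is added only where the expected count is zero.
adjust_expected <- function(expected, adj_constant = 1) {
  ifelse(expected == 0, expected + adj_constant, expected)
}

# Made-up observed and expected counts for three candidate clusters
observed <- c(3, 5, 2)
expected <- c(0, 2.5, 1)

# No adjustment: a zero expected count gives a non-finite log ratio
log_ratio_none <- log(observed / expected)

# "add_one"-style adjustment: every log ratio is finite
log_ratio_adj <- log(observed / adjust_expected(expected))

print(is.finite(log_ratio_none))
print(is.finite(log_ratio_adj))
```

Only the first element changes, because it is the only one whose expected count is zero; the other two ratios are left as they were.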

Examples


find_clusters(
  cases = example_count_data,
  distance_matrix = county_distance_matrix("OH")[["distance_matrix"]],
  detect_date = example_count_data[, max(date)],
  distance_limit = 50
)
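
To see how detect_date, max_test_window_days, guard_band, and baseline_length relate to one another, the intervals can be laid out in base R. This sketch assumes that the longest test window ends on detect_date and includes it, and that the baseline ends guard_band days before the test window begins. Those boundary conventions are assumptions made for illustration and may differ from what find_clusters() does internally. window_bounds() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper; the inclusive boundary conventions are assumptions.
window_bounds <- function(detect_date,
                          baseline_length = 90,
                          max_test_window_days = 7,
                          guard_band = 0) {
  detect_date <- as.Date(detect_date)
  # Longest test window: max_test_window_days days ending on detect_date
  test_start <- detect_date - (max_test_window_days - 1)
  # Baseline ends the day before the test window, minus any guard band
  baseline_end <- test_start - guard_band - 1
  # Baseline spans baseline_length days, inclusive of both ends
  baseline_start <- baseline_end - (baseline_length - 1)
  list(
    baseline_start = baseline_start,
    baseline_end = baseline_end,
    test_start = test_start,
    test_end = detect_date
  )
}

w <- window_bounds("2025-03-31")
print(w)
```

A non-zero guard_band shifts the whole baseline earlier by that many days without changing its length.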
