Amend, Augment and Aid Analysis of John Snow's Cholera Data
Amends errors, augments data and aids analysis of John Snow's map of the 1854 London cholera outbreak. The original data come from Rusty Dodson and Waldo Tobler's 1992 digitization of Snow's map. Those data, <http://www.ncgia.ucsb.edu/pubs/snow/snow.html>, are no longer available. However, they are preserved in the 'HistData' package.
John Snow's map of the 1854 cholera outbreak in London's Soho is one of the best-known examples of data visualization and information design.
The reasons are twofold. First, as evidence for his claim that cholera is transmitted by water rather than air, Snow used a map to plot the spatial relationship between the locations of the water pumps, the primary source of drinking water, and the locations of cholera fatalities. Second, to show both the count and the location of cases, Snow used "stacks" of horizontal bars (a stack's orientation reflects the street on which the address lies).
However, while the map shows a concentration of fatalities around the Broad Street pump, it doesn't actually do the best job of excluding rival explanations: the pattern we see is not clearly different from what airborne transmission might produce. To address this problem, Snow added a graphical annotation to a second, lesser-known version of the map published in the official report on the outbreak.
This annotation outlines the Broad Street pump neighborhood: the residences Snow claims are within "close" walking distance of the pump. The notion of a pump neighborhood is important because it provides a specific, testable prediction about where we should expect to find cases: if water is cholera's mode of transmission, and if the water pumps located on the street are the primary source of drinking water, then most, if not all, fatalities should be found within the neighborhood. To put it simply, the disease should stop at the neighborhood's borders. In this way, pump neighborhoods can help distinguish waterborne from airborne patterns of disease transmission.
To that end, this package builds on Snow's work by offering systematic ways to compute pump neighborhoods. Doing so not only provides a way to replicate and validate Snow's efforts but also allows people to explore and investigate the data for themselves.
This release includes two methods of computing neighborhoods. The first uses Voronoi tessellation, which partitions the map into cells so that every location in a cell is closer, in Euclidean (straight-line) distance, to that cell's pump than to any other pump. The method is popular and easy to compute; its only drawback is that roads and walking distance play no role in the choice of pump. In effect, it assumes that people can walk through walls to get to their preferred pump.
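As a rough illustration of the Voronoi idea, the base R sketch below assigns each case to the pump with the smallest straight-line distance, which is the same as asking which Voronoi cell the case falls in. The coordinates and the helper `nearestPumpEuclidean()` are hypothetical and are not part of 'cholera'; the package's own `neighborhoodVoronoi()` may work quite differently.

```r
# Toy coordinates, made up for illustration (not Dodson and Tobler's data).
pumps <- data.frame(id = 1:3, x = c(0, 10, 5), y = c(0, 0, 8))
cases <- data.frame(id = 1:5,
                    x = c(1, 9, 5, 4, 6),
                    y = c(1, 1, 7, 3, 2))

# Hypothetical helper: for each case, return the ID of the pump with the
# smallest Euclidean distance, i.e. the pump whose Voronoi cell contains it.
# Roads, buildings and walls play no role in this calculation.
nearestPumpEuclidean <- function(cases, pumps) {
  # Distance matrix: rows are cases, columns are pumps.
  d <- outer(seq_len(nrow(cases)), seq_len(nrow(pumps)),
             function(i, j) sqrt((cases$x[i] - pumps$x[j])^2 +
                                 (cases$y[i] - pumps$y[j])^2))
  pumps$id[apply(d, 1, which.min)]
}

nearestPumpEuclidean(cases, pumps)
#> [1] 1 2 3 1 2
```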
The second method, which actually follows Snow's lead, computes neighborhoods based on the "actual" walking distance along the streets of Soho. While more accurate, it is computationally more demanding than Voronoi tessellation. To compute it, I transform the roads on the map into a "social" graph, which turns the computation of walking distance into a graph theory problem. For each case (observed or simulated), I compute the shortest weighted path to the nearest pump. When this "rinse and repeat" procedure is applied to every case, the different pump neighborhoods emerge.
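The graph idea can be sketched with a from-scratch Dijkstra search in base R. The toy street graph (intersections `a` to `e`, with segment lengths as edge weights), the pump locations and the `dijkstra()` helper are all hypothetical; 'cholera' builds its graph from the actual road segments and may rely on a graph library rather than code like this.

```r
# Hypothetical toy street graph: nodes are intersections, weights are
# segment lengths. Streets are treated as two-way (undirected edges).
edges <- data.frame(from   = c("a", "b", "b", "c", "d"),
                    to     = c("b", "c", "d", "e", "e"),
                    weight = c(2, 4, 1, 1, 5),
                    stringsAsFactors = FALSE)

# Dijkstra's algorithm: shortest-path distance from `source` to every node.
dijkstra <- function(edges, source) {
  nodes <- unique(c(edges$from, edges$to))
  dist <- setNames(rep(Inf, length(nodes)), nodes)
  dist[source] <- 0
  unvisited <- nodes
  while (length(unvisited) > 0) {
    # Settle the closest unvisited node.
    u <- unvisited[which.min(dist[unvisited])]
    if (is.infinite(dist[u])) break  # remaining nodes are unreachable
    unvisited <- setdiff(unvisited, u)
    # Collect u's neighbors from edges in either direction.
    nbr <- rbind(edges[edges$from == u, c("to", "weight")],
                 setNames(edges[edges$to == u, c("from", "weight")],
                          c("to", "weight")))
    # Relax each incident edge.
    for (k in seq_len(nrow(nbr))) {
      v <- nbr$to[k]
      alt <- dist[[u]] + nbr$weight[k]
      if (alt < dist[[v]]) dist[v] <- alt
    }
  }
  dist
}

# Hypothetical pump locations; the case sits at intersection "c".
pump.nodes <- c(p1 = "a", p2 = "e")
d <- dijkstra(edges, source = "c")
names(pump.nodes)[which.min(d[pump.nodes])]
#> [1] "p2"
```

Here the case at node `c` is 1 unit of walking from the pump at `e` and 6 units from the pump at `a`, so it falls in `p2`'s neighborhood.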
To explore the data, you can consider a variety of scenarios by computing neighborhoods from any subset of pumps. This lets you explore hypotheses such as the possibility that the choice of pump is affected by water quality.
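The effect of choosing a subset of pumps can be sketched by excluding a pump and reassigning cases. For brevity this uses straight-line distance; the toy data, the `nearestPump()` helper and its `pump.select` argument are assumptions made for this example, not the package's interface.

```r
# Toy coordinates, made up for illustration (not the package's data).
pumps <- data.frame(id = 1:3, x = c(0, 10, 5), y = c(0, 0, 8))
cases <- data.frame(x = c(1, 9, 6, 4, 6), y = c(1, 1, 7, 3, 2))

# Hypothetical helper: nearest pump by straight-line distance, considering
# only the pumps listed in `pump.select` (all pumps by default).
nearestPump <- function(cases, pumps, pump.select = pumps$id) {
  p <- pumps[pumps$id %in% pump.select, ]
  vapply(seq_len(nrow(cases)), function(i) {
    d <- sqrt((p$x - cases$x[i])^2 + (p$y - cases$y[i])^2)
    p$id[which.min(d)]
  }, integer(1))
}

nearestPump(cases, pumps)                         # all pumps
#> [1] 1 2 3 1 2
nearestPump(cases, pumps, pump.select = c(1, 2))  # pump 3 excluded
#> [1] 1 2 2 1 2
```

With all three pumps, the third case belongs to pump 3; once pump 3 is excluded, that case is reassigned to its next-nearest pump, pump 2.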
Other package features
- Fixes three apparent coding errors in Dodson and Tobler's 1992 digitization of Snow's map.
- "Unstacks" the data in two ways to improve analysis and visualization.
- Adds the ability to overlay graphical features like kernel density, Voronoi diagrams, and notable landmarks (John Snow's residence, the Lion Brewery, etc.).
- Includes a variety of functions to find and locate cases, roads, pumps and walking paths.
- Appends actual street names to the roads data.
- Includes the revised pump data used in the second version of Snow's map from the Vestry report. This includes the corrected location of the Broad Street pump.
- Adds two different aggregate time series of fatalities from the Vestry report.
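The idea behind "unstacking" can be sketched as follows, assuming each digitized bar carries a stack identifier and that the first bar listed in a stack is its base, or anchor, case. Those assumptions, the column names and the toy coordinates are hypothetical; the "unstacking.fatalities" vignette describes what the package actually does. The sketch produces two views that are roughly analogous to address-level and fatality-level data.

```r
# Toy "stacked" data: one row per digitized bar. The `stack` column, which
# says which bars share an address, is an assumption of this sketch.
bars <- data.frame(case  = 1:6,
                   stack = c(1, 1, 1, 2, 3, 3),
                   x     = c(10.0, 10.1, 10.2, 12.0, 14.0, 14.1),
                   y     = c(5.00, 5.05, 5.10, 7.00, 9.00, 9.05))

# Assume the first bar listed in each stack is its anchor (base) case.
anchor <- bars[!duplicated(bars$stack), c("stack", "case", "x", "y")]
names(anchor)[names(anchor) == "case"] <- "anchor.case"

# (1) Address as the unit of observation: one row per stack plus a count.
by.address <- anchor
by.address$n <- vapply(anchor$stack,
                       function(s) sum(bars$stack == s), integer(1))

# (2) Fatality as the unit of observation: every bar keeps its own row but
# is moved onto its anchor's coordinates, removing the visual offset.
idx <- match(bars$stack, anchor$stack)
by.fatality <- data.frame(case        = bars$case,
                          anchor.case = anchor$anchor.case[idx],
                          x           = anchor$x[idx],
                          y           = anchor$y[idx])
```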
To install 'cholera', use the expressions below (you may need to install the 'devtools' package first).

    # install.packages("devtools")
    devtools::install_github("lindbrook/cholera", build_vignettes = TRUE)
Besides the help pages, the vignettes discuss the data and functions in detail:

    vignette("duplicate.missing.cases")
    vignette("unstacking.fatalities")
    vignette("pump.neighborhoods")
    vignette("roads")
    vignette("time.series")
neighborhoodWalking() is computationally intensive (1-2 minutes on a single core). To improve performance, seven basic configurations have been pre-computed (for details, see neighborhoodWalking()'s help page), and a parallel, multi-core implementation is available on Linux and Mac.
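To see why a multi-core option is typically limited to Linux and Mac in R, consider the sketch below. It uses `parallel::mclapply()`, which relies on forking and therefore cannot use more than one core on Windows. The per-case function `slowDistance()` is a hypothetical placeholder for an expensive shortest-path computation, and the package's own parallel code may be organized differently.

```r
library(parallel)

# Hypothetical stand-in for an expensive per-case job, such as a
# shortest-path search from one case to every pump.
slowDistance <- function(case.id) {
  Sys.sleep(0.01)  # simulate work
  case.id * 2
}

case.ids <- 1:20

# mclapply() forks the R process, which Windows does not support, so use a
# single core there (mclapply then falls back to serial evaluation).
n <- detectCores()
cores <- if (.Platform$OS.type == "windows" || is.na(n)) 1L else max(1L, n - 1L)

# Results come back in the same order as case.ids.
results <- unlist(mclapply(case.ids, slowDistance, mc.cores = cores))
```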
Functions in cholera
| Name | Description |
|---|---|
| addVoronoi | Add Voronoi cells. |
| addWhitehead | Add Whitehead's Broad Street pump neighborhood. |
| anchor.case | Anchor or base case of each stack of fatalities. |
| border | Numeric IDs of line segments that create the map's border frame. |
| addPlaguePit | Add plague pit (Marshall Street). |
| addSnow | Add Snow's annotation of the Broad Street pump walking neighborhood. |
| addKernelDensity | Add 2D kernel density contours to a plot. |
| ortho.proj | Orthogonal projection of observed cases onto road network. |
| ortho.proj.pump | Orthogonal projection of 13 original pumps. |
| plot.time.series | Plot aggregate time series data from Vestry report. |
| plot.voronoi | Plot Voronoi neighborhoods. |
| simulateFatalities | Generate simulated fatalities and their orthogonal projections. |
| snow.neighborhood | Snow neighborhood fatalities. |
| timeSeries | Aggregate time series fatality data from the Vestry report. |
| unstackFatalities | Unstack "stacks" in Snow's cholera map. |
| fatalities | Amended Dodson and Tobler's cholera data. |
| fatalities.address | "Unstacked" amended cholera data with address as unit of observation. |
| pumpCase | Extract numeric case IDs by neighborhood. |
| caseLocator | Locate case by numerical ID. |
| cholera-package | cholera: amend, augment and aid analysis of John Snow's cholera data |
| fatalities.unstacked | "Unstacked" amended cholera fatalities data with fatality as unit of observation. |
| fixFatalities | Fix apparent coding error in Dodson and Tobler's digitization of Snow's map. |
| pumpLocator | Locate water pump by numerical ID. |
| pumps | Dodson and Tobler's pump data with street name. |
| sim.ortho.proj | Orthogonal projection of simulated "expected" cases onto road network. |
| sim.pump.case | List of "simulated" fatalities grouped by pump neighborhood. |
| ortho.proj.pump.vestry | Orthogonal projection of the 14 pumps from the Vestry Report. |
| plague.pit | Plague pit coordinates. |
| plot.walking | Plot observed and simulated walking neighborhoods. |
| pump.case | List of the observed fatality "addresses" grouped by pump neighborhood. |
| snowColors | Create a uniform set of colors for pump neighborhoods. |
| snowMap | Plot John Snow's cholera map. |
| summary.voronoi | Compute summary statistics for Voronoi neighborhoods. |
| neighborhoodVoronoi | Compute Voronoi neighborhoods. |
| neighborhoodWalking | Compute walking path neighborhoods. |
| pumps.vestry | Vestry report pump data. |
| summary.walking | Compute summary statistics for walking path neighborhoods. |
| road.segments | Dodson and Tobler's street data transformed into road segments. |
| roadSegments | Reshape roads dataframe into road.segments dataframe. |
| walkingDistance | Compute the walking distance between a case and the nearest (selected) pump. |
| walkingPath | Plot the walking path from a case to the nearest selected pump(s). |
| pumpData | Compute pump coordinates. |
| roads | Dodson and Tobler's street data with appended road names. |
| segmentLocator | Locate road segment by its character ID. |
| streetNameLocator | Locate road by name. |
| streetNumberLocator | Locate road by its numerical ID. |
Packaged: 2017-08-10 17:42:23 UTC; peter
Date/Publication: 2017-08-10 19:40:39 UTC