cholera v0.2.1


Monthly downloads



Amend, Augment and Aid Analysis of John Snow's Cholera Data

Amends errors, augments data and aids analysis of John Snow's map of the 1854 London cholera outbreak. The original data come from Rusty Dodson and Waldo Tobler's 1992 digitization of Snow's map. Those data, <>, are no longer available. However, they are preserved in the 'HistData' package, <>.


cholera: amend, augment and aid analysis of John Snow's 1854 cholera data

John Snow's map of the 1854 cholera outbreak in London's Soho is one of the best known examples of data visualization and information design.

The reasons are two-fold. First, as evidence of his claim that cholera is transmitted by water rather than air, Snow used a map to plot the spatial relationship between the location of water pumps, the primary source of drinking water, and that of cholera fatalities. Second, as a way to illustrate both the count and location of cases, Snow used "stacks" of horizontal bars (the orientation reflects the location's street).

However, while the map shows a concentration of fatalities around the Broad Street pump, it actually doesn't do the best job of excluding rival explanation. The pattern we see is not clearly different from what airborne transmission might look like. To address this problem, Snow added a graphical annotation to a second, lesser-known version of the map published in the official report on the outbreak:

pump neighborhoods

This annotation outlines the Broad Street pump neighborhood, the residences Snow claims are within "close" walking distance to the pump. The notion of a pump neighborhood is important because it provides a specific (testable) prediction about where we should expect to find cases: if water is cholera's mode of transmission and and if the water pumps located on the street are the primary source of drinking water, then most, if not all, fatalities should be found within the neighborhood. To put it simply, the disease should stop at the neighborhood's borders. In this way, pump neighborhoods can help distinguish waterborne from airborne patterns of disease transmission.

To that end, this package builds on Snow's work by offering systematic ways to compute pump neighborhoods. Doing so not only provides a way to replicate and validate Snow's efforts, it also allows people to explore and investigate the data for themselves.

This release includes two methods of computing neighborhoods. The first uses Voronoi tessellation. It works by computing the Euclidean distances between pumps. While popular and easy to compute, its only drawback is that roads and walking distance play no role in the choice of pump: the method assumes that people can walk through walls to get to their preferred pump.


The second method, which actually follows Snow's lead, computes neighborhoods based on the "actual" walking distance along the streets of Soho. While more accurate, it is computationally more demanding to compute than Voronoi tessellation. To do so, I transform the roads on the map into a "social" graph and turn the computation of walking distance into a graph theory problem. For each case (observed or simulated), I compute the shortest weighted path to the nearest pump. Then by applying the "rinse and repeat" principle, the different pump neighborhoods emerge:


To explore the data, you can consider a variety of scenarios by computing neighborhoods using any subset of pumps. By doing so, you can explore hypotheses like the possibility that the choice of pump is affected by water quality.

other package features

  • Fixes three apparent coding errors in Dodson and Tobler's 1992 digitization of Snow's map.
  • "Unstacks" the data in two ways to improve analysis and visualization.
  • Adds the ability to overlay graphical features like kernel density, Voronoi diagrams, and notable landmarks (John Snow's residence, the Lion Brewery, etc.).
  • Includes a variety of functions to find and locate cases, roads, pumps and walking paths.
  • Appends actual street names to the roads data.
  • Includes the revised pump data used in the second version of Snow's map from the Vestry report. This includes the corrected location of the Broad Street pump.
  • Adds two different aggregate time series fatalities data from the Vestry report.

getting started

To install 'cholera', use the expression below (you may need to install the 'devtools' package).

# install.packages("devtools")
devtools::install_github("lindbrook/cholera", build_vignettes = TRUE)

Besides the help pages, the vignettes include detailed discussion about the data and functions:



neighborhoodWalking() is computationally intensive (1-2 minutes on a single core). To improve performance, seven basic configurations have been pre-computed (for details, see neighborhoodWalking()'s Help Page) and a parallel, multi-core implementation is available on Linux and Mac.

Functions in cholera

Name Description
addVoronoi Add Voronoi cells.
addWhitehead Add Whitehead's Broad Street pump neighborhood. Anchor or base case of each stack of fatalities.
border Numeric IDs of line segments that create the map's border frame.
addPlaguePit Add plague pit (Marshall Street).
addSnow Adds Snow's Annotation of the Broad Street pump walking neighborhood.
addKernelDensity Add 2D kernel density contours to a plot.
addLandmarks Add landmarks.
ortho.proj Orthogonal projection of observed cases onto road network.
ortho.proj.pump Orthogonal projection of 13 original pumps.
plot.time.series Plot aggregate time series data from Vestry report.
plot.voronoi Plot Voronoi neighborhoods.
simulateFatalities Generate simulated fatalities and their orthogonal projections.
snow.neighborhood Snow neighborhood fatalities.
timeSeries Aggregate time series fatality data from the Vestry report.
unstackFatalities Unstack "stacks" in Snow's cholera map.
fatalities Amended Dodson and Tobler's cholera data.
fatalities.address "Unstacked" amended cholera data with address as unit of observation.
pumpCase Extract numeric case IDs by neighborhood.
caseLocator Locate case by numerical ID.
cholera-package cholera: amend, augment and aid analysis of John Snow's cholera data
fatalities.unstacked "Unstacked" amended cholera fatalities data with fatality as unit of observation.
fixFatalities Fix apparent coding error in Dodson and Tobler's digitization of Snow's map.
pumpLocator Locate water pump by numerical ID.
pumps Dodson and Tobler's pump data with street name.
sim.ortho.proj Orthogonal projection of simulated "expected" cases onto road network. List of "simulated" fatalities grouped by pump neighborhood.
ortho.proj.pump.vestry Orthogonal projection of the 14 pumps from the Vestry Report.
plague.pit Plague pit coordinates.
plot.walking Plot observed and simulated walking neighborhoods. List of the observed fatality "addresses" grouped by pump neighborhood.
snowColors Create a uniform set of colors for pump neighborhoods.
snowMap Plot John Snow's cholera map.
summary.voronoi Compute summary statistics for Voronoi neighborhoods.
neighborhoodVoronoi Compute Voronoi neighborhoods.
neighborhoodWalking Compute walking path neighborhoods.
pumps.vestry Vestry report pump data.
regular.cases "Expected" cases.
summary.walking Compute summary statistics for walking path neighborhoods.
road.segments Dodson and Tobler's street data transformed into road segments.
roadSegments Reshape roads dataframe into road.segments dataframe.
walkingDistance Compute the walking distance between a case and the nearest (selected) pump.
walkingPath Plot the walking path from a case to the nearest selected pump(s).
pumpData Compute pump coordinates.
roads Dodson and Tobler's street data with appended road names.
segmentLocator Locate road segment by its character ID.
streetNameLocator Locate road by name.
streetNumberLocator Locate road by its numerical ID.
No Results!

Vignettes of cholera

No Results!

Last month downloads


Type Package
Date 2017-08-10
License GPL-3
LazyData true
VignetteBuilder knitr
RoxygenNote 6.0.1
NeedsCompilation no
Packaged 2017-08-10 17:42:23 UTC; peter
Repository CRAN
Date/Publication 2017-08-10 19:40:39 UTC

Include our badge in your README