cholera: amend, augment and aid analysis of John Snow's 1854 cholera map

package features

  • Fixes three apparent coding errors in Dodson and Tobler's 1992 digitization of Snow's map.
  • "Unstacks" the data in two ways to improve analysis and visualization.
  • Computes and visualizes "pump neighborhoods" based on Euclidean (Voronoi tessellation) and walking distance.
  • Overlays graphical features like kernel density, Voronoi diagrams, Snow's Broad Street neighborhood, and notable landmarks (John Snow's residence, the Lion Brewery, etc.).
  • Includes a variety of functions to highlight specific cases, roads, pumps and walking paths.
  • Appends street names to the roads data set.
  • Includes the revised pump data used in the second version of Snow's map from the Vestry report. This includes the "correct" location of the Broad Street pump.
  • Adds two different aggregate time series fatalities data sets, taken from the Vestry report.

background

John Snow's map of the 1854 cholera outbreak in London is one of the best known examples of data visualization and information design.

By plotting the number and location of fatalities on a map, Snow was able to do something that is easily taken for granted today: create and disseminate a visualization of a spatial distribution. To our modern eye, the pattern is unmistakable. It seems self-evident that the map elegantly supports Snow's claims that cholera is a waterborne disease and that the pump on Broad Street is the source of the outbreak. And yet, despite its virtues, the map failed to convince both the authorities and Snow's colleagues in the medical and scientific communities.

Beyond considerations of time and place, there are "scientific" reasons for this failure. The map shows a concentration of cases around the Broad Street pump, but that alone should not convince us that Snow is right. The map doesn't refute the primary rival explanation, miasma theory: the pattern we see is not unlike what airborne transmission might look like. And while the presence of a pump near or at the epicenter of the distribution of fatalities is strong circumstantial evidence, it is still circumstantial. There are a host of rival explanations that the map doesn't consider and cannot rule out: location of sewer grates, elevation, weather patterns, etc.

Arguably, this is one reason why Snow added a graphical annotation in the second, lesser-known version of the map that was published in the official report on the outbreak (Report On The Cholera Outbreak In The Parish Of St. James, Westminster, During The Autumn Of 1854).

pump neighborhoods

The annotation outlines what we might call the Broad Street pump neighborhood: the set of addresses that are, according to Snow, within "close" walking distance to the pump. The notion of a pump neighborhood is important because it provides a prediction about where we should and should not expect to find cases. If water is cholera's mode of transmission and if water pumps are the primary source of drinking water, then most, if not all, fatalities should be found within the pump neighborhood. The disease should stop at the neighborhood's borders.

Creating this annotation is not a trivial matter. To identify the neighborhood of the Broad Street pump, you actually need to identify the neighborhoods of surrounding pumps. Snow writes: "The inner dotted line on the map shews [sic] the various points which have been found by careful measurement to be at an equal distance by the nearest road from the pump in Broad Street and the surrounding pumps ..." (Ibid., p. 109.).

I build on Snow's efforts by writing functions that allow you to compute two flavors of pump neighborhoods. The first is based on Voronoi tessellation. It works by computing the Euclidean distances between pumps. It's easy to compute and has been a popular choice for analysts of Snow's map. However, it has two drawbacks: 1) roads and buildings play no role in determining neighborhoods (it assumes that people walk directly, "as the crow flies", to their preferred pump); and 2) it's not what Snow had in mind. For that, you'll need to consider the second type of neighborhood.

plot(neighborhoodVoronoi())
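
To make the Euclidean idea concrete, here is a minimal base R sketch. It is not the package's implementation: the coordinates, the data frames toy.pumps and toy.cases, and the helper nearestPumpEuclidean() are all hypothetical. It relies only on the fact that a point lies in a pump's Voronoi cell exactly when that pump is its nearest pump by straight-line distance.

```r
# Hypothetical toy coordinates; these are not the package's data.
toy.pumps <- data.frame(id = 1:3, x = c(0, 10, 5), y = c(0, 0, 8))
toy.cases <- data.frame(x = c(1, 9, 5, 4), y = c(1, 2, 6, 1))

# Return the ID of the pump with the smallest straight-line distance.
# Roads and buildings play no role here ("as the crow flies").
nearestPumpEuclidean <- function(x, y, pumps) {
  d <- sqrt((pumps$x - x)^2 + (pumps$y - y)^2)
  pumps$id[which.min(d)]
}

# Classify every case by its nearest pump.
toy.cases$pump <- mapply(nearestPumpEuclidean, toy.cases$x, toy.cases$y,
                         MoreArgs = list(pumps = toy.pumps))

# Number of cases in each Euclidean neighborhood.
table(toy.cases$pump)
```

Because the assignment depends only on coordinates, it is cheap to compute, which is part of why this approach is popular.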

The second flavor is based on the walking distance along the roads on the map. While more accurate, it's computationally more demanding. To compute these distances, I transform the roads on the map into a network graph and turn the computation of walking distance into a graph theory problem. For each case (observed or simulated), I compute the shortest path, weighted by the length of roads, to the nearest pump. Then, "rinse and repeat" and the different pump neighborhoods emerge:

plot(neighborhoodWalking())
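
The graph formulation can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the toy network, the node names, and the helper shortestDistances() are hypothetical, and Dijkstra's algorithm is used here simply as a standard way to find length-weighted shortest paths.

```r
# Hypothetical toy road network: each row is a road segment with its length.
toy.edges <- data.frame(
  from   = c("case", "A", "A", "B"),
  to     = c("A", "pump1", "B", "pump2"),
  length = c(2, 7, 1, 3),
  stringsAsFactors = FALSE
)

# Dijkstra's algorithm: shortest road distance from `source` to every node.
shortestDistances <- function(edges, source) {
  nodes <- unique(c(edges$from, edges$to))
  dist <- setNames(rep(Inf, length(nodes)), nodes)
  dist[source] <- 0
  unvisited <- nodes
  while (length(unvisited) > 0) {
    # Visit the closest node that has not been finalized yet.
    current <- unvisited[which.min(dist[unvisited])]
    unvisited <- setdiff(unvisited, current)
    # Roads can be walked in both directions.
    touching <- which(edges$from == current | edges$to == current)
    for (i in touching) {
      neighbor <- if (edges$from[i] == current) edges$to[i] else edges$from[i]
      candidate <- dist[[current]] + edges$length[i]
      if (candidate < dist[[neighbor]]) dist[neighbor] <- candidate
    }
  }
  dist
}

d <- shortestDistances(toy.edges, "case")
pump.nodes <- c("pump1", "pump2")

# The case belongs to the neighborhood of the pump with the shortest walk.
nearest <- pump.nodes[which.min(d[pump.nodes])]
nearest
```

In this toy network the case is assigned to pump2: the walk there covers three segments but is shorter in total length than the two-segment walk to pump1. Repeating the assignment for every case yields the neighborhoods.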

To explore the data, you can consider a variety of scenarios by computing neighborhoods based on any subset of pumps. Here's the result excluding the Broad Street pump.

plot(neighborhoodWalking(-7))

"expected" pump neighborhoods

You can also explore "expected" neighborhoods. Currently, you can do so in three ways. The first colors roads.

plot(neighborhoodWalking(case.set = "expected", vestry = TRUE))

The second and third color each neighborhood's area by using either points or polygons. The polygon implementation is shown below. It's new, still under development and throws an error for certain configurations. For exploration, type = "road" (the default shown above) or type = "area.points" is still preferable.

plot(neighborhoodWalking(case.set = "expected", vestry = TRUE), type = "area.polygons")

getting started

To install 'cholera' from CRAN:

install.packages("cholera")

To install the development version of 'cholera' from GitHub:

# Note that you may need to install the 'devtools' package:
# install.packages("devtools")
devtools::install_github("lindbrook/cholera", build_vignettes = TRUE)

Read the package's vignettes. They include detailed discussions about the data, the functions and the methods used to "fix" the data and to compute walking distances and neighborhoods.

note

neighborhoodWalking() and addNeighborhood() are computationally intensive. On a single core of a 2.3 GHz Intel i7, plotting observed paths to PDF takes about 6 seconds while doing so for expected paths takes about 30 seconds. Using the parallel implementation on 4 physical (8 logical) cores, these times fall to about 4 and 12 seconds.

Note that parallelization is currently only available on Linux and Mac.

Also, note that although some precautions are taken in R.app on macOS, the developers of the 'parallel' package, which neighborhoodWalking() uses, strongly discourage using parallelization within a GUI or embedded environment. See vignette("parallel") for details.
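
For readers unfamiliar with the 'parallel' package, the sketch below shows one common pattern for fork-based parallelism with a single-core fallback. It is an assumption that mclapply() resembles what the package does internally. The slowTask() function is a hypothetical stand-in for an expensive per-case computation.

```r
library(parallel)

# Forking is unavailable on Windows, so fall back to a single core there.
# detectCores() can return NA, hence the na.rm = TRUE guard.
cores <- if (.Platform$OS.type == "windows") {
  1L
} else {
  max(1L, detectCores(logical = FALSE), na.rm = TRUE)
}

# Hypothetical stand-in for an expensive per-case computation.
slowTask <- function(i) {
  Sys.sleep(0.1)
  i^2
}

# With mc.cores = 1 this behaves like a plain lapply().
result <- mclapply(1:8, slowTask, mc.cores = cores)
unlist(result)
```

As noted above, run this kind of code from a terminal session rather than a GUI.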

Version

0.4.0

License

GPL (>= 2)

Maintainer

Peter Li

Last Published

April 1st, 2018

Functions in cholera (0.4.0)

  • plot.classifier_audit: Plot result of classifierAudit().
  • snowColors: Create a set of colors for pump neighborhoods.
  • print.euclidean_distance: Summary of euclideanDistance().
  • print.walking: Print method for neighborhoodWalking().
  • plague.pit: Plague pit coordinates.
  • print.voronoi: Print method for neighborhoodVoronoi().
  • neighborhoodVoronoi: Compute Voronoi pump neighborhoods.
  • fatalities.unstacked: "Unstacked" amended cholera fatalities data with fatality as unit of observation.
  • print.time_series: Print summary data for timeSeries().
  • timeSeries: Aggregate time series fatality data from the Vestry report.
  • neighborhoodWalking: Compute walking path pump neighborhoods.
  • pumpData: Compute pump coordinates.
  • pumps: Dodson and Tobler's pump data with street name.
  • snowMap: Plot John Snow's cholera map.
  • simulateFatalities: Generate simulated fatalities and their orthogonal projections.
  • ortho.proj: Orthogonal projection of observed cases onto road network.
  • pumpLocator: Locate water pump by numerical ID.
  • neighborhoodData: Compute network graph of roads, cases and pumps.
  • pumps.vestry: Vestry report pump data.
  • snow.neighborhood: Snow neighborhood fatalities.
  • plot.euclidean_distance: Plot the Euclidean distance between cases and/or pumps.
  • unitMeter: Convert nominal map distance to yards or meters.
  • plot.voronoi: Plot Voronoi neighborhoods.
  • print.walking_distance: Print method for walkingDistance().
  • plot.time_series: Plot aggregate time series data from Vestry report.
  • plot.walking: Plot method for neighborhoodWalking().
  • pumpCase: Numeric case IDs by pump neighborhood.
  • plot.walking_distance: Plot the walking distance between cases and/or pumps.
  • segmentLocator: Locate road segment by ID.
  • segmentLength: Compute length of road segment.
  • print.classifier_audit: Return result of classifierAudit().
  • roadSegments: Reshape 'roads' data frame into 'road.segments' data frame.
  • ortho.proj.pump: Orthogonal projection of 13 original pumps.
  • ortho.proj.pump.vestry: Orthogonal projection of the 14 pumps from the Vestry Report.
  • snowNeighborhood: Plotting data for Snow's graphical annotation of the Broad Street pump neighborhood.
  • road.segments: Dodson and Tobler's street data transformed into road segments.
  • streetNameLocator: Locate road by name.
  • regular.cases: "Expected" cases.
  • roads: Dodson and Tobler's street data with appended road names.
  • sim.ortho.proj: Orthogonal projection of simulated "expected" cases onto road network.
  • streetLength: Compute length of selected street.
  • unstackFatalities: Unstack "stacks" in Snow's cholera map.
  • streetNumberLocator: Locate road by numerical ID.
  • sim.pump.case: List of "simulated" fatalities grouped by walking-distance pump neighborhood.
  • walkingDistance: Compute the shortest walking distance between cases and/or pumps.
  • addNeighborhood: Add expected neighborhood polygons
  • addLandmarks: Add landmarks.
  • fatalities.address: "Unstacked" amended cholera data with address as unit of observation.
  • addIndexCase: Highlight index case at 40 Broad Street.
  • addKernelDensity: Add 2D kernel density contours.
  • fixFatalities: Fix apparent coding errors in Dodson and Tobler's digitization of Snow's map.
  • nearestPump: Compute shortest walking distances or paths.
  • addSnow: Adds Snow's graphical annotation of the Broad Street pump walking neighborhood.
  • addPlaguePit: Add plague pit (Marshall Street).
  • addWhitehead: Add Rev. Henry Whitehead's Broad Street pump neighborhood.
  • addPump: Add water pump by numerical ID.
  • addVoronoi: Add Voronoi cells.
  • euclideanDistance: Compute the Euclidean distance between cases and/or pumps.
  • fatalities: Amended Dodson and Tobler's cholera data.
  • anchor.case: Anchor or base case of each stack of fatalities.
  • cholera-package: cholera: amend, augment and aid analysis of John Snow's cholera map
  • border: Numeric IDs of line segments that create the map's border frame.
  • classifierAudit: Test if case is orthogonal to segment.
  • caseLocator: Locate case by numerical ID.