Datasets from the Datasaurus Dozen
The Datasaurus Dozen is a set of datasets that share the same summary statistics
despite having radically different distributions.
The datasets represent a larger and quirkier object lesson that is typically taught
via Anscombe's Quartet (available in the 'datasets' package). Anscombe's Quartet
contains four very different distributions with the same summary statistics and as
such highlights the value of visualisation in understanding data, over and above
summary statistics. As well as being an engaging variant on the Quartet, the data
is generated in a novel way. The simulated annealing process used to derive datasets
from the original Datasaurus is detailed in "Same Stats, Different Graphs: Generating
Datasets with Varied Appearance and Identical Statistics through Simulated Annealing".
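Anscombe's Quartet itself ships with base R as the `anscombe` data frame (columns `x1` to `x4` and `y1` to `y4`) in the 'datasets' package, so its shared statistics are easy to check. A minimal base-R sketch; `quartet_stats` is our own name, not part of any package:

```r
# Anscombe's Quartet: four x/y pairs stored as columns x1..x4 and y1..y4
# in the base 'datasets' package (attached by default in R).
quartet_stats <- t(sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  # The classic summary statistics for one pair
  c(mean_x = mean(x), mean_y = mean(y),
    var_x = var(x), var_y = var(y), cor = cor(x, y))
}))
rownames(quartet_stats) <- paste0("set_", 1:4)

# One row per pair; the numbers are almost indistinguishable
round(quartet_stats, 2)
```

The four rows agree to within a couple of hundredths, yet plotting the pairs reveals a line, a curve, an outlier-driven fit and a vertical cluster.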
This package wraps the awesome Datasaurus Dozen datasets. The Datasaurus
Dozen shows us why visualisation is important: summary statistics can be
the same while distributions are very different. In short, this package
gives a fun alternative to Anscombe's Quartet, which is available in R's
'datasets' package.
The original Datasaurus was created by Alberto Cairo in this great blog post.
The other Dozen were generated using simulated annealing, and the process is described in the paper "Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing" by Justin Matejka and George Fitzmaurice. Open-access materials, including the manuscript and code, are available alongside the official paper.
In the paper, Justin and George simulate a variety of datasets that have the same summary statistics as the Datasaurus but very different distributions.
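The core loop of that approach can be illustrated with a toy sketch. This is not the authors' code: it is a simplified version in base R in which the helper names (`summary_stats`, `dist_to_circle`, `anneal_to_circle`), the circle target, the step size and the linear cooling schedule are all hypothetical choices. It nudges one point at a time, rejects any move that changes the statistics at two decimal places, and otherwise accepts moves that bring the point closer to the target shape, plus some worse moves while the "temperature" is still high.

```r
# Toy simulated-annealing sketch; all names and settings are hypothetical.

# Summary statistics compared at two decimal places
summary_stats <- function(x, y) {
  round(c(mean_x = mean(x), mean_y = mean(y),
          sd_x = sd(x), sd_y = sd(y), cor = cor(x, y)), 2)
}

# Fitness: distance from a point to a circle centred at (cx, cy) with radius r
dist_to_circle <- function(x, y, cx, cy, r) {
  abs(sqrt((x - cx)^2 + (y - cy)^2) - r)
}

anneal_to_circle <- function(x, y, n_iter = 5000, step_sd = 0.1,
                             temp_start = 0.4, temp_end = 0.01) {
  target <- summary_stats(x, y)
  # Hypothetical target shape: a circle around the data's centre
  cx <- mean(x); cy <- mean(y); r <- sd(x)
  for (i in seq_len(n_iter)) {
    # Linear cooling from temp_start down to temp_end
    temp <- temp_start + (temp_end - temp_start) * (i - 1) / (n_iter - 1)
    # Perturb one randomly chosen point
    j <- sample.int(length(x), 1)
    new_x <- x; new_y <- y
    new_x[j] <- x[j] + rnorm(1, sd = step_sd)
    new_y[j] <- y[j] + rnorm(1, sd = step_sd)
    # Reject any move that changes the rounded summary statistics
    if (max(abs(summary_stats(new_x, new_y) - target)) > 1e-9) next
    old_fit <- dist_to_circle(x[j], y[j], cx, cy, r)
    new_fit <- dist_to_circle(new_x[j], new_y[j], cx, cy, r)
    # Accept moves towards the shape, plus some worse moves while hot
    if (new_fit < old_fit || runif(1) < temp) {
      x <- new_x; y <- new_y
    }
  }
  data.frame(x = x, y = y)
}

set.seed(42)
start <- data.frame(x = anscombe$x1, y = anscombe$y1)
result <- anneal_to_circle(start$x, start$y)
summary_stats(start$x, start$y)
summary_stats(result$x, result$y)
```

Because every accepted move preserves the rounded statistics, `result` reports the same summary as `start`, while its points tend to drift towards the circle; the paper's method uses far more points, iterations and target shapes.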
The latest stable version (0.1.2) is available on CRAN.
You can install the latest development version from GitHub using
devtools.
You can use the package to produce Anscombe plots and more.
```r
library(ggplot2)
library(datasauRus)

# One small-multiple panel per dataset, coloured by dataset
ggplot(datasaurus_dozen, aes(x = x, y = y, colour = dataset)) +
  geom_point() +
  theme_void() +
  theme(legend.position = "none") +
  facet_wrap(~dataset, ncol = 3)
```
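Beyond plotting, the same data frame makes it easy to confirm the shared statistics numerically. A minimal sketch, assuming the long-format `datasaurus_dozen` data frame has the `dataset`, `x` and `y` columns used in the plot code; `by_dataset` and `dozen_stats` are our own names:

```r
library(datasauRus)

# Split the long-format data into one x/y data frame per dataset
by_dataset <- split(datasaurus_dozen[, c("x", "y")], datasaurus_dozen$dataset)

# Compute the same summary statistics for every dataset (one row each)
dozen_stats <- t(sapply(by_dataset, function(d) {
  c(mean_x = mean(d$x), mean_y = mean(d$y),
    sd_x = sd(d$x), sd_y = sd(d$y),
    cor_xy = cor(d$x, d$y))
}))

round(dozen_stats, 2)
```

Each column stays nearly constant down the rows, even though the faceted plot shows a dinosaur, a star, a bullseye and more.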
Contributing to the package
Wanna report a bug or suggest a feature? Great stuff! For more information on how to contribute check out our contributing guide.
Please note that this R package is released with a Contributor Code of Conduct. By participating in this package project you agree to abide by its terms.
Datasets in datasauRus
| Name | Description |
|---|---|
| twelve_from_slant_long | Twelve From Slant (long) data |
| datasaurus_dozen | Datasaurus Dozen data |
| simpsons_paradox_wide | Simpsons Paradox (wide) data |
| box_plots | Box plot data |
| twelve_from_slant_alternate_long | Twelve From Slant Alternate (long) data |
| twelve_from_slant_alternate_wide | Twelve From Slant Alternate (wide) data |
| twelve_from_slant_wide | Twelve From Slant (wide) data |
| simpsons_paradox | Simpsons Paradox data |
| datasaurus_dozen_wide | Datasaurus Dozen (wide) data |
| Field | Value |
|---|---|
| License | MIT + file LICENSE |
| Packaged | 2018-09-20 12:03:06 UTC; Maelle |
| Date/Publication | 2018-09-20 14:50:02 UTC |