datasauRus v0.1.4

Datasets from the Datasaurus Dozen

The Datasaurus Dozen is a set of datasets that share the same summary statistics despite having radically different distributions. The datasets offer a larger and quirkier version of the object lesson that is typically taught via Anscombe's Quartet (available in the 'datasets' package). Anscombe's Quartet contains four very different distributions with the same summary statistics and so highlights the value of visualisation in understanding data, over and above summary statistics. As well as being an engaging variant on the Quartet, the data is generated in a novel way. The simulated annealing process used to derive the datasets from the original Datasaurus is detailed in "Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing" <doi:10.1145/3025453.3025912>.

Readme

datasauRus

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

This package wraps the awesome Datasaurus Dozen datasets. The Datasaurus Dozen show us why visualisation is important – summary statistics can be the same but distributions can be very different. In short, this package gives a fun alternative to Anscombe’s Quartet, available in R as anscombe.
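As a point of comparison, the same lesson can be checked on Anscombe's Quartet with base R alone. The sketch below assumes only the built-in anscombe data frame (columns x1 to x4 and y1 to y4); the name quartet_stats is made up for illustration and is not part of any package.

```r
# Built-in data frame from the 'datasets' package: columns x1..x4 and y1..y4
data(anscombe)

# quartet_stats is a hypothetical name used only in this sketch.
# It holds one row per x/y pair with the usual summary statistics.
quartet_stats <- do.call(rbind, lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  data.frame(
    set    = i,
    mean_x = mean(x),
    mean_y = mean(y),
    sd_x   = sd(x),
    sd_y   = sd(y),
    cor_xy = cor(x, y)
  )
}))

# Rounded to two decimals, the four rows look nearly identical
print(round(quartet_stats, 2))
```

Plotting each pair (for example with plot(anscombe$x1, anscombe$y1)) then shows how little these numbers say about the shape of the data.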

The original Datasaurus was created by Alberto Cairo in this great blog post.

The other Dozen were generated using simulated annealing and the process is described in the paper “Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing” by Justin Matejka and George Fitzmaurice (open access materials including manuscript and code, official paper).

In the paper, Justin and George simulate a variety of datasets that have the same summary statistics as the Datasaurus but very different distributions.

Install

The latest stable version (0.1.4) is available on CRAN:

install.packages("datasauRus")

The latest development version is on GitHub; use devtools to install it:

devtools::install_github("lockedata/datasauRus")

Usage

You can use the package to produce Anscombe-style plots and more.

library(ggplot2)
library(datasauRus)

# One panel per dataset; axes and legend are hidden so only the shapes show
ggplot(datasaurus_dozen, aes(x = x, y = y, colour = dataset)) +
  geom_point() +
  theme_void() +
  theme(legend.position = "none") +
  facet_wrap(~dataset, ncol = 3)
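
The plot shows how different the shapes are; a numeric check shows how similar the summaries are. The following is a minimal sketch in base R, assuming datasaurus_dozen has the dataset, x and y columns used in the plot above. The names by_dataset and dozen_stats are illustrative only and are not part of the package.

```r
library(datasauRus)

# Split the long-format data into one data frame per dataset
by_dataset <- split(datasaurus_dozen, datasaurus_dozen$dataset)

# dozen_stats is a hypothetical name: one row of summary statistics per dataset
dozen_stats <- do.call(rbind, lapply(names(by_dataset), function(name) {
  d <- by_dataset[[name]]
  data.frame(
    dataset = name,
    mean_x  = mean(d$x),
    mean_y  = mean(d$y),
    sd_x    = sd(d$x),
    sd_y    = sd(d$y),
    cor_xy  = cor(d$x, d$y)
  )
}))

# Compare the columns down the rows: the values should barely differ
print(dozen_stats)
```

If dplyr is available, the same table can be built with group_by() and summarise(); base R is used here only to avoid an extra dependency.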

Contributing to the package

Wanna report a bug or suggest a feature? Great stuff! For more information on how to contribute, check out our contributing guide.

Please note that this R package is released with a Contributor Code of Conduct. By participating in this package project you agree to abide by its terms.

Functions in datasauRus

Name                              Description
datasauRus                        datasauRus
twelve_from_slant_long            Twelve From Slant (long) data
datasaurus_dozen                  Datasaurus Dozen data
simpsons_paradox_wide             Simpson's Paradox (wide) data
box_plots                         Box plot data
twelve_from_slant_alternate_long  Twelve From Slant Alternate (long) data
twelve_from_slant_alternate_wide  Twelve From Slant Alternate (wide) data
twelve_from_slant_wide            Twelve From Slant (wide) data
simpsons_paradox                  Simpson's Paradox data
datasaurus_dozen_wide             Datasaurus Dozen (wide) data

Vignettes of datasauRus

Name
Datasaurus.Rmd

Details

License MIT + file LICENSE
Encoding UTF-8
LazyData true
VignetteBuilder knitr
RoxygenNote 6.1.0.9000
URL https://github.com/lockedata/datasauRus, https://itsalocke.com/datasaurus/
BugReports https://github.com/lockedata/datasauRus/issues
NeedsCompilation no
Packaged 2018-09-20 12:03:06 UTC; Maelle
Repository CRAN
Date/Publication 2018-09-20 14:50:02 UTC
