# Hadley Wickham

#### 101 packages on CRAN

#### 18 packages on GitHub

A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

A consistent, simple and easy to use set of wrappers around the fantastic 'stringi' package. All function and argument names (and positions) are consistent, all functions deal with "NA"'s and zero length vectors in the same way, and the output from one function is easy to feed into the input of another.

A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

Graphical scales map data to aesthetics, and provide methods for automatically determining breaks and labels for axes and legends.

A set of tools that solves a common set of problems: you need to break a big problem down into manageable pieces, operate on each piece and then put all the pieces back together. For example, you might want to fit a model to each spatial location or time point in your study, summarise data by panels or collapse high-dimensional arrays to simpler summary statistics. The development of 'plyr' has been generously supported by 'Becton Dickinson'.
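The split, apply, combine pattern described above can be sketched in a language-neutral way. The following Python snippet is only an illustration of the idea, not the 'plyr' API; the function name `split_apply_combine` and the sample readings are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def split_apply_combine(rows, key, value, summarise):
    """Group rows by `key`, summarise each group's `value`s, return one row per group."""
    groups = defaultdict(list)
    for row in rows:
        # Split: collect the values belonging to each group.
        groups[row[key]].append(row[value])
    # Apply the summary to each piece, then combine the results into one list.
    return [{key: k, value: summarise(v)} for k, v in sorted(groups.items())]


# Hypothetical data: temperature readings at two spatial locations.
readings = [
    {"site": "A", "temp": 10.0},
    {"site": "B", "temp": 20.0},
    {"site": "A", "temp": 14.0},
    {"site": "B", "temp": 22.0},
]

site_means = split_apply_combine(readings, "site", "temp", mean)
print(site_means)
```

Any function that reduces a list of values can be passed as `summarise`, which is what makes the pattern reusable across problems.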

Flexibly restructure and aggregate data using just two functions: 'melt' and 'dcast' (or 'acast').

An alternative approach to non-standard evaluation using formulas. Provides a full implementation of LISP style 'quasiquotation', making it easier to generate code with other code.

assertthat is an extension to stopifnot() that makes it easy to declare the pre and post conditions that your code should satisfy, while also producing friendly error messages so that your users know what they've done wrong.

Useful tools for working with HTTP organised by HTTP verbs (GET(), POST(), etc). Configuration functions make it easy to control additional request components (authenticate(), add_headers() and so on).

An evolution of 'reshape2'. It's designed specifically for data tidying (not general reshaping or aggregating) and works well with 'dplyr' data pipelines.

Import foreign statistical formats into R via the embedded 'ReadStat' C library, <https://github.com/WizardMac/ReadStat>.

Helpers for reordering factor levels (including moving specified levels to front, ordering by first appearance, reversing, and randomly shuffling), and tools for modifying factor levels (including collapsing rare levels into other, 'anonymising', and manually 'recoding').

Wrappers around the 'xml2' and 'httr' packages to make it easy to download, then manipulate, HTML and XML.

The 'tidyverse' is a set of packages that work in harmony because they share common data representations and 'API' design. This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step. Learn more about the 'tidyverse' at <https://github.com/hadley/tidyverse>.

Functions for modelling that help you seamlessly integrate modelling into a pipeline of data manipulation and visualisation.

An object oriented system using object-based, also called prototype-based, rather than class-based object oriented ideas.

Airline on-time data for all flights departing NYC in 2013. Also includes useful 'metadata' on airlines, airports, weather, and planes.

Generate your Rd documentation, 'NAMESPACE' file, and collation field using specially formatted comments. Writing documentation in-line with code makes it easier to keep your documentation up-to-date as your requirements change. 'Roxygen2' is inspired by the 'Doxygen' system for C++.

A dplyr backend for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features work with any database that has a DBI backend; more advanced features require SQL translation to be provided by the package author.

An easy way to determine which directories on the user's computer you should use to save data, caches and logs. A port of Python's 'Appdirs' (<https://github.com/ActiveState/appdirs>) to R.

A dataset about movies. This was previously contained in ggplot2, but has been moved to its own package to reduce the download size of ggplot2.

Read and write feather files, a lightweight binary columnar data store designed for maximum speed.

A data only package containing commercial domestic flights that departed Houston (IAH and HOU) in 2011.

US baby names provided by the SSA. This package contains all names used for at least 5 children of either sex.

A command-line interface to 'GGobi', an interactive and dynamic graphics package. 'Rggobi' complements the graphical user interface of 'GGobi' providing a way to fluidly transition between analysis and exploration, as well as automating common tasks.

Some functions at the intersection of 'dplyr' and 'purrr' that formerly lived in 'purrr'.

Framework for visualising tables of counts, proportions and probabilities. The framework is called product plots, alluding to the computation of area as a product of height and width, and the statistical concept of generating a joint distribution from the product of conditional and marginal distributions. The framework, with extensions, is sufficient to encompass over 20 visualisations previously described in fields of statistical graphics and 'infovis', including bar charts, mosaic plots, 'treemaps', equal area plots and fluctuation diagrams.

Implements the letter value 'boxplot' which extends the standard 'boxplot' to deal with both larger and smaller numbers of data points by dynamically selecting the appropriate number of letter values to display.

profr provides an alternative data structure and visual rendering for the profiling information generated by Rprof.

Implements geodesic interpolation and basis generation functions that allow you to create new tour methods from R.

Visualise clustering algorithms with GGobi. Contains both general code for visualising clustering results and specific visualisations for model-based, hierarchical and SOM clustering.

Given $p$-dimensional training data containing $d$ groups (the design space), a classification algorithm (classifier) predicts which group new data belongs to. Generally the input to these algorithms is high dimensional, and the boundaries between groups will be high dimensional and perhaps curvilinear or multi-faceted. This package implements methods for understanding the division of space between the groups.

Exploratory model analysis. Fit and graphically explore ensembles of linear models.

Makes it easy to insert 'emoji' based on either their name or a descriptive keyword.

Tools for monads.

A dplyr backend that partitions a data frame across multiple nodes in a cluster (e.g. cores on your computer) to make common operations faster.

A Doxygen-like in-source documentation system for Rd, collation, and NAMESPACE. (This is the third rewrite)

Convert package rd files to static html pages, suitable for serving on a website.

This package tweaks the operation of base R code to make things a little stricter.

Automates the set up of common tools needed during package development and data analysis.

Provides a 'tbl_df' class (the 'tibble') that provides stricter checking and better formatting than the traditional data frame.

A toolbox for working with base types, core R features like the condition system, and core 'Tidyverse' features like tidy evaluation.

Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions. For more information, see package vignette. To quote Rene Magritte, "Ceci n'est pas un pipe."
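The forwarding behaviour described above can be sketched outside R. The Python snippet below is a minimal illustration of the idea, not the `%>%` operator itself; the helper name `pipe` is hypothetical.

```python
from functools import reduce


def pipe(value, *functions):
    """Forward `value` through each function in turn, left to right."""
    return reduce(lambda acc, fn: fn(acc), functions, value)


# Reads in execution order: sort, scale each element, then sum.
result = pipe(
    [3, 1, 2],
    sorted,
    lambda xs: [x * 10 for x in xs],
    sum,
)
print(result)
```

The nested equivalent would be `sum([x * 10 for x in sorted([3, 1, 2])])`, which has to be read from the inside out.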

The curl() and curl_download() functions provide highly configurable drop-in replacements for base url() and download.file() with better performance, support for encryption (https, ftps), gzip compression, authentication, and other 'libcurl' goodies. The core of the package implements a framework for performing fully customized requests where data can be processed either in memory, on disk, or streaming via the callback or connection interfaces. Some knowledge of 'libcurl' is recommended; for a more user-friendly web client see the 'httr' package which builds on this package with http specific tools and logic.

Provides a general-purpose tool for dynamic report generation in R using Literate Programming techniques.

The goal of 'readr' is to provide a fast and friendly way to read rectangular data (like 'csv', 'tsv', and 'fwf'). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes.

Parsing and evaluation tools that make it easy to recreate the command line behaviour of R.

A backend for the selecting functions of the 'tidyverse'. It makes it easy to implement select-like functions in your own packages in a way that is consistent with other 'tidyverse' interfaces for selection.

Import excel files into R. Supports '.xls' via the embedded 'libxls' C library <https://sourceforge.net/projects/libxls/> and '.xlsx' via the embedded 'RapidXML' C++ library <https://rapidxml.sourceforge.net>. Works on Windows, Mac and Linux without external dependencies.

Helper functions to work with spreadsheets and the "A1:D10" style of cell range specification.
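As an illustration of what an "A1:D10" specification encodes, the following Python sketch converts such a string into 1-based row and column limits. It is an assumption-laden illustration rather than the package's own functions; `col_to_index` and `parse_cell_range` are hypothetical names, and only the plain `LETTERS+DIGITS:LETTERS+DIGITS` form is handled.

```python
import re


def col_to_index(letters):
    """Convert spreadsheet column letters to a 1-based index (A -> 1, Z -> 26, AA -> 27)."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def parse_cell_range(spec):
    """Parse a range such as 'A1:D10' into ((row, col), (row, col)) corner pairs."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", spec)
    if match is None:
        raise ValueError(f"not a simple A1-style range: {spec!r}")
    col1, row1, col2, row2 = match.groups()
    upper_left = (int(row1), col_to_index(col1))
    lower_right = (int(row2), col_to_index(col2))
    return upper_left, lower_right


limits = parse_cell_range("A1:D10")
print(limits)
```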

Functions to work with date-times and time-spans: fast and user friendly parsing of date-time data, extraction and updating of components of a date-time (years, months, days, hours, minutes, and seconds), algebraic manipulation on date-time and time-span objects. The 'lubridate' package has a consistent and memorable syntax that makes working with dates easy and fun.

A set of functions to run code 'with' safely and temporarily modified global state. Many of these functions were originally a part of the 'devtools' package; this package provides access to them with limited dependencies.
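The idea of running code with temporarily modified global state can be sketched with a Python context manager. This is an analogy only and not the R package's interface; the name `with_envvar` and the variable `DEMO_LOCALE` are hypothetical.

```python
import os
from contextlib import contextmanager


@contextmanager
def with_envvar(name, value):
    """Set an environment variable for the duration of a block, then restore it."""
    missing = object()  # sentinel: distinguishes "unset" from "set to an empty string"
    old = os.environ.get(name, missing)
    os.environ[name] = value
    try:
        yield
    finally:
        # Restore the previous state even if the block raised an error.
        if old is missing:
            os.environ.pop(name, None)
        else:
            os.environ[name] = old


os.environ["DEMO_LOCALE"] = "en_US"
with with_envvar("DEMO_LOCALE", "C"):
    inside = os.environ["DEMO_LOCALE"]
after = os.environ["DEMO_LOCALE"]
print(inside, after)
```

Restoring in a `finally` clause is what makes the modification safe: the global state is reset on both normal exit and error.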

Access the RStudio API (if available) and provide informative error messages when it's not.

Cache the results of a function so that when you call it again with the same arguments it returns the pre-computed value.
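A minimal sketch of this caching technique, written in Python for illustration; it is not the package's implementation, and the names `memoise`, `slow_square` and `calls` are chosen here only for the example. It assumes all arguments are positional and hashable.

```python
from functools import wraps


def memoise(fn):
    """Return a version of `fn` that remembers results for arguments it has seen."""
    cache = {}

    @wraps(fn)
    def wrapper(*args):
        if args not in cache:
            # First call with these arguments: compute and store the result.
            cache[args] = fn(*args)
        return cache[args]

    return wrapper


calls = []  # records every time the underlying function really runs


def slow_square(x):
    calls.append(x)
    return x * x


fast_square = memoise(slow_square)
first = fast_square(4)   # computed
second = fast_square(4)  # served from the cache
print(first, second, calls)
```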

A database interface definition for communication between R and relational database management systems. All classes in this package are virtual and need to be extended by the various R/DBMS implementations.

Work with XML files using a simple, consistent interface. Built on top of the 'libxml2' C library.

Convert statistical analysis objects from R into tidy data frames, so that they can more easily be combined, reshaped and otherwise processed with tools like 'dplyr', 'tidyr' and 'ggplot2'. The package provides three S3 generics: tidy, which summarizes a model's statistical findings such as coefficients of a regression; augment, which adds columns to the original data such as predictions, residuals and cluster assignments; and glance, which provides a one-row summary of model-level statistics.

Embeds the 'SQLite' database engine in R and provides an interface compliant with the 'DBI' package. The source for the 'SQLite' engine (version 3.8.8.2) is included.

R's raw vector is useful for storing a single binary object. What if you want to put a vector of them in a data frame? The blob package provides the blob object, a list of raw vectors, suitable for use as a column in a data frame.

An extensible framework to create and preprocess design matrices. Recipes consist of one or more data manipulation and analysis "steps". Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting design matrices can then be used as inputs into statistical or machine learning models.

The R package 'ggplot2' is a plotting system based on the grammar of graphics. 'GGally' extends 'ggplot2' by adding several functions to reduce the complexity of combining geometric objects with transformed data. Some of these functions include a pairwise plot matrix, a two group pairwise plot matrix, a parallel coordinates plot, a survival plot, and several functions to plot networks.

A collection of functions to visualize spatial data and models on top of static maps from various online sources (e.g. Google Maps and Stamen Maps). It includes tools common to those tasks, including functions for geolocation and routing.

Some extra themes, geoms, and scales for 'ggplot2'. Provides 'ggplot2' themes and scales that replicate the look of plots by Edward Tufte, Stephen Few, 'Fivethirtyeight', 'The Economist', 'Stata', 'Excel', and 'The Wall Street Journal', among others. Provides 'geoms' for Tufte's box plot and range frame.

A 'DBI' interface to 'MySQL' / 'MariaDB'. The 'RMySQL' package contains an old implementation based on legacy code from S-PLUS which is being phased out. A modern 'MySQL' client based on 'Rcpp' is available from the 'RMariaDB' package on 'Github': <https://github.com/rstats-db/RMariaDB>.

Create and customize interactive maps using the 'Leaflet' JavaScript library and the 'htmlwidgets' package. These maps can be used directly from the R console, from 'RStudio', in Shiny apps and R Markdown documents.

A graphics device for R that produces 'Scalable Vector Graphics'. 'svglite' is a fork of the older 'RSvgDevice' package.

Support for simple features, a standardized way to encode spatial vector data. Binds to GDAL for reading and writing data, to GEOS for geometrical operations, and to Proj.4 for projection conversions and datum transformations.

Various tools for creating iterators, many patterned after functions in the Python itertools module, and others patterned after functions in the 'snow' package.

User-facing R functions are provided to parse, compile, test, estimate, and analyze Stan models by accessing the header-only Stan library provided by the 'StanHeaders' package. The Stan project develops a probabilistic programming language that implements full Bayesian statistical inference via Markov Chain Monte Carlo, rough Bayesian inference via 'variational' approximation, and (optionally penalized) maximum likelihood estimation via optimization. In all three cases, automatic differentiation is used to quickly and accurately evaluate gradients without burdening the user with the need to derive the partial derivatives.

A collection of miscellaneous basic statistic functions and convenience wrappers for efficiently describing data. The author's intention was to create a toolbox that facilitates the (notoriously time consuming) first descriptive tasks in data analysis: calculating descriptive statistics, drawing graphical summaries and reporting the results. The package also contains functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R. The primary reason for collecting them here was to consolidate them in ONE package instead of dozens (which themselves might depend on other packages which are not needed at all), and to provide a common and consistent interface as far as function and argument naming, NA handling, recycling rules etc. are concerned. Google style guides were used as naming rules (in absence of convincing alternatives). The 'camel style' was consequently applied to functions borrowed from contributed R packages as well.

This implements the data table back-end for 'dplyr' so that you can seamlessly use data table and 'dplyr' together.

These functions were developed to support functional data analysis as described in Ramsay, J. O. and Silverman, B. W. (2005) Functional Data Analysis. New York: Springer. They were ported from earlier versions in Matlab and S-PLUS. An introduction appears in Ramsay, J. O., Hooker, Giles, and Graves, Spencer (2009) Functional Data Analysis with R and Matlab (Springer). The package includes data sets and script files that work through many examples, including all but one of the 76 figures in this latter book. Matlab versions of the code and sample analyses are no longer distributed through CRAN, as they were when the book was published. For those, see <http://www.psych.mcgill.ca/misc/fda/downloads/FDAfuns/>, where you will find a set of .zip files containing the functions and sample analyses, as well as two .txt files giving instructions for installation and some additional information. The changes from Version 2.4.1 are fixes of bugs in density.fd and removal of functions create.polynomial.basis, polynompen, and polynomial. These were deleted because the monomial basis does the same thing and because there were errors in the code.

Output formats and utilities for authoring books and technical documents with R Markdown.

The 'HistData' package provides a collection of small data sets that are interesting and important in the history of statistics and data visualization. The goal of the package is to make these available, both for instructional use and for historical research. Some of these present interesting challenges for graphics or analysis in R.

An implementation of an interactive grammar of graphics, taking the best parts of 'ggplot2', combining them with the reactive framework of 'shiny' and drawing web graphics using 'vega'.

A suite of custom R Markdown formats and templates for authoring journal articles and conference submissions.

Download and install R packages stored in 'GitHub', 'BitBucket', or plain 'subversion' or 'git' repositories. This package is a lightweight replacement of the 'install_*' functions in 'devtools'. Indeed most of the code was copied over from 'devtools'.

Build a package documentation and function reference site and use it as the package vignette.

Creating tiny yet beautiful documents and vignettes from R Markdown. The package provides the 'html_pretty' output format as an alternative to the 'html_document' and 'html_vignette' engines that convert R Markdown into HTML pages. Various themes and syntax highlight styles are supported.

A 'ggplot2' extension that provides flipped components: horizontal versions of 'Stats' and 'Geoms', and vertical versions of 'Positions'.

Mosaic plots in the 'ggplot2' framework. Mosaic plot functionality is provided in a single 'ggplot2' layer by calling the geom 'mosaic'.

Interface with 'Google BigQuery', see <https://cloud.google.com/bigquery/> for more information. This package uses 'googleAuthR' so is compatible with similar packages, including 'Google Cloud Storage' (<https://cloud.google.com/storage/>) for result extracts.

Provides a set of functions for interacting with the 'Digital Ocean' API at <https://developers.digitalocean.com/documentation/v2>, including creating images, destroying them, rebooting, getting details on regions, and available images.

Geometric objects defined in 'geozoo' can be simulated or displayed in the R package 'tourr'.

This package provides user-level functions to manage namespaces not (yet) available in base R: 'registerNamespace', 'unregisterNamespace', 'makeNamespace', and 'getRegisteredNamespace' ('makeNamespace' is extracted from the R 'base' package source code: src/library/base/R/namespace.R).

Imports non-tabular data from Excel files into R. Exposes cell content, position and formatting in a tidy structure for further manipulation. Provides functions for selecting cells by position and relative position, and for associating data cells with header cells by proximity in given directions. Supports '.xlsx' and '.xlsm' via the embedded 'RapidXML' C++ library <http://rapidxml.sourceforge.net>. Does not support '.xlsb' or '.xls'.

Asks a custom Yes-No question with variable responses. The order and phrasing of the possible responses varies randomly to ensure the user consciously chooses (as opposed to automatically types their response).

Produce publication quality graphics from output of GGobi's describe display plugin.

Functions for working with legends and axis lines of 'ggplot2', facets that repeat axis lines on all panels, and some 'knitr' extensions.

Functions to convert Rd to roxygen documentation. It can parse an Rd file to a list, create the roxygen documentation and update the original R script (e.g. the one containing the definition of the function) accordingly. This package also provides utilities which can help developers build packages using roxygen more easily. The formatR package can be used to reformat the R code in the examples sections so that the code will be more readable.

A 'ggplot2' extension to visualize two variables through one color aesthetic via mapping to a color space projection. With this technique for 2-D color mapping, one can create a bivariate choropleth in R as well as other visualizations with multivariate color scales. Includes two new scales and a new guide for 'ggplot2'.

The base R data.frame, like any vector, is copied upon modification. This behavior is at odds with that of GUIs and interactive graphics. To rectify this, plumbr provides a mutable, dynamic tabular data model. Models may be chained together to form the complex plumbing necessary for sophisticated graphical interfaces. Also included is a general framework for linking datasets; a typical use case would be a linked brush.

Tools for visual inference. Generate null data sets and null plots using permutation and simulation. Calculate distance metrics for a lineup, and examine the distributions of metrics.

Query and print information about the current R session. It is similar to 'utils::sessionInfo()', but includes more information about packages, and where they were installed from.

The GUI allows the user to control the tour with checkboxes for variable selection, a slider for the speed, and toggle boxes for pausing.

Separate a data frame in two based on key columns. The function unjoin() provides an inside-out version of a nested data frame. This is used to identify duplication and normalize it (in the database sense) by linking two tables with the redundancy removed. This is a basic requirement for detecting topology within spatial structures that has motivated the need for this package as a building block for workflows within more applied projects.

Classes and functions to create and summarize different types of resampling objects (e.g. bootstrap, cross-validation).

Implements a 'DBI'-compliant interface to 'MariaDB' (<https://mariadb.org/>) and 'MySQL' (<https://www.mysql.com/>) databases.

ggsubplot makes it easy to embed customized subplots within larger graphics. Subplots may be used as a geom to explore interaction effects, spatial data, and hierarchical data. Subplots can also be used to explore big data without overplotting.

This R package is the next generation of GGobi, a software package for interactive and dynamic statistical graphics. It includes most of the features in GGobi such as brushing, zooming, panning, identifying and linking, as well as common types of statistical graphics, e.g. bar plot, scatter plot, boxplot, histogram, density plot, spine plot, parallel coordinates plot, mosaic plot, maps, missing value plot, time series plot, tour, scatter plot matrix, hexagons and tiles (color images), etc. Based on the support of several other packages, cranvas aims for speed (from Qt) and flexibility (from R), with the style and design borrowed from ggplot2.

Interface to local and remote Git operations.

Better HTML documentation for R.

Provides a simple interface to lookup and print R function definitions, including C and C++ compiled code from .Call, .C, .Internal and .External calls. Also lookup of S3 and S4 generics, including a simple dialog to print any or all of the loaded methods for the generic.