The goal of tidyr is to help you create tidy data. Tidy data is data where:
- Every column is a variable.
- Every row is an observation.
- Every cell is a single value.
Tidy data describes a standard way of storing data that is used wherever
possible throughout the tidyverse. If you
ensure that your data is tidy, you’ll spend less time fighting with the
tools and more time working on your analysis. Learn more about tidy data in vignette("tidy-data").
# The easiest way to get tidyr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just tidyr:
install.packages("tidyr")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/tidyr")
tidyr functions fall into five main categories:
- “Pivoting”, which converts between long and wide forms. tidyr 1.0.0 introduces pivot_longer() and pivot_wider(), replacing the older spread() and gather() functions. See vignette("pivot") for more details.
- “Rectangling”, which turns deeply nested lists (as from JSON) into tidy tibbles. See vignette("rectangle") for more details.
- Nesting converts grouped data to a form where each group becomes a single row containing a nested data frame, and unnesting does the opposite. See vignette("nest") for more details.
- Splitting and combining character columns. Use extract() to pull a single character column into multiple columns; use unite() to combine multiple columns into a single character column.
- Make implicit missing values explicit with complete(); make explicit missing values implicit with drop_na(); replace missing values with the next/previous value with fill(), or a known value with replace_na().
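The categories above can be sketched on toy data. This is a minimal illustration, not a tour of the full API: the `scores`, `users`, and `sales` tibbles and all their column names are made up for the example, and only a small subset of each function's arguments is shown.

```r
library(tidyr)

# Hypothetical data: one row per student, one column per exam.
scores <- tibble::tibble(
  student = c("ana", "ben"),
  exam_1  = c(90, 75),
  exam_2  = c(85, NA)
)

# Pivoting: wide -> long, one row per student/exam pair.
long <- pivot_longer(
  scores,
  cols = c(exam_1, exam_2),
  names_to = "exam",
  values_to = "score"
)

# Rectangling: turn a list-column of named lists into ordinary columns.
users <- tibble::tibble(
  user = list(list(name = "ana", age = 30), list(name = "ben", age = 25))
)
rect <- unnest_wider(users, user)

# Nesting: one row per student, with the remaining columns stored in a
# list-column of data frames; unnest() reverses it.
nested   <- nest(long, data = c(exam, score))
unnested <- unnest(nested, cols = data)

# Splitting and combining character columns.
parts  <- extract(long, exam, into = c("kind", "number"),
                  regex = "([a-z]+)_([0-9]+)")
united <- unite(long, "id", student, exam, sep = ":")

# Missing values: shop "y" has no row for 2021, so complete() adds one
# with units = NA.
sales <- tibble::tibble(
  shop  = c("x", "x", "y"),
  year  = c(2020, 2021, 2020),
  units = c(5, 7, 3)
)
completed <- complete(sales, shop, year)

no_na   <- drop_na(long)                     # drop rows containing NA
zeroed  <- replace_na(long, list(score = 0)) # replace NA with a known value
carried <- fill(long, score)                 # carry the previous value down
```

Each call returns a new tibble and leaves its input untouched, so the steps can be run in any order or chained with a pipe.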
tidyr supersedes reshape2 (2010-2014) and reshape (2005-2010). Somewhat counterintuitively, each iteration of the package has done less. tidyr is designed specifically for tidying data, not for general reshaping (reshape2) or general aggregation (reshape).
data.table provides high-performance implementations of melt() and dcast().
If you’d like to read more about data reshaping from a CS perspective, I’d recommend the following three papers:
- Wrangler: Interactive visual specification of data transformation scripts
- An interactive framework for data cleaning (Potter’s wheel)
- On efficiently implementing SchemaSQL on a SQL database system
To guide your reading, here’s a translation between the terminology used in different places:
| tidyr 1.0.0   | pivot longer | pivot wider |
| tidyr < 1.0.0 | gather       | spread      |
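To make the two tidyr rows of the table concrete, here is a small sketch that performs the same reshaping with both generations of the API. The `wide` tibble and its column names are invented for the example; gather() and spread() are still exported by tidyr, so both spellings run side by side.

```r
library(tidyr)

# Hypothetical wide data: one column per year.
wide <- tibble::tibble(
  country = c("A", "B"),
  `1999`  = c(10, 20),
  `2000`  = c(11, 21)
)

# "Pivot longer": tidyr 1.0.0 spelling vs. the older gather().
long_new <- pivot_longer(wide, cols = c(`1999`, `2000`),
                         names_to = "year", values_to = "cases")
long_old <- gather(wide, key = "year", value = "cases", `1999`, `2000`)

# "Pivot wider": tidyr 1.0.0 spelling vs. the older spread().
wide_new <- pivot_wider(long_new, names_from = year, values_from = cases)
wide_old <- spread(long_old, key = year, value = cases)

# The two long results hold the same rows in a different order, so sort
# them before comparing.
sort_rows <- function(d) {
  d <- as.data.frame(d)
  d <- d[order(d$country, d$year), ]
  rownames(d) <- NULL
  d
}
same_long <- isTRUE(all.equal(sort_rows(long_new), sort_rows(long_old)))
```

The row-order difference is the main thing to watch for when migrating: pivot_longer() keeps each input row's values together, while gather() stacks the output one source column at a time.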
If you encounter a clear bug, please file a minimal reproducible example on GitHub. For questions and other discussion, please use community.rstudio.com.
Please note that the tidyr project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.