tidyr v0.6.3
Easily Tidy Data with 'spread()' and 'gather()' Functions
An evolution of 'reshape2'. It's designed specifically for data
tidying (not general reshaping or aggregating) and works well with
'dplyr' data pipelines.
Readme
tidyr 
Overview
The goal of tidyr is to help you create tidy data. Tidy data is data where:
- Each variable is in a column.
- Each observation is a row.
- Each value is a cell.
Tidy data describes a standard way of storing data that is used wherever possible throughout the tidyverse. If you ensure that your data is tidy, you'll spend less time fighting with the tools and more time working on your analysis.
Installation
# The easiest way to get tidyr is to install the whole tidyverse:
install.packages("tidyverse")
# Alternatively, install just tidyr:
install.packages("tidyr")
# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/tidyr")
Getting started
library(tidyr)
There are two fundamental verbs of data tidying:
- gather() takes multiple columns and gathers them into key-value pairs: it makes "wide" data longer.
- spread() takes two columns (key & value) and spreads them into multiple columns: it makes "long" data wider.
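As a rough illustration of the two verbs, the sketch below round-trips a small data frame. The data frame and its column names (name, a, b, treatment, result) are made up for this example and do not come from the package.

```r
library(tidyr)

# Hypothetical "wide" data: one row per person, one column per treatment.
wide <- data.frame(
  name = c("Ann", "Bob"),
  a = c(1, 3),
  b = c(2, 4),
  stringsAsFactors = FALSE
)

# gather() collapses columns a and b into key-value pairs:
# the old column names go into `treatment`, the cell values into `result`.
long <- gather(wide, key = treatment, value = result, a, b)

# spread() reverses the operation: each distinct value of `treatment`
# becomes a column again, filled from `result`.
wide_again <- spread(long, key = treatment, value = result)
```

Here long has one row per person and treatment, and wide_again should have the same columns as the original wide data.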
tidyr also provides separate() and extract() functions, which make it easier to pull apart a column that represents multiple variables. The complement to separate() is unite().
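A minimal sketch of these three functions, assuming a made-up column that packs a year and a month into one string (the data and column names are hypothetical):

```r
library(tidyr)

# Hypothetical data: `date` holds two variables, year and month.
df <- data.frame(
  id = 1:2,
  date = c("2017-05", "2016-12"),
  stringsAsFactors = FALSE
)

# separate() splits one column into several at a separator.
parts <- separate(df, date, into = c("year", "month"), sep = "-")

# extract() does the same job using regular-expression capture groups.
# The tidyr:: prefix avoids clashes with other packages that export extract().
parts_rx <- tidyr::extract(df, date, into = c("year", "month"),
                           regex = "(\\d+)-(\\d+)")

# unite() is the complement: it pastes the columns back together.
joined <- unite(parts, date, year, month, sep = "-")
```

With this input, separate() and extract() should give the same year and month columns, and unite() should recover the original date strings.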
To get started, read the tidy data vignette (vignette("tidy-data")) and check out the demos (demo(package = "tidyr")).
Related work
tidyr replaces reshape2 (2010-2014) and reshape (2005-2010). Somewhat counterintuitively, each iteration of the package has done less. tidyr is designed specifically for tidying data, not general reshaping (reshape2) or general aggregation (reshape).
If you'd like to read more about data reshaping from a CS perspective, I'd recommend the following three papers:
Wrangler: Interactive visual specification of data transformation scripts
An interactive framework for data cleaning (Potter's wheel)
On efficiently implementing SchemaSQL on a SQL database system
To guide your reading, here's a translation between the terminology used in different places:
tidyr | gather | spread |
---|---|---|
reshape(2) | melt | cast |
spreadsheets | unpivot | pivot |
databases | fold | unfold |
Functions in tidyr
Name | Description | |
drop_na | Drop rows containing missing values | |
drop_na_ | Standard-evaluation version of drop_na. | |
expand | Expand data frame to include all combinations of values | |
expand_ | Expand (standard evaluation). | |
gather | Gather columns into key-value pairs. | |
gather_ | Gather (standard-evaluation). | |
%>% | Pipe operator | |
replace_na | Replace missing values | |
unnest_ | Standard-evaluation version of unnest. | |
separate | Separate one column into multiple columns. | |
separate_ | Standard-evaluation version of separate. | |
unite_ | Standard-evaluation version of unite | |
unnest | Unnest a list column. | |
extract_numeric | Extract numeric component of variable. | |
fill | Fill in missing values. | |
fill_ | Standard-evaluation version of fill. | |
full_seq | Create the full sequence of values in a vector. | |
who | World Health Organization TB data | |
separate_rows | Separate a collapsed column into multiple rows. | |
separate_rows_ | Standard-evaluation version of separate_rows. | |
complete | Complete a data frame with missing combinations of data. | |
complete_ | Standard-evaluation version of complete. | |
spread_ | Standard-evaluation version of spread. | |
tidyr-package | tidyr: Easily Tidy Data with 'spread()' and 'gather()' Functions | |
unite | Unite multiple columns into one. | |
table1 | Example tabular representations | |
extract | Extract one column into multiple columns. | |
extract_ | Standard-evaluation version of extract. | |
nest | Nest repeated values in a list-variable. | |
nest_ | Standard-evaluation version of nest. | |
smiths | Some data about the Smith family. | |
spread | Spread a key-value pair across multiple columns. | |
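Several of the helpers listed above deal with missing values and missing combinations. The following is a small sketch of how a few of them might be called; the sales data frame is invented for illustration.

```r
library(tidyr)

# Hypothetical data with one missing value and one missing
# year/quarter combination (2016, quarter 2).
df <- data.frame(
  year = c(2015, 2015, 2016),
  quarter = c(1, 2, 1),
  sales = c(10, NA, 30)
)

# drop_na() removes rows that contain missing values.
dropped <- drop_na(df)

# replace_na() substitutes a given value for NA, column by column.
replaced <- replace_na(df, list(sales = 0))

# fill() carries the last non-missing value downward.
filled <- fill(df, sales)

# complete() adds rows for combinations of year and quarter
# that are absent from the data.
completed <- complete(df, year, quarter)

# full_seq() builds the full regular sequence spanning a vector's values.
steps <- full_seq(c(1, 3, 7), period = 2)
```

In this example drop_na() leaves two rows, complete() produces four, and full_seq() fills in the gap between 3 and 7.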
Vignettes of tidyr
Name | ||
billboard.csv | ||
pew.csv | ||
preg.csv | ||
preg2.csv | ||
tb.csv | ||
tidy-data.Rmd | ||
weather.csv | ||
Details
License | MIT + file LICENSE |
LazyData | true |
URL | http://tidyr.tidyverse.org, https://github.com/tidyverse/tidyr |
BugReports | https://github.com/tidyverse/tidyr/issues |
VignetteBuilder | knitr |
LinkingTo | Rcpp |
RoxygenNote | 6.0.1 |
NeedsCompilation | yes |
Packaged | 2017-05-15 16:46:11 UTC; hadley |
Repository | CRAN |
Date/Publication | 2017-05-15 18:08:39 UTC |