Read Rectangular Text Data
The goal of 'readr' is to provide a fast and friendly way to read
rectangular data (like 'csv', 'tsv', and 'fwf'). It is designed to flexibly
parse many types of data found in the wild, while still cleanly failing when
data unexpectedly changes.
The goal of readr is to provide a fast and friendly way to read rectangular data (like csv, tsv, and fwf). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes. If you are new to readr, the best place to start is the data import chapter in R for data science.
# The easiest way to get readr is to install the whole tidyverse: install.packages("tidyverse") # Alternatively, install just readr: install.packages("readr") # Or the the development version from GitHub: # install.packages("devtools") devtools::install_github("tidyverse/readr")
readr is part of the core tidyverse, so load it with:
library(tidyverse) #> ── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ── #> ✔ ggplot2 3.1.0 ✔ purrr 0.2.5 #> ✔ tibble 1.4.2 ✔ dplyr 0.7.7 #> ✔ tidyr 0.8.2 ✔ stringr 1.3.1 #> ✔ readr 1.2.0 ✔ forcats 0.3.0 #> ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ── #> ✖ dplyr::filter() masks stats::filter() #> ✖ dplyr::lag() masks stats::lag()
To accurately read a rectangular dataset with readr you combine two pieces: a function that parses the overall file, and a column specification. The column specification describes how each column should be converted from a character vector to the most appropriate data type, and in most cases it’s not necessary because readr will guess it for you automatically.
readr supports seven file formats with seven
read_csv(): comma separated (CSV) files
read_tsv(): tab separated files
read_delim(): general delimited files
read_fwf(): fixed width files
read_table(): tabular files where columns are separated by white-space.
read_log(): web log files
In many cases, these functions will just work: you supply the path to a file and you get a tibble back. The following example loads a sample file bundled with readr:
mtcars <- read_csv(readr_example("mtcars.csv")) #> Parsed with column specification: #> cols( #> mpg = col_double(), #> cyl = col_double(), #> disp = col_double(), #> hp = col_double(), #> drat = col_double(), #> wt = col_double(), #> qsec = col_double(), #> vs = col_double(), #> am = col_double(), #> gear = col_double(), #> carb = col_double() #> )
Note that readr prints the column specification. This is useful because it allows you to check that the columns have been read in as you expect, and if they haven’t, you can easily copy and paste into a new call:
mtcars <- read_csv(readr_example("mtcars.csv"), col_types = cols( mpg = col_double(), cyl = col_integer(), disp = col_double(), hp = col_integer(), drat = col_double(), vs = col_integer(), wt = col_double(), qsec = col_double(), am = col_integer(), gear = col_integer(), carb = col_integer() ) )
vignette("readr") gives more detail on how readr guesses the column
types, how you can override the defaults, and provides some useful tools
for debugging parsing problems.
There are two main alternatives to readr: base R and data.table’s
fread(). The most important differences are discussed below.
Compared to the corresponding base functions, readr functions:
Use a consistent naming scheme for the parameters (e.g.
Are much faster (up to 10x).
Leave strings as is by default, and automatically parse common date/time formats.
Have a helpful progress bar if loading is going to take a while.
All functions work exactly the same way regardless of the current locale. To override the US-centric defaults, use
data.table has a function
read_csv() called fread. Compared to fread, readr
Are slower. If you want absolutely the best performance, use
Use a slightly more sophisticated parser, recognising both doubled (
"""") and backslash escapes (
"\""), and can produce factors and date/times directly.
Forces you to supply all parameters, where
fread()saves you work by automatically guessing the delimiter, whether or not the file has a header, and how many lines to skip.
Are built on a different underlying infrastructure. Readr functions are designed to be quite general, which makes it easier to add support for new rectangular data formats.
fread()is designed to be as fast as possible.
Joe Cheng for showing me the beauty of deterministic finite automata for parsing, and for teaching me why I should write a tokenizer.
JJ Allaire for helping me come up with a design that makes very few copies, and is easy to extend.
Dirk Eddelbuettel for coming up with the name!
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Functions in readr
|format_delim||Convert a data frame to a delimited string|
|parse_atomic||Parse logicals, integers, and reals|
|melt_fwf||Return melted data for each token in a fixed width file|
|melt_table||Return melted data for each token in a whitespace-separated file|
|write_delim||Write a data frame to a delimited file|
|read_log||Read common/combined log file into a tibble|
|read_delim||Read a delimited file (including csv & tsv) into a tibble|
|read_delim_chunked||Read a delimited file by chunks|
|read_rds||Read/write RDS files.|
|as.col_spec||Generate a column specification|
|cols||Create column specification|
|parse_vector||Parse a character vector.|
|problems||Retrieve parsing problems|
|read_table||Read whitespace-separated columns into a tibble|
|melt_delim||Return melted data for each token in a delimited file (including csv & tsv)|
|readr-package||readr: Read Rectangular Text Data|
|melt_delim_chunked||Melt a delimited file by chunks|
|read_lines||Read/write lines to/from a file|
|read_file||Read/write a complete file|
|read_fwf||Read a fixed width file into a tibble|
|readr_example||Get path to readr example|
|show_progress||Determine progress bars should be shown|
|parse_number||Parse numbers, flexibly|
|parse_guess||Parse using the "best" type|
|read_lines_chunked||Read lines from a file or string by chunk.|
|cols_condense||Examine the column specifications for a data frame|
|tokenize||Tokenize a file/string.|
|spec_delim||Generate a column specification|
|type_convert||Re-convert character columns in existing data frame|
|col_skip||Skip a column|
|count_fields||Count the number of fields in each line of a file|
|date_names||Create or retrieve date names|
|guess_encoding||Guess encoding of file|
|datasource||Create a source object.|
|clipboard||Returns values from the clipboard|
|output_column||Preprocess column for output|
Vignettes of readr
Last month downloads
|License||GPL (>= 2) | file LICENSE|
|Packaged||2018-12-20 16:06:40 UTC; jhester|
|Date/Publication||2018-12-21 09:40:02 UTC|
|imports||clipr , crayon , hms (>= 0.4.1) , methods , R6 , Rcpp (>= 0.12.0.5) , tibble|
|suggests||covr , curl , knitr , rmarkdown , spelling , stringi , testthat|
|depends||R (>= 3.1)|
|Contributors||Romain Francois, RStudio, R Core team, Hadley Wickham, Jukka Jyl<c3><a4>nki, Mikkel J<c3><b8>rgensen|
Include our badge in your README