The goal of readr is to provide a fast and friendly way to read tabular data into R. The most important functions are:
- Read delimited files:
- Read fixed width files:
- Read lines:
- Read whole file:
- Re-parse existing data frame:
readr is now available from CRAN.
You can try out the dev version with:
# install.packages("devtools")
devtools::install_github("hadley/readr")
library(readr)
library(dplyr)

mtcars_path <- tempfile(fileext = ".csv")
write_csv(mtcars, mtcars_path)

# Read a csv file into a data frame
read_csv(mtcars_path)
# Read lines into a vector
read_lines(mtcars_path)
# Read whole file into a single string
read_file(mtcars_path)
See vignette("column-types") for how readr parses columns, and how you can override the defaults.
read_csv() produces a data frame with the following properties:
- Characters are never automatically converted to factors (i.e. no more stringsAsFactors = FALSE).
- Valid column names are left as is, not munged into valid R identifiers (i.e. there is no check.names = TRUE). Missing column names are filled in with X1, X2 etc., and duplicated column names are deduplicated.
- The data frame is given class c("tbl_df", "tbl", "data.frame"), so if you also use dplyr you'll get an enhanced display.
- Row names are never set.
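A minimal sketch of how these properties can be checked. The sample file, its column names, and its values are made up for illustration, and base read.csv() is used with its default arguments for contrast:

```r
library(readr)

# Hypothetical sample file: one text column and a header that is not a
# syntactically valid R name.
path <- tempfile(fileext = ".csv")
writeLines(c("name,unit price", "apple,1.5", "pear,2.25"), path)

df <- read_csv(path)

is.character(df$name)     # text columns stay character, not factor
names(df)                 # the header "unit price" is kept as written
inherits(df, "tbl_df")    # the tbl_df class is present

# Base R with default check.names = TRUE rewrites the header instead
base_df <- read.csv(path)
names(base_df)            # "unit price" becomes "unit.price"
```

The exact class vector can differ between readr versions, so the sketch tests for "tbl_df" with inherits() rather than comparing the whole vector.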
If there are any problems parsing the file, the read_ function will throw a warning telling you how many problems there are. You can then use the problems() function to access a data frame that gives information about each problem:
df <- read_csv(col_types = "dd", col_names = c("x", "y"), skip = 1, "
1,2
a,b
")
#> Warning message: There were 2 problems. See problems(x) for more details
problems(df)
#>   row col expected actual
#> 1   2   1 a double      a
#> 2   2   2 a double      b
It's likely that there will be cases that you can never load without some manual regexp-based munging in R. Load those columns with col_character(), fix them up as needed, then use type_convert() to re-run the automated conversion on every character column in the data frame. Alternatively, you can use parse_date() etc. to parse a single character vector at a time.
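The workflow described above might look like the following sketch. The file contents and column names are invented for illustration, and the re-parsing step uses type_convert(), the name under which readr exports that function:

```r
library(readr)

# Hypothetical sample data: the amount column carries a currency symbol
path <- tempfile(fileext = ".csv")
writeLines(c("id,amount,when", "1,$1200,2015-04-09", "2,$35,2015-04-10"), path)

# Load every column as character ("c" is the compact form of col_character())
raw <- read_csv(path, col_types = "ccc")

# Manual clean-up in R: strip the literal "$"
raw$amount <- gsub("$", "", raw$amount, fixed = TRUE)

# Re-run the automated type guessing on the character columns
fixed <- type_convert(raw)

# Or parse a single character vector at a time
when <- parse_date(raw$when, format = "%Y-%m-%d")
```

Loading as character first means a stray symbol cannot turn a whole column into parse failures; the types are only decided once the values are clean.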
Compared to base functions
Compared to the corresponding base functions, readr functions:
- Use a consistent naming scheme for the parameters (e.g.
- Are much faster (up to 10x faster).
- Have a helpful progress bar if loading is going to take a while.
- All functions work exactly the same way regardless of the current locale. To override the US-centric defaults, use
data.table has a function similar to read_csv() called fread(). Compared to fread(), readr:
- Is slower (currently ~1.2-2x slower). If you want absolutely the best performance, use
- Readr has a slightly more sophisticated parser, recognising both doubled ("""") and backslash escapes ("\""). Readr allows you to read factors and date times directly from disk.
- fread() saves you work by automatically guessing the delimiter, whether or not the file has a header, how many lines to skip by default and more. Readr forces you to supply these parameters.
- The underlying designs are quite different. Readr is designed to be general, and dealing with new types of rectangular data just requires implementing a new tokenizer. fread() is designed to be as fast as possible. fread() is pure C, readr is C++ (and Rcpp).
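To illustrate what supplying these parameters yourself looks like on the readr side, here is a small sketch. The file layout (a banner line, no header row, ";" as the delimiter) and the column names are assumptions chosen for the example:

```r
library(readr)

# Hypothetical export: a banner line, no header row, ";" as delimiter
path <- tempfile(fileext = ".txt")
writeLines(c("exported by some tool", "1;a", "2;b"), path)

df <- read_delim(
  path,
  delim = ";",                    # delimiter stated explicitly
  skip = 1,                       # skip the banner line
  col_names = c("id", "label"),   # the file has no header row
  col_types = "ic"                # integer, character
)
```

Stating the layout up front is more typing than letting the reader guess, but the call then documents exactly how the file is expected to look.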