readr v1.1.1

by James Hester

Read Rectangular Text Data

The goal of 'readr' is to provide a fast and friendly way to read rectangular data (like 'csv', 'tsv', and 'fwf'). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes.

Readme

readr

Overview

The goal of readr is to provide a fast and friendly way to read rectangular data (like csv, tsv, and fwf). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes. If you are new to readr, the best place to start is the data import chapter in R for Data Science.

Installation

# The easiest way to get readr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just readr:
install.packages("readr")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/readr")

Usage

readr is part of the core tidyverse, so load it with:

library(tidyverse)

To accurately read a rectangular dataset with readr you combine two pieces: a function that parses the overall file, and a column specification. The column specification describes how each column should be converted from a character vector to the most appropriate data type, and in most cases it's not necessary because readr will guess it for you automatically.

readr supports six file formats with six read_ functions:

  • read_csv(): comma separated (CSV) files
  • read_tsv(): tab separated files
  • read_delim(): general delimited files
  • read_fwf(): fixed width files
  • read_table(): tabular files where columns are separated by white-space.
  • read_log(): web log files
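
As a rough sketch of how the delimited readers relate to each other, the snippet below parses the same two rows three ways. The inline strings and the `id`/`name` columns are made up for illustration, and the sketch assumes that readr treats a string containing a newline as literal data rather than a file path.

```r
library(readr)

# Hypothetical sample data: the same two rows with three different delimiters.
csv_text  <- "id,name\n1,apple\n2,banana\n"
tsv_text  <- "id\tname\n1\tapple\n2\tbanana\n"
pipe_text <- "id|name\n1|apple\n2|banana\n"

from_csv   <- read_csv(csv_text)                  # comma separated
from_tsv   <- read_tsv(tsv_text)                  # tab separated
from_delim <- read_delim(pipe_text, delim = "|")  # any single-character delimiter

# All three calls should return a tibble with the columns id and name.
names(from_csv)
names(from_tsv)
names(from_delim)
```

read_csv() and read_tsv() can be thought of as read_delim() with the delimiter fixed in advance, so the general function is only needed for less common separators.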

In many cases, these functions will just work: you supply the path to a file and you get a tibble back. The following example loads a sample file bundled with readr:

mtcars <- read_csv(readr_example("mtcars.csv"))
#> Parsed with column specification:
#> cols(
#>   mpg = col_double(),
#>   cyl = col_integer(),
#>   disp = col_double(),
#>   hp = col_integer(),
#>   drat = col_double(),
#>   wt = col_double(),
#>   qsec = col_double(),
#>   vs = col_integer(),
#>   am = col_integer(),
#>   gear = col_integer(),
#>   carb = col_integer()
#> )

Note that readr prints the column specification. This is useful because it allows you to check that the columns have been read in as you expect, and if they haven't, you can easily copy and paste into a new call:

mtcars <- read_csv(readr_example("mtcars.csv"), col_types = 
  cols(
    mpg = col_double(),
    cyl = col_integer(),
    disp = col_double(),
    hp = col_integer(),
    drat = col_double(),
    vs = col_integer(),
    wt = col_double(),
    qsec = col_double(),
    am = col_integer(),
    gear = col_integer(),
    carb = col_integer()
  )
)

vignette("column-types") gives more detail on how readr guesses the column types and how you can override the defaults, and provides some useful tools for debugging parsing problems.
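
A minimal sketch of that debugging workflow, using made-up inline data rather than a real file: when a column is forced to `col_integer()` and one value is not an integer, readr is expected to warn, store `NA` for that value, and record the failure so that `problems()` can retrieve it.

```r
library(readr)

# Hypothetical data: the third value of x is deliberately not an integer.
text <- "x,y\n1,a\n2,b\nthree,c\n"

# Override the guessed types with an explicit column specification.
# readr warns about the parsing failure instead of stopping.
df <- read_csv(text, col_types = cols(
  x = col_integer(),
  y = col_character()
))

df$x          # the unparseable value becomes NA
problems(df)  # one row per failure: where it happened, what was expected
```

Checking `problems()` right after an import is a cheap way to notice when a file has drifted away from the specification you copied in.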

Alternatives

There are two main alternatives to readr: base R and data.table's fread(). The most important differences are discussed below.

Base R

Compared to the corresponding base functions, readr functions:

  • Use a consistent naming scheme for the parameters (e.g. col_names and col_types not header and colClasses).

  • Are much faster (up to 10x).

  • Leave strings as is by default, and automatically parse common date/time formats.

  • Have a helpful progress bar if loading is going to take a while.

  • Work exactly the same way regardless of the current locale. To override the US-centric defaults, use locale().
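
Two of the points above can be sketched with made-up inline data: string columns come back as character vectors, ISO 8601 dates are recognised without any extra arguments, and a `locale()` changes how numbers are parsed. The `events` data and the `euro` variable are hypothetical names used only for this illustration.

```r
library(readr)

# Hypothetical data with one text column and one ISO 8601 date column.
events <- read_csv("name,day\nlaunch,2017-05-16\nreview,2017-06-01\n")

class(events$name)  # strings are left as character vectors
class(events$day)   # dates in a common format are parsed to Date

# Override the US-centric default decimal mark for a single parse.
euro <- parse_double("1,23", locale = locale(decimal_mark = ","))
euro
```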

data.table and fread()

data.table has a function similar to read_csv() called fread(). Compared to fread(), readr functions:

  • Are slower (currently ~1.2-2x slower). If you want absolutely the best performance, use data.table::fread().

  • Use a slightly more sophisticated parser, recognising both doubled ("""") and backslash escapes ("\""), and can produce factors and date/times directly.

  • Force you to supply all parameters, where fread() saves you work by automatically guessing the delimiter, whether or not the file has a header, and how many lines to skip.

  • Are built on a different underlying infrastructure. readr functions are designed to be quite general, which makes it easier to add support for new rectangular data formats. fread() is designed to be as fast as possible.
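
The parser behaviour described above can be sketched as follows, with made-up inline data: a doubled quote inside a quoted field should collapse to a single quote character, and a column specification can ask for a factor or a date directly. The column names and factor levels are assumptions chosen for the example.

```r
library(readr)

# Hypothetical data: the first field contains a doubled ("") quote escape.
text <- 'quote,level,day\n"She said ""hi""",low,2017-05-16\n"plain",high,2017-06-01\n'

df <- read_csv(text, col_types = cols(
  quote = col_character(),
  level = col_factor(levels = c("low", "high")),  # factor produced at parse time
  day   = col_date(format = "%Y-%m-%d")           # date produced at parse time
))

df$quote[1]       # the doubled quotes are unescaped to single quotes
levels(df$level)  # levels follow the order given in the specification
class(df$day)
```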

Acknowledgements

Thanks to:

  • Joe Cheng for showing me the beauty of deterministic finite automata for parsing, and for teaching me why I should write a tokenizer.

  • JJ Allaire for helping me come up with a design that makes very few copies, and is easy to extend.

  • Dirk Eddelbuettel for coming up with the name!

Functions in readr

Name                Description
count_fields        Count the number of fields in each line of a file
datasource          Create a source object.
format_delim        Convert a data frame to a delimited string
locale              Create locales
col_skip            Skip a column
cols                Create column specification
date_names          Create or retrieve date names
guess_encoding      Guess encoding of file
Tokenizers          Tokenizers.
callback            Callback classes
read_delim          Read a delimited file (including csv & tsv) into a tibble
read_delim_chunked  Read a delimited file by chunks
output_column       Preprocess column for output
parse_atomic        Parse logicals, integers, and reals
readr_example       Get path to readr example
cols_condense       Examine the column specifications for a data frame
parse_guess         Parse using the "best" type
parse_number        Parse numbers, flexibly
read_table          Read whitespace-separated columns into a tibble
readr-package       readr: Read Rectangular Text Data
spec_delim          Generate a column specification
tokenize            Tokenize a file/string.
type_convert        Re-convert character columns in existing data frame
write_delim         Write a data frame to a delimited file
parse_datetime      Parse date/times
parse_factor        Parse factors
read_lines          Read/write lines to/from a file
read_lines_chunked  Read lines from a file or string by chunk.
read_log            Read common/combined log file into a tibble
read_rds            Read/write RDS files.
problems            Retrieve parsing problems
read_file           Read/write a complete file
read_fwf            Read a fixed width file into a tibble
parse_vector        Parse a character vector.

Details

Encoding UTF-8
LinkingTo Rcpp, BH
License GPL (>= 2) | file LICENSE
BugReports https://github.com/tidyverse/readr/issues
URL http://readr.tidyverse.org, https://github.com/tidyverse/readr
VignetteBuilder knitr
RoxygenNote 6.0.1
NeedsCompilation yes
Packaged 2017-05-16 16:03:56 UTC; jhester
Repository CRAN
Date/Publication 2017-05-16 19:03:57 UTC
