range_speedread: Read Sheet as CSV

Description

This function uses a quick-and-dirty method to read a Sheet that bypasses the Sheets API and, instead, parses a CSV representation of the data. This can be much faster than range_read() -- noticeably so for "large" spreadsheets. There are real downsides, though, so we recommend this approach only when the speed difference justifies it. Here are the limitations we must accept to get faster reading:

Only formatted cell values are available, not underlying values or details on the formats.
We can't target a named range as the range.
We have no access to the data type of a cell, i.e. we don't know that it's logical, numeric, or datetime. That must be re-discovered based on the CSV data (or specified by the user).
Auth and error handling have to be handled a bit differently internally, which may lead to behaviour that differs from other functions in googlesheets4.

Note that the Sheets API is still used to retrieve metadata on the target Sheet, in order to support range specification. range_speedread() also sends an auth token with the request, unless a previous call to gs4_deauth() has put googlesheets4 into a de-authorized state.

Usage

range_speedread(ss, sheet = NULL, range = NULL, skip = 0, ...)

Arguments

Something that identifies a Google Sheet:

its file id as a string or drive_id
a URL from which we can recover the id
a one-row dribble, which is how googledrive represents Drive files
an instance of googlesheets4_spreadsheet, which is what gs4_get() returns

Processed through as_sheets_id().

sheet

Sheet to read, in the sense of "worksheet" or "tab". You can identify a sheet by name, with a string, or by position, with a number. Ignored if the sheet is specified via range. If neither argument specifies the sheet, defaults to the first visible sheet.

range

A cell range to read from. If NULL, all non-empty cells are read. Otherwise specify range as described in Sheets A1 notation or using the helpers documented in cell-specification. Sheets uses fairly standard spreadsheet range notation, although a bit different from Excel. Examples of valid ranges: "Sheet1!A1:B2", "Sheet1!A:A", "Sheet1!1:2", "Sheet1!A5:A", "A1:B2", "Sheet1". Interpreted strictly, even if the range forces the inclusion of leading, trailing, or embedded empty rows or columns. Takes precedence over skip, n_max and sheet. Note range can be a named range, like "sales_data", without any cell reference.

skip

Minimum number of rows to skip before reading anything, be it column names or data. Leading empty rows are automatically skipped, so this is a lower bound. Ignored if range is given.

...

Passed along to the CSV parsing function (currently readr::read_csv()).

Value

A tibble

Examples

Run this code

# NOT RUN {
if (gs4_has_token()) {
  if (require("readr")) {
    # since cell type is not available, use readr's col type specification
    range_speedread(
      gs4_example("deaths"),
      sheet = "other",
      range = "A5:F15",
      col_types = cols(
        Age = col_integer(),
        `Date of birth` = col_date("%m/%d/%Y"),
        `Date of death` = col_date("%m/%d/%Y")
      )
    )
  }

  # write a Sheet that, by default, is NOT world-readable
  (ss <- sheet_write(chickwts))

  # demo that range_speedread() sends a token, which is why we can read this
  range_speedread(ss)

  # clean up
  googledrive::drive_trash(ss)
}
# }

Run the code above in your browser using DataLab