funchir-io: Convenient Functions for Data Reading

Description

Functions that come in particularly handy when reading in data, turning verbose code into clean, readable code.

Usage

abbr_to_colClass(inits, counts)

Arguments

inits

Initials of the data types to be passed to a colClasses argument (for me, most typically that of fread from data.table), given as a single string. See Details.

counts

Corresponding counts (as an unbroken string of digits) of each type given in inits. See Details.

Details

abbr_to_colClass was designed specifically for reading in large (read: wide, i.e., with many fields) data files when the reader must also be told which column types to expect, whether for speed or for accuracy.

Currently recognized types are blank, character, factor, logical, integer, numeric, Date, date, text and skip, which are abbreviated to their (case-sensitive) initials: "b", "c", "f", "l", "i", "n", "D", "d", "t" and "s", respectively.

Since like types are often found in sequence, the counts argument can condense the call considerably--if three integer columns appear in a row, for example, we could specify inits="i" and counts="3" instead of the breathier inits="iii", counts="111".
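The expansion described above can be sketched in a few lines of base R. This is an illustrative re-implementation under the assumptions stated in this section, not the funchir source; the name expand_abbr and its input checks are hypothetical.

```r
# Hypothetical sketch of the expansion; not the actual funchir implementation.
expand_abbr <- function(inits, counts) {
  # Lookup from one-letter abbreviation to full type name, as listed in Details
  types <- c(b = "blank", c = "character", f = "factor", l = "logical",
             i = "integer", n = "numeric", D = "Date", d = "date",
             t = "text", s = "skip")
  # Split both strings into single characters; counts is read digit-by-digit
  init_chars <- strsplit(inits, split = "")[[1L]]
  count_digits <- as.integer(strsplit(counts, split = "")[[1L]])
  stopifnot(length(init_chars) == length(count_digits),
            all(init_chars %in% names(types)))
  # Repeat each type name by its matching single-digit count
  rep(unname(types[init_chars]), times = count_digits)
}

# Both calls describe three integer columns in a row
expand_abbr(inits = "i", counts = "3")
expand_abbr(inits = "iii", counts = "111")
```

Under this sketch the two calls return the same character vector, which is why the condensed form is preferable.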

Note that since counts is read digit-by-digit, sequences of length greater than 9 must be broken up into chunks of size 9 or smaller, e.g., if there are 20 Date fields in a row, we could set inits="DDD", counts="992". This approach was taken (rather than, say, requiring counts to be an integer vector) because I find it speedier and more concise, and because keeping counts directly parallel to inits makes mismatches visible in the code itself, without having to check, say, cbind(strsplit(inits, split = "")[[1L]], counts).
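For long runs, the chunking can be done mechanically. The helper below is a hypothetical convenience (chunk_run is not part of funchir) that builds the inits and counts strings for a run of n columns of a single type, splitting the run into chunks of at most 9.

```r
# Hypothetical helper, not part of funchir: build inits/counts for one long run.
chunk_run <- function(init, n) {
  stopifnot(nchar(init) == 1L, n >= 1L)
  full <- n %/% 9L   # number of full size-9 chunks
  rest <- n %% 9L    # size of the final, smaller chunk (0 if none)
  sizes <- c(rep(9L, full), if (rest > 0L) rest)
  list(inits = strrep(init, length(sizes)),
       counts = paste(sizes, collapse = ""))
}

# 20 Date fields in a row, split as 9 + 9 + 2
run <- chunk_run("D", 20)
run$inits
run$counts
```

The resulting strings can then be pasted together with those for neighbouring runs before being passed to abbr_to_colClass.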

Examples

# NOT RUN {
  abbr_to_colClass(inits = "ncifdfd", counts = "1234567")
# }