hipread_long_chunked: Read a hierarchical fixed width data file, in chunks

Description

Analogous to readr::read_fwf(), but with chunks, and allowing for hierarchical fixed width data files (where the data file has rows of different record types, each with their own variables and column specifications). hipread_long_chunked() reads hierarchical data into "long" format, meaning that there is one row per observation, and variables that don't apply to the current observation receive missing values. Alternatively, hipread_list_chunked() reads hierarchical data into "list" format, which returns a list that has one data.frame per record type.
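
The difference between the two output shapes can be illustrated with a small base R sketch. The data below are hypothetical (not read from any file) and the column names are borrowed from the Examples section only for illustration; this shows the shape of the results, not how the package builds them.

```r
# Hypothetical records: one household (H) followed by two persons (P).
# "Long" format: one row per record, NA where a variable does not apply.
long <- data.frame(
  rt     = c("H", "P", "P"),
  hhnum  = c(1L, 1L, 1L),
  hh_dbl = c(10.5, NA, NA),  # household-only variable, NA on person rows
  pernum = c(NA, 1L, 2L),    # person-only variable, NA on household rows
  stringsAsFactors = FALSE
)

# "List" format: one data.frame per record type, keeping only the
# variables that belong to that record type.
list_format <- list(
  H = long[long$rt == "H", c("rt", "hhnum", "hh_dbl")],
  P = long[long$rt == "P", c("rt", "hhnum", "pernum")]
)

print(long)
print(list_format)
```

The long shape is convenient when records need to stay in file order; the list shape avoids the padding of missing values.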

Usage

hipread_long_chunked(file, callback, chunk_size, var_info,
  rt_info = hip_rt(1, 0), compression = NULL, skip = 0,
  encoding = "UTF-8", progress = show_progress())
hipread_list_chunked(file, callback, chunk_size, var_info,
  rt_info = hip_rt(1, 0), compression = NULL, skip = 0,
  encoding = "UTF-8", progress = show_progress())

Arguments

file

A filename

callback

A callback to run on each chunk as it is read, for example one created with HipDataFrameCallback$new() as in the Examples.
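
The idea of a per-chunk callback can be sketched in base R without the package. The helper apply_in_chunks() below is hypothetical and only mimics the pattern; it assumes, as in readr's chunked readers, that pos is the index of the first row of the chunk, and that the per-chunk results are row-bound at the end.

```r
# Hypothetical stand-in for a chunked reader: walks a data.frame in blocks of
# `chunk_size` rows, calls `f(x, pos)` on each block, and row-binds the results.
apply_in_chunks <- function(df, f, chunk_size) {
  if (nrow(df) == 0) return(df)
  starts <- seq(1, nrow(df), by = chunk_size)
  pieces <- lapply(starts, function(pos) {
    rows <- pos:min(pos + chunk_size - 1, nrow(df))
    f(df[rows, , drop = FALSE], pos)
  })
  do.call(rbind, pieces)
}

# Toy data: three households with two rows each.
toy <- data.frame(hhnum = c(1L, 1L, 2L, 2L, 3L, 3L), value = 1:6)

# Same filter as in the Examples: drop rows where hhnum is 2.
kept <- apply_in_chunks(toy, function(x, pos) x[x$hhnum != 2, ], chunk_size = 4)
print(kept)
```

Because each chunk is filtered before the pieces are combined, the full data never needs to be held in memory at once, which is the main reason to use a chunked reader.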

chunk_size

The size of the chunks that will be read as a single unit (defaults to 10000)

var_info

Variable information, specified by either hip_fwf_positions() or hip_fwf_widths(). For hierarchical data files, supply a named list with one variable information object per record type, where each name is the value that the record type variable takes for that record type.
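
Widths and positions describe the same layout in two ways. The sketch below converts the widths used for the P record in the Examples section into start and end positions, assuming the columns are contiguous and begin at position 1 (an assumption of this illustration, mirroring how width specifications usually work in fixed width readers).

```r
# Widths and names of the P record from the Examples section.
widths    <- c(1, 3, 1, 3, 1)
col_names <- c("rt", "hhnum", "pernum", "per_dbl", "per_mix")

# With contiguous columns, each column ends at the running total of widths
# and starts right after the previous column ends.
end   <- cumsum(widths)
start <- end - widths + 1

layout <- data.frame(col_names, start, end)
print(layout)
```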

rt_info

A record type information object, created by hip_rt(), which contains information about the location of the record type variable that defines the record type for each observation. The default has width 0, which indicates that the data is rectangular and does not have a record type variable.
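
As a rough illustration of what a record type variable at start 1 with width 1 means, the base R sketch below pulls the record type out of a few raw lines. The lines are hypothetical (made up to fit the H and P layouts in the Examples section), and the code is not the package's implementation.

```r
# Hypothetical raw lines: one household record and two person records.
lines <- c("H001abc1.5", "P00112.0x", "P00123.5y")

# Location of the record type variable: start position and width.
rt_start <- 1
rt_width <- 1

# The record type is the substring at that location on every line.
rt <- substr(lines, rt_start, rt_start + rt_width - 1)

# Group the raw lines by record type; each group would then be parsed
# with the variable information for that record type.
by_type <- split(lines, rt)
print(by_type)
```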

compression

If NULL, guesses the compression from the file extension: an extension of "gz" is read as gzip, and anything else is treated as plain text. To set it explicitly, pass a string: "txt" for plain text or "gz" for gzip.
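
The guessing rule described above can be written as a short base R function. The name guess_compression() is hypothetical and this is only a sketch of the documented behavior, not the package's own code.

```r
# Hypothetical helper mirroring the documented rule: an explicit value wins,
# otherwise "gz" for a .gz extension and "txt" for everything else.
guess_compression <- function(file, compression = NULL) {
  if (!is.null(compression)) return(compression)
  if (identical(tools::file_ext(file), "gz")) "gz" else "txt"
}

print(guess_compression("survey.dat.gz"))
print(guess_compression("survey.dat"))
print(guess_compression("survey.dat", compression = "gz"))
```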

skip

Number of lines to skip at the start of the data (defaults to 0).

encoding

(Defaults to UTF-8) A string indicating what encoding to use when reading the data, but like readr, the data will always be converted to UTF-8 once it is imported. Note that UTF-16 and UTF-32 are not supported for non-character columns.

progress

A logical indicating whether progress should be displayed on the screen. By default, progress is shown unless the current context is non-interactive, the code is running in a knitr document, or the user has turned off readr's progress display with the option "readr.show_progress".
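
A minimal sketch of turning the progress display off through the option named above, using only base R. The option is set temporarily and then restored, so the snippet does not change the session's settings.

```r
# Turn off the progress display, remembering the previous setting.
old <- options(readr.show_progress = FALSE)

# Value of the option while it is switched off.
during <- getOption("readr.show_progress")
print(during)

# Restore whatever was set before.
options(old)
```

Passing progress = FALSE directly to the function has the same effect for a single call.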

Value

Depends on the type of callback function you use

Examples


# NOT RUN {
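# Read in the data, dropping rows where hhnum is 2
data <- hipread_long_chunked(
  hipread_example("test-basic.dat"),
  # Callback applied to each chunk: keep rows whose hhnum is not 2
  HipDataFrameCallback$new(function(x, pos) x[x$hhnum != 2, ]),
  4,  # chunk_size
  list(
    # Household (H) records, specified by start and end positions
    H = hip_fwf_positions(
      c(1, 2, 5, 8),
      c(1, 4, 7, 10),
      c("rt", "hhnum", "hh_char", "hh_dbl"),
      c("c", "i", "c", "d")
    ),
    # Person (P) records, specified by column widths
    P = hip_fwf_widths(
      c(1, 3, 1, 3, 1),
      c("rt", "hhnum",  "pernum", "per_dbl", "per_mix"),
      c("c", "i", "i", "d", "c")
    )
  ),
  hip_rt(1, 1)  # record type variable starts at position 1 and has width 1
)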
# }
