Learn R Programming

corpus (version 0.3.1)

read_ndjson: JSON Data Input

Description

Read data from a file in newline-delimited JavaScript Object Notation (NDJSON) format.

Usage

read_ndjson(file, simplify = TRUE)

Arguments

file
the name of the file which the data are to be read from. The file should be encoded as UTF-8, and each line of the file should be a valid JSON value.
simplify
whether to attempt to simplify the type of the return value. For example, if each line of the file stores an integer, if simplify is set to TRUE then the return value will be an integer vector rather than a jsondata object.

Details

This function is the principal means of reading data for processing by the corpus package.

Internally, the function memory-maps the file rather than reading it into memory directly. This allows the operating system to only read the data into memory when it is needed, enabling you to transparently process large data sets that do not fit into memory.

The return value from read_ndjson is a jsondata object that behaves like a data frame.

The one danger in using this function is that if you delete the file after calling read_ndjson but before processing the data, then the results will be undefined, and your computer may crash. (On POSIX-compliant systems like Mac OS and Linux, there should be no ill effects to deleting the file. On recent versions of Windows, the system will not allow you to delete the file as long as the data is active; on older versions, the behavior is undefined.)

Examples

Run this code
    lines <- c('{ "a": 1, "b": true }',
               '{ "b": false, "nested": { "c": 100, "d": false }}',
               '{ "a": 3.14, "nested": { "d": true }}')
    file <- tempfile()
    writeLines(lines, file)
    (data <- read_ndjson(file))

    data$a
    data$b
    data$nested.c
    data$nested.d

    rm("data")
    gc() # force the garbage collector to release the memory-map
    file.remove(file)

Run the code above in your browser using DataLab