Read data from a file in newline-delimited JavaScript Object Notation (NDJSON) format.
read_ndjson(file, mmap = FALSE, simplify = TRUE, text = NULL)
file: the name of the file which the data are to be read from,
        or a connection (unless mmap is TRUE, see below).
        The data should be encoded as UTF-8, and each line should be a
        valid JSON value.
mmap: whether to memory-map the file instead of reading all of its data into memory simultaneously. See the ‘Memory mapping’ section.
simplify: whether to attempt to simplify the type of the return
       value. For example, if each line of the file stores an integer
       and simplify is set to TRUE, then the return value
       will be an integer vector rather than a corpus_json object.
text: a character vector of string fields to interpret as
       text instead of character, or NULL to
       interpret all strings as character.
In the default usage, with argument simplify = TRUE, when
    the lines of the file are records (JSON object literals), the
    return value from read_ndjson is a data frame with class
    c("corpus_frame", "data.frame"). With simplify = FALSE,
    the result is a corpus_json object.
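As a minimal sketch of the two return types described above, the following reads the same small file with and without simplification. It assumes the corpus package is installed; the file contents and the variable names (path, simplified, unsimplified) are made up for illustration.

```r
library(corpus)

# Two records (JSON object literals), one per line; illustrative data only
lines <- c('{ "id": 1, "ok": true }',
           '{ "id": 2, "ok": false }')
path <- tempfile()
writeLines(lines, path)

# Default usage (simplify = TRUE): records come back as a data frame
simplified <- read_ndjson(path)
class(simplified)

# simplify = FALSE: the result stays a corpus_json object
unsimplified <- read_ndjson(path, simplify = FALSE)
class(unsimplified)
```

Comparing the two class() results is a quick way to confirm which representation a given call produced.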
When you specify mmap = TRUE, the function memory-maps the file
    instead of reading it into memory directly. In this case, the file
    argument must be a character string giving the path to the file, not
    a connection object. When you memory-map the file, the operating
    system reads data into memory only when it is needed, enabling
    you to transparently process large data sets that do not fit into
    memory.
In terms of memory usage, enabling mmap = TRUE reduces the
    footprint for corpus_json and corpus_text objects;
    native R objects (character, integer, list,
    logical, and numeric) get fully deserialized to
    memory and produce identical results regardless of whether
    mmap is TRUE or FALSE. To process a large
    text corpus with a text field named "text", you should set
    text = "text" and mmap = TRUE. Or, to reduce the memory
    footprint even further, set simplify = FALSE and
    mmap = TRUE.
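The recommendation above for a large text corpus can be sketched as follows. This is an illustration on a tiny temporary file, not a benchmark; it assumes the corpus package is installed, and the field names "id" and "text" and the sample sentences are hypothetical.

```r
library(corpus)

# Each record has a text field named "text"; illustrative data only
lines <- c('{ "id": 1, "text": "A rose is a rose is a rose." }',
           '{ "id": 2, "text": "Call me Ishmael." }')
path <- tempfile()
writeLines(lines, path)

# Memory-map the file and decode the "text" field as text, not character
data <- read_ndjson(path, mmap = TRUE, text = "text")
text_class <- class(data$text)
text_class

# Release the memory-map before deleting the backing file
rm("data")
invisible(gc())
file.remove(path)
```

The class is captured in text_class before the data is removed, so it can still be inspected after the memory-map has been released.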
One danger in memory-mapping is that if you delete the file
    after calling read_ndjson but before processing the data, then
    the results will be undefined, and your computer may crash. (On
    POSIX-compliant systems like Mac OS and Linux, there should be no
    ill effects to deleting the file. On recent versions of Windows,
    the system will not allow you to delete the file as long as the data
    is active.)
Another danger in memory-mapping is that if you serialize a
    corpus_json object or derived corpus_text object using
    saveRDS or another similar function, and then you
    deserialize the object, R will attempt to create a new memory-map
    using the file argument passed to the original read_ndjson
    call. If file is a relative path, then your working directory
    at the time of deserialization must agree with your working directory
    at the time of the read_ndjson call.  You can avoid this
    situation by specifying an absolute path as the file argument
    (the normalizePath function will convert a relative path
    to an absolute path).
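One way to follow this advice is to normalize the path before the read, so that the object remembers an absolute location. The sketch below assumes the corpus package is installed; the file name "records.ndjson" and the variable names are hypothetical, and the working directory is changed only briefly to create a relative path for the demonstration.

```r
library(corpus)

lines <- c('{ "a": 1 }', '{ "a": 2 }')

# Create the file via a relative path, then convert it to an absolute one
old_wd <- setwd(tempdir())
writeLines(lines, "records.ndjson")
abs_path <- normalizePath("records.ndjson")
setwd(old_wd)

# Pass the absolute path, so a later deserialization does not depend on
# the working directory at that time
data <- read_ndjson(abs_path, mmap = TRUE, simplify = FALSE)

# Round trip through saveRDS/readRDS
rds <- tempfile(fileext = ".rds")
saveRDS(data, rds)
restored <- readRDS(rds)
class(restored)
```

The data file must still exist at abs_path when the restored object is used, since the memory-map is rebuilt from that file.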
This function is the recommended means of reading data for processing by the corpus package.
When the text argument is non-NULL, string data
    fields with names indicated by this argument are decoded as
    text values, not as character values.
# NOT RUN {
# Memory mapping
lines <- c('{ "a": 1, "b": true }',
           '{ "b": false, "nested": { "c": 100, "d": false }}',
           '{ "a": 3.14, "nested": { "d": true }}')
file <- tempfile()
writeLines(lines, file)
(data <- read_ndjson(file, mmap = TRUE))
data$a
data$b
data$nested.c
data$nested.d
rm("data")
invisible(gc()) # force the garbage collector to release the memory-map
file.remove(file)
# }