nanoparquet (version 0.4.2)

read_parquet_page: Read a page from a Parquet file

Description

Read a single page from a Parquet file, identified by the offset of the page in the file.

Usage

read_parquet_page(file, offset)

Value

Named list. Many entries correspond to the columns of the result of read_parquet_pages(). Additional entries are:

  • codec: compression codec. Possible values:

  • has_repetition_levels: whether the page has repetition levels.

  • has_definition_levels: whether the page has definition levels.

  • schema_column: which schema column the page corresponds to. Note that only leaf columns have pages.

  • data_type: low level Parquet data type. Possible values:

  • repetition_type: whether the column the page belongs to is REQUIRED, OPTIONAL or REPEATED.

  • page_header: the bytes of the page header in a raw vector.

  • num_null: number of missing (NA) values. Only set in V2 data pages.

  • num_rows: this is the same as num_values for flat tables, i.e. files without repetition levels.

  • compressed_data: the data of the page in a raw vector. It includes repetition and definition levels, if any.

  • data: the uncompressed data, if nanoparquet supports the compression codec of the file (GZIP and SNAPPY at the time of writing), or if the file is not compressed. In the latter case it is the same as compressed_data.
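The entries above can be inspected one at a time, which avoids printing the long raw vectors. The following is a minimal sketch, assuming the bundled example file and the offset 4L used in the Examples section; the entry names are the ones listed above, and `page` is just an illustrative variable name.

```r
file_name <- system.file("extdata/userdata1.parquet", package = "nanoparquet")

# The function is accessed with `:::`, as in the Examples section.
# Offset 4L is the one used there.
page <- nanoparquet:::read_parquet_page(file_name, 4L)

# List the available entries without printing their contents.
names(page)

# Short entries are safe to print directly.
page$codec
page$data_type
page$repetition_type

# Compare the size of the stored payload with the decoded payload.
# `data` may be missing if the codec is not supported; length(NULL) is 0.
length(page$compressed_data)
length(page$data)

# For an uncompressed file the two raw vectors are expected to be identical.
identical(page$data, page$compressed_data)
```

Printing `page` as a whole is also possible, but the raw vectors dominate the output unless `max.print` is lowered.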

Arguments

file

Path to a Parquet file.

offset

Integer offset of the start of the page in the file. See read_parquet_pages() for a list of all pages and their offsets.
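As a rough sketch of how the offset can be obtained rather than hard-coded: the summary from read_parquet_pages() is read first, and one of its offsets is passed on. The column name `page_header_offset` is an assumption about the layout of that summary and is checked before use; `pages`, `offset_col` and `first_offset` are illustrative names.

```r
file_name <- system.file("extdata/userdata1.parquet", package = "nanoparquet")

# One row per page in the file.
pages <- nanoparquet:::read_parquet_pages(file_name)

# ASSUMPTION: page offsets are stored in a column named `page_header_offset`.
# Inspect names(pages) if this check fails in your version of nanoparquet.
offset_col <- "page_header_offset"
stopifnot(offset_col %in% names(pages))

# Read the first listed page using its offset.
first_offset <- pages[[offset_col]][1]
page <- nanoparquet:::read_parquet_page(file_name, first_offset)

# Which schema column this page belongs to.
page$schema_column
```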

See Also

read_parquet_pages() for a summary of all pages.

Examples

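# Both functions are accessed with `:::` because they are not exported.
file_name <- system.file("extdata/userdata1.parquet", package = "nanoparquet")

# Summary of all pages in the file, including their offsets.
nanoparquet:::read_parquet_pages(file_name)

# Limit printed output; the result contains long raw vectors.
options(max.print = 100)

# Read the page that starts at offset 4.
nanoparquet:::read_parquet_page(file_name, 4L)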