nanoparquet (version 0.4.2)

read_parquet_pages: Metadata of all pages of a Parquet file

Description

Metadata of all pages of a Parquet file

Usage

read_parquet_pages(file)

Value

Data frame with columns:

  • file_name: file name.

  • row_group: id of the row group the page belongs to, an integer between 0 and the number of row groups minus one.

  • column: id of the column, an integer between 0 and the number of leaf columns minus one. Note that only leaf columns are considered, as non-leaf columns do not have any pages.

  • page_type: DATA_PAGE, INDEX_PAGE, DICTIONARY_PAGE or DATA_PAGE_V2.

  • page_header_offset: offset of the data page (its header) in the file.

  • uncompressed_page_size: uncompressed size of the page. It does not include the page header, as per the Parquet spec.

  • compressed_page_size: compressed size of the page, without the page header.

  • crc: integer checksum of the page, if present in the file. It can be NA.

  • num_values: number of data values in this page, including NULL (NA in R) values.

  • encoding: encoding of the page. Currently possible encodings: "PLAIN", "GROUP_VAR_INT", "PLAIN_DICTIONARY", "RLE", "BIT_PACKED", "DELTA_BINARY_PACKED", "DELTA_LENGTH_BYTE_ARRAY", "DELTA_BYTE_ARRAY", "RLE_DICTIONARY", "BYTE_STREAM_SPLIT".

  • definition_level_encoding: encoding of the definition levels, see encoding for possible values. This can be missing in V2 data pages, where they are always RLE encoded.

  • repetition_level_encoding: encoding of the repetition levels, see encoding for possible values. This can be missing in V2 data pages, where they are always RLE encoded.

  • data_offset: offset of the actual data in the file.

  • page_header_length: size of the page header, in bytes.
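
The columns above can be combined to get a quick overview of how a file is laid out. The following is a minimal sketch, not part of the package documentation. It uses the example file shipped with nanoparquet and the same `:::` access as the Examples section. The variable names are illustrative, and the relationship between data_offset, page_header_offset and page_header_length checked at the end is an assumption, not something the documentation states.

```r
# Sketch: summarize the page metadata of the example file.
file_name <- system.file("extdata/userdata1.parquet", package = "nanoparquet")
pages <- nanoparquet:::read_parquet_pages(file_name)

# Number of pages of each type (e.g. DATA_PAGE, DICTIONARY_PAGE).
page_type_counts <- table(pages$page_type)
print(page_type_counts)

# Total compressed bytes per leaf column, page headers excluded.
bytes_per_column <- tapply(pages$compressed_page_size, pages$column, sum)
print(bytes_per_column)

# ASSUMPTION (not documented): the page data starts right after its header.
# This only reports whether that holds for this particular file.
offsets_consistent <- all(
  pages$data_offset == pages$page_header_offset + pages$page_header_length
)
print(offsets_consistent)
```

Because column ids refer to leaf columns only, the names of bytes_per_column are leaf column ids, not column names.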

Arguments

file

Path to a Parquet file.

Details

Reading all the page headers might be slow for large files, especially if the file has many small pages.
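
To get a rough feel for this cost on a given file, one option is to time the call and look at how many pages there are and how large they tend to be. This is only a sketch on the small example file, so the timing itself is not representative of large files. The variable names are illustrative.

```r
# Sketch: time the page header scan and gauge page count and size.
file_name <- system.file("extdata/userdata1.parquet", package = "nanoparquet")

# system.time() evaluates the expression in the calling frame,
# so `pages` is available afterwards.
timing <- system.time(
  pages <- nanoparquet:::read_parquet_pages(file_name)
)

n_pages <- nrow(pages)
median_page_bytes <- median(pages$compressed_page_size)

cat("pages:", n_pages, "\n")
cat("median compressed page size (bytes):", median_page_bytes, "\n")
cat("elapsed seconds:", timing[["elapsed"]], "\n")
```

A large page count combined with a small median page size is the situation described above as potentially slow.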

See Also

read_parquet_page() to read a page.
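
A possible way to combine the two functions is sketched below. It assumes that read_parquet_page() takes the file path and the offset of the page header as its arguments, and that page_header_offset is the right offset to pass. Neither is stated on this page, so check the read_parquet_page() help topic before relying on it.

```r
# Sketch: list all pages, then read the first one.
file_name <- system.file("extdata/userdata1.parquet", package = "nanoparquet")
pages <- nanoparquet:::read_parquet_pages(file_name)

# ASSUMPTION: read_parquet_page(file, offset) expects the offset of the
# page header, as reported in the page_header_offset column.
first_offset <- pages$page_header_offset[1]
page <- nanoparquet:::read_parquet_page(file_name, first_offset)

# Show the top-level structure of the result without printing the raw data.
str(page, max.level = 1)
```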

Examples

file_name <- system.file("extdata/userdata1.parquet", package = "nanoparquet")
nanoparquet:::read_parquet_pages(file_name)
