parquetize (version 0.5.8)

get_parquet_info: Get various info on parquet files

Description

One of the most important pieces of parquet metadata is the row group size.

If its value is low (below 10 000), you should rebuild your parquet files.

A normal value is between 30 000 and 1 000 000.

Usage

get_parquet_info(path)

Value

A tibble with 5 columns:

  • path, file path

  • num_rows, number of rows

  • num_row_groups, number of row groups

  • num_columns, number of columns

  • row_group_size, mean row group size

If any column contains NA, the parquet file may be malformed.
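
The two checks described above (a low row group size and NA values) can be applied to the returned tibble in a few lines. The sketch below is only an illustration: `flag_parquet_files`, `min_row_group_size`, `maybe_malformed` and `should_rebuild` are hypothetical names that are not part of parquetize, and the 10 000 threshold is taken from the description above.

```r
# Hypothetical helper, not part of parquetize: flags files worth a closer look.
# `info` is assumed to have the columns listed above.
flag_parquet_files <- function(info, min_row_group_size = 10000) {
  # An NA in any metadata column suggests the file may be malformed
  info$maybe_malformed <- !stats::complete.cases(
    info[, c("num_rows", "num_row_groups", "num_columns", "row_group_size")]
  )
  # A mean row group size below the threshold suggests rebuilding the file
  info$should_rebuild <- !is.na(info$row_group_size) &
    info$row_group_size < min_row_group_size
  info
}

# Usage on the example dataset shipped with the package, if it is installed
if (requireNamespace("parquetize", quietly = TRUE)) {
  info <- parquetize::get_parquet_info(
    system.file("extdata", "iris_dataset", package = "parquetize")
  )
  print(flag_parquet_files(info))
}
```

Note that files with fewer rows than the threshold will always be flagged by `should_rebuild`, since the mean row group size cannot exceed the number of rows; for such small files the flag is probably not meaningful.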

Arguments

path

Path to a parquet file or to a directory. If a directory is given, get_parquet_info is applied to all parquet files found in its subdirectories.

Examples

get_parquet_info(system.file("extdata", "iris.parquet", package = "parquetize"))

get_parquet_info(system.file("extdata", "iris_dataset", package = "parquetize"))