Parquet is a columnar storage file format. This function enables you to write Parquet files from R.
write_parquet(x, sink, chunk_size = NULL, version = NULL,
  compression = NULL, compression_level = NULL,
  use_dictionary = NULL, write_statistics = NULL,
  data_page_size = NULL,
  properties = ParquetWriterProperties$create(x,
    version = version, compression = compression,
    compression_level = compression_level,
    use_dictionary = use_dictionary,
    write_statistics = write_statistics,
    data_page_size = data_page_size),
  use_deprecated_int96_timestamps = FALSE, coerce_timestamps = NULL,
  allow_truncated_timestamps = FALSE,
  arrow_properties = ParquetArrowWriterProperties$create(
    use_deprecated_int96_timestamps = use_deprecated_int96_timestamps,
    coerce_timestamps = coerce_timestamps,
    allow_truncated_timestamps = allow_truncated_timestamps))
x: An arrow::Table, or an object convertible to it.
sink: An arrow::io::OutputStream, or a string that is interpreted as a file path.
chunk_size: Chunk size in number of rows. If NULL, the total number of rows is used.
version: Parquet version, "1.0" or "2.0".
compression: Compression algorithm. No compression by default.
compression_level: Compression level.
use_dictionary: Specify whether to use dictionary encoding.
write_statistics: Specify whether to write statistics.
data_page_size: Set a target threshold for the approximate encoded size of data pages within a column chunk. If omitted, the default data page size (1 MB) is used.
properties: Properties for the Parquet writer, derived from the arguments version, compression, compression_level, use_dictionary, write_statistics and data_page_size.
use_deprecated_int96_timestamps: Write timestamps in the INT96 Parquet format.
coerce_timestamps: Cast timestamps to a particular resolution. Can be NULL, "ms" or "us".
allow_truncated_timestamps: Allow loss of data when coercing timestamps to a particular resolution. For example, if microsecond or nanosecond data is lost when coercing to "ms", do not raise an exception.
arrow_properties: Arrow-specific writer properties, derived from the arguments use_deprecated_int96_timestamps, coerce_timestamps and allow_truncated_timestamps.
NULL, invisibly
The parameters compression, compression_level, use_dictionary and write_statistics support various patterns:
- The default NULL leaves the parameter unspecified, and the C++ library uses an appropriate default for each column.
- A single, unnamed value (e.g. a single string for compression) applies to all columns.
- An unnamed vector, of the same size as the number of columns, specifies a value for each column, in positional order.
- A named vector specifies the value for the named columns; the default value for the setting is used for columns that are not named.
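The patterns listed above can be sketched as follows. This is a minimal illustration, not an exhaustive test: the data frame and the temporary file names are hypothetical, and the gzip call is guarded with codec_is_available() on the assumption that codec support depends on how the arrow package was built.

```r
library(arrow)

# Hypothetical example data: two columns so per-column settings are visible
df <- data.frame(
  id = 1:6,
  label = c("a", "b", "a", "b", "a", "b"),
  stringsAsFactors = FALSE
)

# Single unnamed value: dictionary encoding switched off for every column
tf_single <- tempfile(fileext = ".parquet")
write_parquet(df, tf_single, use_dictionary = FALSE)

# Unnamed vector, one value per column, matched by position (id, label)
tf_positional <- tempfile(fileext = ".parquet")
write_parquet(df, tf_positional, use_dictionary = c(FALSE, TRUE))

# Named vector: only `label` is set; `id` keeps the default for the setting
tf_named <- tempfile(fileext = ".parquet")
write_parquet(df, tf_named, write_statistics = c(label = FALSE))

# A single string for compression applies to all columns; guarded because
# codec availability is assumed to vary between builds
if (codec_is_available("gzip")) {
  tf_gzip <- tempfile(fileext = ".parquet")
  write_parquet(df, tf_gzip, compression = "gzip")
}

# Read one file back to confirm the data survived the round trip
roundtrip <- read_parquet(tf_positional)
```

Leaving all four parameters at NULL, as in the first pattern, needs no extra code: the call is simply write_parquet(df, sink).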
# NOT RUN {
tf1 <- tempfile(fileext = ".parquet")
write_parquet(data.frame(x = 1:5), tf1)
# using compression
tf2 <- tempfile(fileext = ".gz.parquet")
write_parquet(data.frame(x = 1:5), tf2, compression = "gzip", compression_level = 5)
# }
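The timestamp arguments coerce_timestamps and allow_truncated_timestamps can be illustrated with a further sketch. It assumes that a POSIXct column is converted to an Arrow timestamp with sub-millisecond resolution before writing, so coercing to "ms" would drop part of the value; the data and file name are hypothetical.

```r
library(arrow)

# Hypothetical data: a POSIXct value with a sub-millisecond fractional part
when <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC") + 0.123456
df <- data.frame(id = 1L, when = when)

# Coerce timestamps to millisecond resolution and accept the loss of the
# sub-millisecond part instead of raising an exception
tf_ms <- tempfile(fileext = ".parquet")
write_parquet(
  df, tf_ms,
  coerce_timestamps = "ms",
  allow_truncated_timestamps = TRUE
)

# Read the file back; the timestamp column is returned as POSIXct
back <- read_parquet(tf_ms)
```

With allow_truncated_timestamps left at its default of FALSE, the same call would be expected to raise an error rather than silently discard the extra precision, which is the safer choice when exact timestamps matter.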