A Dataset can be constructed using one or more DatasetFactory objects.
This function helps you construct a DatasetFactory that you can pass to
open_dataset().
dataset_factory(
  x,
  filesystem = c("auto", "local"),
  format = c("parquet", "arrow", "ipc", "feather"),
  partitioning = NULL,
  allow_not_found = FALSE,
  recursive = TRUE,
  ...
)
A string path to a directory containing data files, or
a list of DatasetFactory objects whose datasets should be
combined. If a list is supplied, it will be used to construct a
UnionDatasetFactory and the other arguments will be ignored.
A string identifier for the filesystem corresponding to
x. Currently only "local" is supported.
A string identifier of the format of the files in x.
Currently "parquet" and "ipc"/"arrow"/"feather" (aliases for each other)
are supported. For Feather, only version 2 files are supported.
One of:
A Schema, in which case the file paths relative to x will be
parsed, and path segments will be matched with the schema fields. For
example, schema(year = int16(), month = int8()) would create partitions
for file paths like "2019/01/file.parquet" and "2019/02/file.parquet".
A character vector that defines the field names corresponding to those
path segments (that is, you're providing the names that would correspond
to a Schema, but the types will be autodetected).
A HivePartitioning or HivePartitioningFactory, as returned
by hive_partition(), which parses explicit or autodetected fields from
Hive-style path segments.
NULL for no partitioning.
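The first two partitioning options can be sketched as follows. This is a minimal illustration, not a definitive recipe: the temporary directory, the `value` column, and the variable names are assumptions made for the example, and it assumes the arrow package is installed with dataset support.

```r
library(arrow)

# Hypothetical layout: <root>/2019/01/file.parquet and <root>/2019/02/file.parquet
root <- tempfile()
for (month in c("01", "02")) {
  dir.create(file.path(root, "2019", month), recursive = TRUE)
  write_parquet(
    data.frame(value = c(1.5, 2.5)),
    file.path(root, "2019", month, "file.parquet")
  )
}

# Option 1: a Schema gives both the names and the types of the path segments
factory_schema <- dataset_factory(
  root,
  format = "parquet",
  partitioning = schema(year = int16(), month = int8())
)

# Option 2: a character vector gives only the names; types are autodetected
factory_names <- dataset_factory(
  root,
  format = "parquet",
  partitioning = c("year", "month")
)

# Pass the factory to open_dataset() in a list to create the Dataset
ds <- open_dataset(list(factory_schema))
names(ds)
```

The partition fields should then appear as columns of the resulting Dataset alongside the columns stored in the files.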
logical: is x allowed to not exist? Default
FALSE. See FileSelector.
logical: should files be discovered in subdirectories of
x? Default TRUE.
Additional arguments passed to the FileSystem $create() method
A DatasetFactory object. Pass this to open_dataset(),
in a list, optionally together with other DatasetFactory objects, to create
a Dataset.
If you have only a single DatasetFactory (for example, you have a
single directory containing Parquet files), you can call open_dataset()
directly. Use dataset_factory() when you
want to combine different directories, file systems, or file formats.
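As a rough sketch of the multi-format case, the example below combines a directory of Parquet files with a directory of Feather files. The directory names, file names, and data frame are made up for illustration, and the example assumes both directories hold files with the same columns.

```r
library(arrow)

# Hypothetical data written to two directories in two different formats
df <- data.frame(id = 1:3, value = c(0.1, 0.2, 0.3))

parquet_dir <- tempfile()
feather_dir <- tempfile()
dir.create(parquet_dir)
dir.create(feather_dir)
write_parquet(df, file.path(parquet_dir, "part.parquet"))
write_feather(df, file.path(feather_dir, "part.feather"))

# One DatasetFactory per directory, each with its own format
parquet_factory <- dataset_factory(parquet_dir, format = "parquet")
feather_factory <- dataset_factory(feather_dir, format = "feather")

# A list of factories is combined into a single Dataset
ds <- open_dataset(list(parquet_factory, feather_factory))
ds
```

Keeping one factory per source lets each directory carry its own format and partitioning settings before they are combined.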