A Dataset can have one or more Sources. A Source contains one or more
Fragments, such as files, that share a common storage location, format,
and partitioning. This function helps you construct a Source that you can
pass to open_dataset().

open_source(
  path,
  filesystem = c("auto", "local"),
  format = c("parquet", "arrow", "ipc"),
  partitioning = NULL,
  allow_non_existent = FALSE,
  recursive = TRUE,
  ...
)

path: A string path to a directory containing data files.
filesystem: A string identifier for the filesystem corresponding to
path. Currently only "local" is supported.
format: A string identifier of the format of the files in path.
Currently supported options are "parquet", "arrow", and "ipc" (an alias for
the Arrow file format).
partitioning: One of
  - A Schema, in which case the file paths relative to path will be
    parsed, and path segments will be matched with the schema fields. For
    example, schema(year = int16(), month = int8()) would create partitions
    for file paths like "2019/01/file.parquet", "2019/02/file.parquet", etc.
  - A character vector that defines the field names corresponding to those
    path segments (that is, you provide the names that would correspond
    to a Schema, but the types are autodetected).
  - A HivePartitioning or HivePartitioningFactory, as returned
    by hive_partition(), which parses explicit or autodetected fields from
    Hive-style path segments.
  - NULL for no partitioning.
allow_non_existent: Logical. Is path allowed to not exist? Default
FALSE. See FileSelector.
recursive: Logical. Should files be discovered in subdirectories of
path? Default TRUE.
...: Additional arguments passed to the FileSystem $create() method.
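
To make the Schema and character-vector forms of partitioning more concrete, the following base-R sketch shows how leading directory segments of a relative file path can be matched to field names. It is an illustration only, not how arrow parses paths internally: parse_partitions() is a hypothetical helper, and plain R integers stand in for int16() and int8().

```r
# Hypothetical helper (not part of arrow): map the leading directory
# segments of relative file paths onto named partition fields.
parse_partitions <- function(paths, fields) {
  # Drop the file name, then split the remaining directories on "/".
  segments <- strsplit(dirname(paths), "/", fixed = TRUE)
  # The i-th field takes the i-th directory segment of every path.
  values <- lapply(seq_along(fields), function(i) {
    vapply(segments, function(s) s[i], character(1))
  })
  names(values) <- fields
  as.data.frame(values, stringsAsFactors = FALSE)
}

paths <- c("2019/01/file.parquet", "2019/02/file.parquet")
parts <- parse_partitions(paths, c("year", "month"))

# Supplying a Schema would fix the types; here they are converted by hand.
parts$year <- as.integer(parts$year)
parts$month <- as.integer(parts$month)
```

Each row of parts then holds the partition values for one file: the first segment becomes year and the second becomes month.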
Returns a SourceFactory object. Pass it to open_dataset() in a list,
potentially alongside other SourceFactory objects, to create
a Dataset.
If you only have a single Source, such as a directory containing Parquet
files, you can call open_dataset() directly. Use open_source() when you
want to combine different directories, file systems, or file formats.
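
As a minimal sketch of the multi-source case, the function below builds one Source per directory and passes both to open_dataset() in a list. It assumes an arrow release that exports open_source() with the signature shown above. combine_sources(), its two directory arguments, and the year/month field names are hypothetical and chosen only for illustration.

```r
# Hypothetical wrapper: combine a directory of Parquet files with a
# directory of Arrow files into a single Dataset.
# Assumes the installed arrow version provides open_source().
combine_sources <- function(parquet_dir, arrow_dir) {
  # Parquet files laid out as <year>/<month>/file.parquet; field types
  # are autodetected because only the names are given.
  parquet_source <- arrow::open_source(
    parquet_dir,
    format = "parquet",
    partitioning = c("year", "month")
  )
  # Arrow-format files with no partitioning.
  arrow_source <- arrow::open_source(arrow_dir, format = "arrow")
  # open_dataset() accepts a list of SourceFactory objects.
  arrow::open_dataset(list(parquet_source, arrow_source))
}
```

Calling combine_sources() with two existing directories would return a Dataset spanning both. With only one directory, calling open_dataset() on the path directly is simpler.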