A Dataset can have one or more Sources. A Source contains one or more Fragments, such as files, of a common type and partitioning.
SourceFactory is used to create a Source, inspect the Schema of the fragments contained in it, and declare a partitioning.
FileSystemSourceFactory is a subclass of SourceFactory for discovering files in the local file system, currently the only supported file system.
In general, you'll deal with SourceFactory rather than with Source itself.
A Source provides a method that returns the Source's Schema.
For the SourceFactory$create() factory method, see open_source(), an alias for it.
FileSystemSourceFactory$create() is a lower-level factory method that takes the following arguments:
- filesystem: A FileSystem
- selector: A FileSelector
- format: A string identifier of the format of the files matched by selector. Currently supported options are "parquet", "arrow", and "ipc" (an alias for the Arrow file format).
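A minimal sketch of calling FileSystemSourceFactory$create() with the three arguments above. It assumes the arrow release this page documents (one that still has the Source classes), and that LocalFileSystem$create(), FileSelector$create() and write_parquet() are available in it; the temporary directory and the file name "part-0.parquet" are illustrative only. The sketch stops at the factory: inspecting the schema and building the Source happen through the factory's own methods, which are not shown here.

```r
library(arrow)

# Illustrative setup (assumption): write one small Parquet file into a
# temporary directory so the factory has something to discover.
dir <- tempfile("source_example_")
dir.create(dir)
write_parquet(
  data.frame(x = 1:3, y = c("a", "b", "c")),
  file.path(dir, "part-0.parquet")
)

# The local file system, and a selector that lists every file under `dir`.
fs <- LocalFileSystem$create()
selector <- FileSelector$create(dir, recursive = TRUE)

# Lower-level construction of a factory for the Parquet files the selector finds.
factory <- FileSystemSourceFactory$create(
  filesystem = fs,
  selector = selector,
  format = "parquet"
)
```

Because FileSystemSourceFactory is a subclass of SourceFactory, the resulting object can be used wherever a SourceFactory is expected.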
See Dataset for what to do with a Source.