arrow (version 0.16.0.2)

Source: Sources for a Dataset

Description

A Dataset can have one or more Sources. A Source contains one or more Fragments, such as files, that share a common type and partitioning. SourceFactory is used to create a Source, inspect the Schema of the fragments it contains, and declare a partitioning. FileSystemSourceFactory is a subclass of SourceFactory that discovers files in the local file system, which is currently the only supported file system.

In general, you'll deal with SourceFactory rather than Source itself.

Factory

For the SourceFactory$create() factory method, see open_source(), an alias for it.

FileSystemSourceFactory$create() is a lower-level factory method and takes the following arguments:

  • filesystem: A FileSystem

  • selector: A FileSelector

  • format: A string identifier of the format of the files matched by selector. Currently supported options are "parquet", "arrow", and "ipc" (an alias for the Arrow file format).
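The three arguments above can be put together roughly as follows. This is a sketch that assumes the arrow 0.16 R API described on this page, together with LocalFileSystem$create(), FileSelector$create() and write_parquet() from the same package; the temporary directory and the file name "part-0.parquet" are invented for illustration.

```r
library(arrow)

# Create a throwaway directory holding one small Parquet file, so the
# factory has something to discover (the file name is arbitrary).
dir <- tempfile("source-example-")
dir.create(dir)
write_parquet(data.frame(x = 1:3, y = c(0.5, 1.5, 2.5)),
              file.path(dir, "part-0.parquet"))

# filesystem: the local file system, currently the only supported one
fs <- LocalFileSystem$create()

# selector: every file under `dir`, descending into subdirectories
selector <- FileSelector$create(dir, recursive = TRUE)

# format: the discovered files are Parquet
factory <- FileSystemSourceFactory$create(
  filesystem = fs,
  selector = selector,
  format = "parquet"
)
```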

Methods

Source has one defined method:

  • $schema: Active binding, returns the Schema of the Source

SourceFactory and its subclasses have the following methods:

  • $Inspect(): Walks the files in the directory and returns a common Schema

  • $Finish(schema): Returns a Source
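A typical pass through these methods lets the factory infer a Schema with $Inspect(), hands that Schema to $Finish(), and then reads it back from the resulting Source. The sketch below assumes the same arrow 0.16 R API as the rest of this page; the temporary Parquet file exists only so there is something to discover.

```r
library(arrow)

# Set up a factory over one small Parquet file in a temporary directory.
dir <- tempfile("source-example-")
dir.create(dir)
write_parquet(data.frame(x = 1:3, y = c(0.5, 1.5, 2.5)),
              file.path(dir, "part-0.parquet"))
factory <- FileSystemSourceFactory$create(
  filesystem = LocalFileSystem$create(),
  selector = FileSelector$create(dir, recursive = TRUE),
  format = "parquet"
)

# $Inspect(): walk the discovered files and return their common Schema
schema <- factory$Inspect()

# $Finish(schema): build a Source that uses that Schema
src <- factory$Finish(schema)

# $schema is an active binding on the Source, so it takes no parentheses
print(src$schema)
```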

See Also

Dataset for what to do with a Source