Arrow Datasets allow you to query against data that has been split across multiple files. This sharding of data may indicate partitioning, which can accelerate queries that only touch some partitions (files).
A DatasetFactory helps create Datasets.
The Dataset$create() method instantiates a Dataset and takes the following arguments:
The DatasetFactory$create() method takes the following arguments:
sources: a list of SourceFactory objects
A Dataset has the following methods:
$NewScan(): Returns a ScannerBuilder for building a query
$schema: Active binding, returns the Schema of the Dataset
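As a minimal sketch of these two members, the example below writes two Parquet files into a scratch directory, opens them as one Dataset, reads its Schema, and runs a scan. It assumes a recent release of the arrow R package in which the ScannerBuilder offers $Project() and $Finish() and the resulting Scanner offers $ToTable(); the directory and file names are hypothetical.

```r
library(arrow)

# Hypothetical scratch directory holding one data set split across two files
data_dir <- tempfile("mtcars_scan_")
dir.create(data_dir)
write_parquet(mtcars[1:16, ], file.path(data_dir, "part-1.parquet"))
write_parquet(mtcars[17:32, ], file.path(data_dir, "part-2.parquet"))

ds <- open_dataset(data_dir)

# $schema is an active binding, so it is read without parentheses
ds$schema

# $NewScan() returns a ScannerBuilder; restrict the scan to two columns,
# then build the Scanner and materialize the result as an Arrow Table
builder <- ds$NewScan()
builder$Project(c("mpg", "cyl"))
scanner <- builder$Finish()
tab <- scanner$ToTable()
```

The builder is an R6 object that is modified in place, which is why the sketch configures it step by step before calling $Finish().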
A DatasetFactory has the following methods:
$Inspect(): Returns a common Schema for the Sources in the factory.
$Finish(schema): Returns a Dataset
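The two factory methods can be sketched as follows. This example assumes a recent release of the arrow R package, where the dataset_factory() helper builds a DatasetFactory from a directory path rather than from a list of source objects, so it may not match the constructor described above. The directory and file names are hypothetical.

```r
library(arrow)

# Hypothetical scratch directory with two Parquet files
data_dir <- tempfile("mtcars_factory_")
dir.create(data_dir)
write_parquet(mtcars[1:16, ], file.path(data_dir, "part-1.parquet"))
write_parquet(mtcars[17:32, ], file.path(data_dir, "part-2.parquet"))

# Assumption: dataset_factory() is available in the installed arrow version
factory <- dataset_factory(data_dir)

# $Inspect() returns a Schema that is common to the files in the factory
common_schema <- factory$Inspect()

# $Finish() builds the Dataset, here using the inspected Schema explicitly
ds <- factory$Finish(common_schema)
```

Inspecting before finishing gives a chance to check or adjust the Schema before the Dataset is created.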
See open_dataset() for a simple interface to creating a Dataset.
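A short sketch of that simpler interface, assuming a recent release of the arrow R package that also provides write_dataset(); the directory name is hypothetical. The data are written as one subdirectory per value of cyl so that the files correspond to partitions.

```r
library(arrow)

# Hypothetical scratch directory; write_dataset() creates one subdirectory
# per distinct value of the partitioning column `cyl`
data_dir <- tempfile("mtcars_by_cyl_")
write_dataset(mtcars, data_dir, format = "parquet", partitioning = "cyl")

# open_dataset() discovers the files and returns a single Dataset
ds <- open_dataset(data_dir)

# The partitioning column appears in the Schema alongside the file columns
ds$schema
```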