Arrow Datasets allow you to query against data that has been split across multiple files. This sharding of data may indicate partitioning, which can accelerate queries that only touch some partitions (files).
DatasetFactory is used to help in the creation of Datasets.
Start a new scan of the data
Return the Dataset's Schema
The Dataset$create() method instantiates a Dataset and takes the following arguments:
The DatasetFactory$create() method takes the following arguments:
sources: a list of SourceFactory objects
A Dataset has the following methods:
$NewScan(): Returns a ScannerBuilder for building a query
$schema: Active binding, returns the Schema of the Dataset
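The two Dataset members listed above can be tried with a minimal sketch like the following. It assumes an arrow installation with Parquet and Dataset support. The temporary directory, the year=... folder names, and the id and value columns are hypothetical and exist only for illustration. The Dataset is obtained through open_dataset() rather than Dataset$create().

```r
library(arrow)

# Hypothetical layout: a temporary directory holding one Parquet file per
# Hive-style partition directory (year=2019/, year=2020/).
root <- tempfile("dataset_")
for (yr in c(2019L, 2020L)) {
  part_dir <- file.path(root, paste0("year=", yr))
  dir.create(part_dir, recursive = TRUE)
  write_parquet(
    data.frame(id = 1:3, value = c(1.5, 2.5, 3.5)),
    file.path(part_dir, "part-0.parquet")
  )
}

# Open every file under `root` as a single Dataset.
ds <- open_dataset(root)

# $schema is an active binding, so it is read without parentheses.
sch <- ds$schema
print(sch)

# $NewScan() starts a new scan and hands back a ScannerBuilder.
builder <- ds$NewScan()
print(class(builder))
```

Nothing is read into memory by these calls; the ScannerBuilder only describes a query that has yet to be built and run.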
A DatasetFactory has:
$Inspect(): Returns a common Schema for the Sources in the factory.
$Finish(schema): Returns a Dataset
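As a rough illustration of the inspect-then-finish flow, here is a sketch under one loud assumption: the factory is built with the dataset_factory() helper found in recent arrow releases, not with DatasetFactory$create() and SourceFactory objects as documented on this page, so the construction step may not match the version described here. The file names and the id column are hypothetical.

```r
library(arrow)

# Hypothetical input: two Parquet files in one temporary directory.
root <- tempfile("dataset_")
dir.create(root)
write_parquet(data.frame(id = 1:3), file.path(root, "a.parquet"))
write_parquet(data.frame(id = 4:6), file.path(root, "b.parquet"))

# Assumption: dataset_factory() is available in the installed arrow version.
# It stands in for DatasetFactory$create() in this sketch.
factory <- dataset_factory(root)

# $Inspect() returns a common Schema for the files behind the factory.
common_schema <- factory$Inspect()
print(common_schema)

# $Finish(schema) builds the Dataset using that Schema.
ds <- factory$Finish(common_schema)
print(class(ds))
```

Splitting creation into these two steps leaves room to examine or adjust the Schema before the Dataset is built.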
open_dataset() for a simple interface to creating a Dataset