arrow (version 0.16.0.2)

Dataset: Multi-file datasets

Description

Arrow Datasets let you query data that has been split across multiple files. This sharding may reflect partitioning, which can speed up queries that touch only some of the partitions (files).
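As a minimal sketch of what a partitioned, multi-file layout might look like, the example below writes two small Parquet files into Hive-style "year=..." directories and opens them as one Dataset with open_dataset(). The directory layout, file names, and column names are hypothetical, and the sketch assumes that open_dataset()'s default partitioning recognises Hive-style directory names.

```r
library(arrow)

# Hypothetical layout: one directory per year, using Hive-style "key=value" names.
root <- tempfile("dataset_example_")
for (yr in c(2019L, 2020L)) {
  part_dir <- file.path(root, paste0("year=", yr))
  dir.create(part_dir, recursive = TRUE)
  df <- data.frame(id = 1:3, value = c(1.5, 2.5, 3.5))
  write_parquet(df, file.path(part_dir, "part-0.parquet"))
}

# open_dataset() discovers the files under `root`. With Hive-style directory
# names (assumed to be handled by the default partitioning), the "year" field
# comes from the paths rather than from the file contents.
ds <- open_dataset(root)
field_names <- ds$schema$names
print(field_names)
```

Because "year" is encoded in the directory names, a query restricted to one year only needs to read the files under that directory.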

A DatasetFactory helps create Datasets.

Start a new scan of the data

Return the Dataset's Schema

Value

A ScannerBuilder

Factory

The Dataset$create() method instantiates a Dataset and takes the following arguments:

The DatasetFactory$create() method takes the following arguments:

Methods

A Dataset has the following methods:

  • $NewScan(): Returns a ScannerBuilder for building a query

  • $schema: Active binding, returns the Schema of the Dataset
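The two Dataset methods above might be used as in the following sketch. The temporary directory, file name, and column names are hypothetical. The $Finish() and $ToTable() calls on the ScannerBuilder and Scanner are not described on this page and are assumed here from the ScannerBuilder and Scanner classes in the same package.

```r
library(arrow)

# Hypothetical single-file dataset in a temporary directory.
dir_path <- tempfile("dataset_scan_")
dir.create(dir_path)
write_parquet(
  data.frame(x = 1:5, y = letters[1:5], stringsAsFactors = FALSE),
  file.path(dir_path, "data.parquet")
)

ds <- open_dataset(dir_path)

# $schema is an active binding, so it is accessed without parentheses.
ds_schema <- ds$schema

# $NewScan() returns a ScannerBuilder. Finishing it is assumed to yield a
# Scanner whose $ToTable() materialises the scan as an Arrow Table.
builder <- ds$NewScan()
scanner <- builder$Finish()
tab <- scanner$ToTable()
result <- as.data.frame(tab)
```

The builder step is where a query would normally be narrowed (for example to a subset of columns) before any data is read. Here it is finished unchanged, so the whole dataset is scanned.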

A DatasetFactory has the following methods:

  • $Inspect(): Returns a common Schema for the Sources in the factory.

  • $Finish(schema): Returns a Dataset

See Also

open_dataset() for a simpler interface for creating a Dataset