A Scanner iterates over a Dataset's fragments and returns data
according to the given row filter and column projection. A ScannerBuilder
can be used to create one.
Scanner$create() wraps the ScannerBuilder interface to make a Scanner.
It takes the following arguments:
dataset: A Dataset or arrow_dplyr_query object, as returned by the
dplyr methods on Dataset.
projection: A character vector of column names to select, or a
named list of expressions.
filter: An Expression to filter the scanned rows by, or TRUE (default)
to keep all rows.
use_threads: logical: should scanning use multithreading? Default TRUE.
use_async: logical: deprecated, this field no longer has any effect on
behavior.
...: Additional arguments, currently ignored.
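As a minimal sketch of the arguments above, the following builds a Scanner in a single call. The temporary directory, the use of mtcars, and the chosen columns and filter are illustrative assumptions, not part of the API; the example also assumes arrow was built with dataset support.

```r
library(arrow)

# Illustrative setup: write mtcars as a small partitioned dataset.
tf <- tempfile()
dir.create(tf)
write_dataset(mtcars, tf, partitioning = "cyl")
ds <- open_dataset(tf)

# Keep two columns and only the rows where hp > 100.
scanner <- Scanner$create(
  ds,
  projection = c("mpg", "hp"),
  filter = Expression$field_ref("hp") > 100,
  use_threads = TRUE
)

# Evaluate the scan and convert the result to a data.frame.
result <- as.data.frame(scanner$ToTable())

unlink(tf, recursive = TRUE)
```

Passing a named list of expressions as projection, instead of a character vector, would compute derived columns rather than select existing ones.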
ScannerBuilder has the following methods:
$Project(cols): Indicate that the scan should only return the columns given
by cols, a character vector of column names.
$Filter(expr): Filter rows by an Expression.
$UseThreads(threads): logical: should the scan use multithreading?
The method's default input is TRUE, but you must call the method to enable
multithreading because the scanner default is FALSE.
$UseAsync(use_async): logical: deprecated, has no effect.
$BatchSize(batch_size): integer: Maximum row count of scanned record
batches, default is 32K. If scanned record batches are overflowing memory,
call this method to reduce their size.
$schema: Active binding, returns the Schema of the Dataset.
$Finish(): Returns a Scanner
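The builder methods listed above can be combined step by step, roughly as follows. This is a sketch under assumptions: the dataset is an illustrative copy of mtcars, the builder is obtained from the Dataset's $NewScan() method, and the batch size of 1000 is an arbitrary value chosen for demonstration.

```r
library(arrow)

# Illustrative setup: write mtcars as a small partitioned dataset.
tf <- tempfile()
dir.create(tf)
write_dataset(mtcars, tf, partitioning = "cyl")
ds <- open_dataset(tf)

# Obtain a ScannerBuilder from the Dataset and configure it.
builder <- ds$NewScan()
builder$Project(c("mpg", "hp"))                   # columns to return
builder$Filter(Expression$field_ref("hp") > 100)  # rows to keep
builder$UseThreads(TRUE)                          # opt in to multithreading
builder$BatchSize(1000L)                          # cap rows per record batch

# Inspect the Dataset's Schema through the active binding.
dataset_schema <- builder$schema

# Once configured, $Finish() returns the Scanner.
scanner <- builder$Finish()
result <- as.data.frame(scanner$ToTable())

unlink(tf, recursive = TRUE)
```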
Scanner currently has a single method, $ToTable(), which evaluates the
query and returns an Arrow Table.
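For illustration, a scan with no projection or filter returns every row and column of the Dataset as a Table. The dataset here is again an assumed copy of mtcars written to a temporary directory.

```r
library(arrow)

# Illustrative setup: write mtcars as a small partitioned dataset.
tf <- tempfile()
dir.create(tf)
write_dataset(mtcars, tf, partitioning = "cyl")
ds <- open_dataset(tf)

# With the defaults, the scan keeps all rows and all columns.
tab <- Scanner$create(ds)$ToTable()

# The Table can be converted to a data.frame for further work in R.
df <- as.data.frame(tab)

unlink(tf, recursive = TRUE)
```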