eyeris (version 3.0.1)

eyeris_db_to_parquet: Split eyeris database into N parquet files by data type

Description

Utility function that takes an eyerisdb DuckDB database and splits it into N reasonably sized parquet files for easier management, downloading, and distribution (e.g., via GitHub). Data is first grouped by table type (timeseries, epochs, events, etc.), since each type has a different columnar structure; each group is then split into the requested number of files. Files are organized in folders matching the database name for easy identification.
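For orientation, the resulting files can be inspected directly; a minimal sketch, assuming the defaults for db_path and output_dir described under Arguments below:

# list the parquet files written for a database named "my-project"
# (output_dir defaults to bids_dir/derivatives/parquet, and files
# are grouped into a folder matching the database name)
list.files(
  file.path(bids_dir, "derivatives", "parquet", "my-project"),
  pattern = "\\.parquet$"
)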

Usage

eyeris_db_to_parquet(
  bids_dir,
  db_path = "my-project",
  n_files_per_type = 1,
  output_dir = NULL,
  max_file_size = 512,
  data_types = NULL,
  verbose = TRUE,
  include_metadata = TRUE,
  epoch_labels = NULL,
  group_by_epoch_label = TRUE
)

Value

A list containing information about the created parquet files.

Arguments

bids_dir

Path to the BIDS directory containing the database

db_path

Database name (defaults to "my-project", becomes "my-project.eyerisdb")

n_files_per_type

Number of parquet files to create per data type (default: 1)

output_dir

Directory to save parquet files (defaults to bids_dir/derivatives/parquet)

max_file_size

Maximum file size in MB per parquet file (default: 512). Used as a constraint when n_files_per_type would otherwise create files larger than this limit.

data_types

Vector of data types to include. If NULL (default), includes all available. Valid types: "timeseries", "epochs", "epoch_summary", "events", "blinks", "confounds_*"

verbose

Whether to print progress messages (default: TRUE)

include_metadata

Whether to include eyeris metadata columns in output (default: TRUE)

epoch_labels

Optional character vector of epoch labels to include (e.g., "prepostprobe"). Only applies to epoch-related data types. If NULL, includes all labels.

group_by_epoch_label

If TRUE, processes epoch-related data types separately by epoch label to reduce memory footprint and produce label-specific parquet files (default: TRUE).

Database Safety

This function creates temporary tables during parquet export when the arrow package is not available. All temporary tables are automatically cleaned up, but if the process crashes, leftover tables may remain. The function checks for and warns about existing temporary tables before starting.
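If you suspect a crashed export left temporary tables behind, you can inspect the database manually. A minimal sketch, assuming the .eyerisdb file sits directly inside bids_dir (its exact location, and how eyeris names its temporary tables, are assumptions; check the dbListTables() output yourself):

# open the database read-only and list its tables
con <- DBI::dbConnect(
  duckdb::duckdb(),
  dbdir = file.path(bids_dir, "my-project.eyerisdb"),
  read_only = TRUE
)
DBI::dbListTables(con)  # scan for anything that looks temporary
DBI::dbDisconnect(con, shutdown = TRUE)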

Examples

# create demo database
demo_data <- eyelink_asc_demo_dataset()
demo_data |>
  eyeris::glassbox() |>
  eyeris::epoch(
    events = "PROBE_{startstop}_{trial}",
    limits = c(-1, 1),
    label = "prePostProbe"
  ) |>
  eyeris::bidsify(
    bids_dir = tempdir(),
    participant_id = "001",
    session_num = "01",
    task_name = "memory",
    db_enabled = TRUE,
    db_path = "memory-task"
  )

# split into 3 parquet files per data type - creates memory-task/ folder
split_info <- eyeris_db_to_parquet(
  bids_dir = tempdir(),
  db_path = "memory-task",
  n_files_per_type = 3
)

# split with size constraint and specific data types using the same database
split_info <- eyeris_db_to_parquet(
  bids_dir = tempdir(),
  db_path = "memory-task",
  n_files_per_type = 5,
  max_file_size = 50,  # max 50MB per file
  data_types = c("timeseries", "epochs", "events")
)
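
# filter epoch-related outputs to a single label; a hedged sketch:
# "prepostprobe" assumes the label from epoch() above is stored in
# lowercase, matching the epoch_labels example under Arguments
split_info <- eyeris_db_to_parquet(
  bids_dir = tempdir(),
  db_path = "memory-task",
  epoch_labels = "prepostprobe"
)

# read one exported file back (assumes the arrow package is
# installed; the path follows the default output layout)
files <- list.files(
  file.path(tempdir(), "derivatives", "parquet", "memory-task"),
  pattern = "\\.parquet$",
  full.names = TRUE
)
head(arrow::read_parquet(files[1]))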