bigQueryR (version 0.3.2)

bqr_partition: Convert date-sharded tables to a single partitioned table

Description

Moves the old style date-sharded tables such as [TABLE_NAME]_YYYYMMDD to the new date partitioned format.

Usage

bqr_partition(sharded, partition, projectId = bq_get_global_project(),
  datasetId = bq_get_global_dataset())

Arguments

sharded

The prefix of date-sharded tables to merge into one partitioned table

partition

Name of partitioned table. Will create if not present already

projectId

The project ID

datasetId

The dataset ID

Value

TRUE if all partition jobs start running successfully

Details

WARNING: This can be an expensive operation for large datasets as it does a full column scan.

From https://cloud.google.com/bigquery/docs/partitioned-tables:

Replicates the functionality of the bq tool bq query --allow_large_results --replace --noflatten_results --destination_table 'mydataset.table1$20160301' 'SELECT field1 + 10, field2 FROM mydataset.table1$20160301'

Before partitioned tables became available, BigQuery users would often divide large datasets into separate tables organized by time period; usually daily tables, where each table represented data loaded on that particular date.

Dividing a dataset into daily tables helped to reduce the amount of data scanned when querying a specific date range. For example, if you have a a year's worth of data in a single table, a query that involves the last seven days of data still requires a full scan of the entire table to determine which data to return. However, if your table is divided into daily tables, you can restrict the query to the seven most recent daily tables.

Daily tables, however, have several disadvantages. You must manually, or programmatically, create the daily tables. SQL queries are often more complex because your data can be spread across hundreds of tables. Performance degrades as the number of referenced tables increases. There is also a limit of 1,000 tables that can be referenced in a single query. Partitioned tables have none of these disadvantages.

See Also

https://cloud.google.com/bigquery/docs/creating-partitioned-tables