bqr_query_asynch: BigQuery query asynchronously

Description

Use for big results > 10000 that write to their own destinationTableId.

Usage

bqr_query_asynch(projectId = bq_get_global_project(),
  datasetId = bq_get_global_dataset(), query, destinationTableId,
  useLegacySql = TRUE, writeDisposition = c("WRITE_EMPTY", "WRITE_TRUNCATE",
  "WRITE_APPEND"))

Arguments

projectId

projectId to be billed.

datasetId

datasetId of where query will execute.

query

The BigQuery query as a string.

destinationTableId

Id of table the results will be written to.

useLegacySql

Whether the query you pass is legacy SQL or not. Default TRUE

writeDisposition

Behaviour if destination table exists. See Details.

Value

A Job object to be queried via bqr_get_job

Details

For bigger queries, asynchronous queries save the results to another BigQuery table. You can check the progress of the job via bqr_get_job

You may now want to download this data. For large datasets, this is best done via extracting the BigQuery result to Google Cloud Storage, then downloading the data from there.

You can read how to create a bucket at Google Cloud Storage at https://cloud.google.com/storage/docs/cloud-console

writeDisposition - behaviour if destinationTable already exists:

WRITE_TRUNCATE: BigQuery overwrites the table data.
WRITE_APPEND: BigQuery appends the data to the table
WRITE_EMPTY: If contains data, a 'duplicate' error is returned

Examples

Run this code

# NOT RUN {
# }
# NOT RUN {
library(bigQueryR)

## Auth with a project that has at least BigQuery and Google Cloud Storage scope
bqr_auth()

## make a big query
job <- bqr_query_asynch("your_project", 
                        "your_dataset",
                        "SELECT * FROM blah LIMIT 9999999", 
                        destinationTableId = "bigResultTable")
                        
## poll the job to check its status
## its done when job$status$state == "DONE"
bqr_get_job("your_project", job$jobReference$jobId)

##once done, the query results are in "bigResultTable"
## extract that table to GoogleCloudStorage:
# Create a bucket at Google Cloud Storage at 
# https://console.cloud.google.com/storage/browser

job_extract <- bqr_extract_data("your_project",
                                "your_dataset",
                                "bigResultTable",
                                "your_cloud_storage_bucket_name")
                                
## poll the extract job to check its status
## its done when job$status$state == "DONE"
bqr_get_job("your_project", job_extract$jobReference$jobId)

## to download via a URL and not logging in via Google Cloud Storage interface:
## Use an email that is Google account enabled
## Requires scopes:
##  https://www.googleapis.com/auth/devstorage.full_control
##  https://www.googleapis.com/auth/cloud-platform
## set via options("bigQueryR.scopes") and reauthenticate if needed

download_url <- bqr_grant_extract_access(job_extract, "your@email.com")

## download_url may be multiple if the data is > 1GB

# }
# NOT RUN {

# }

Run the code above in your browser using DataLab