sparkbq v0.1.1
Google 'BigQuery' Support for 'sparklyr'
A 'sparklyr' extension package providing an integration with Google 'BigQuery'. It supports direct import/export, where records are streamed from/to 'BigQuery'. In addition, data may be imported/exported via intermediate data extracts on Google 'Cloud Storage'.
sparkbq: Google BigQuery Support for sparklyr
sparkbq is a sparklyr extension package providing an integration with Google BigQuery. It builds on top of spark-bigquery, which provides a Google BigQuery data source to Apache Spark.
Installation
You can install the released version of sparkbq from CRAN via

```r
install.packages("sparkbq")
```

or the latest development version through

```r
devtools::install_github("miraisolutions/sparkbq", ref = "develop")
```

Version Information
The following table provides an overview of the supported versions of Apache Spark, Scala, and Google Dataproc:
| sparkbq | spark-bigquery | Apache Spark | Scala | Google Dataproc |
|---|---|---|---|---|
| 0.1.x | 0.1.0 | 2.2.x and 2.3.x | 2.11 | 1.2.x and 1.3.x |
sparkbq is based on the Spark package spark-bigquery, which is maintained in a separate GitHub repository.
Example Usage
```r
library(sparklyr)
library(sparkbq)
library(dplyr)

config <- spark_config()
sc <- spark_connect(master = "local[*]", config = config)

# Set Google BigQuery default settings
bigquery_defaults(
  billingProjectId = "<your_billing_project_id>",
  gcsBucket = "<your_gcs_bucket>",
  datasetLocation = "US",
  serviceAccountKeyFile = "<your_service_account_key_file>",
  type = "direct"
)

# Reading the public shakespeare data table
# https://cloud.google.com/bigquery/public-data/
# https://cloud.google.com/bigquery/sample-tables
hamlet <-
  spark_read_bigquery(
    sc,
    name = "hamlet",
    projectId = "bigquery-public-data",
    datasetId = "samples",
    tableId = "shakespeare") %>%
  filter(corpus == "hamlet") # NOTE: predicate pushdown to BigQuery!

# Retrieve results into a local tibble
hamlet %>% collect()

# Write result into "mysamples" dataset in our BigQuery (billing) project
spark_write_bigquery(
  hamlet,
  datasetId = "mysamples",
  tableId = "hamlet",
  mode = "overwrite")
```
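The example above uses the "direct" import/export type, i.e. records are streamed from/to BigQuery. As the package description notes, data may instead be moved via intermediate data extracts on Google Cloud Storage. A minimal sketch of that variant follows; the `"parquet"` type value is an assumption here, so check `?bigquery_defaults` for the exact set of supported type values:

```r
library(sparkbq)

# Hedged sketch: route import/export through intermediate extracts on
# Google Cloud Storage (written to/read from gcsBucket) instead of
# streaming records directly.
bigquery_defaults(
  billingProjectId = "<your_billing_project_id>",
  gcsBucket = "<your_gcs_bucket>",
  datasetLocation = "US",
  serviceAccountKeyFile = "<your_service_account_key_file>",
  type = "parquet"  # assumed extract format; see ?bigquery_defaults
)
```

The intermediate-extract route can be preferable for large tables, where staging files on Cloud Storage avoids streaming each record individually.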
Authentication
When running outside of Google Cloud, it is necessary to specify a service account JSON key file. The key file can be passed as the parameter serviceAccountKeyFile, either to bigquery_defaults or directly to spark_read_bigquery and spark_write_bigquery.
Alternatively, the environment variable GOOGLE_APPLICATION_CREDENTIALS can be set, e.g. export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/service_account_keyfile.json (see https://cloud.google.com/docs/authentication/getting-started for more information). Make sure the variable is set before starting the R session.
When running on Google Cloud, e.g. on Google Cloud Dataproc, application default credentials (ADC) may be used, in which case it is not necessary to specify a service account key file.
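Both routes can be set up from within R. A minimal sketch, assuming a sparklyr connection sc already exists; the key file path and project/table identifiers are placeholders:

```r
library(sparkbq)

# Option 1: set the ADC environment variable from R. This must happen
# before the credentials are first needed by the R session.
Sys.setenv(
  GOOGLE_APPLICATION_CREDENTIALS = "<your_service_account_key_file>"
)

# Option 2: pass the key file directly to a single read, instead of
# (or overriding) the value set via bigquery_defaults().
hamlet <- spark_read_bigquery(
  sc,
  name = "hamlet",
  projectId = "bigquery-public-data",
  datasetId = "samples",
  tableId = "shakespeare",
  serviceAccountKeyFile = "<your_service_account_key_file>"
)
```

The per-call parameter is convenient when different tables are read with different service accounts within one session.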
Functions in sparkbq
| Name | Description |
|---|---|
| default_billing_project_id | Default Google BigQuery Billing Project ID |
| default_dataset_location | Default Google BigQuery Dataset Location |
| default_service_account_key_file | Default Google BigQuery Service Account Key File |
| default_bigquery_type | Default BigQuery import/export type |
| default_gcs_bucket | Default Google BigQuery GCS Bucket |
| bigquery_defaults | Google BigQuery Default Settings |
| spark_read_bigquery | Reading data from Google BigQuery |
| spark_write_bigquery | Writing data to Google BigQuery |
Details
| Field | Value |
|---|---|
| Type | Package |
| URL | http://www.mirai-solutions.com, https://github.com/miraisolutions/sparkbq |
| BugReports | https://github.com/miraisolutions/sparkbq/issues |
| License | GPL-3 \| file LICENSE |
| SystemRequirements | Spark (>= 2.2.x) |
| Encoding | UTF-8 |
| LazyData | yes |
| RoxygenNote | 6.1.1 |
| NeedsCompilation | no |
| Packaged | 2019-12-18 17:03:34 UTC; simon |
| Repository | CRAN |
| Date/Publication | 2019-12-18 18:00:02 UTC |
| Suggests | dplyr |
| Depends | R (>= 3.3.2) |
| Imports | sparklyr (>= 0.7.0) |
| Contributors | Mirai Solutions GmbH, Nicola Lambiase, Omer Demirel |