spark-connections


Manage Spark Connections

These routines allow you to manage your connections to Spark.

Usage
spark_connect(master, spark_home = Sys.getenv("SPARK_HOME"),
  method = c("shell", "livy", "databricks", "test", "qubole"),
  app_name = "sparklyr", version = NULL, config = spark_config(),
  extensions = sparklyr::registered_extensions(), ...)

spark_connection_is_open(sc)

spark_disconnect(sc, ...)

spark_disconnect_all()

spark_submit(master, file, spark_home = Sys.getenv("SPARK_HOME"),
  app_name = "sparklyr", version = NULL, config = spark_config(),
  extensions = sparklyr::registered_extensions(), ...)

Arguments
master

Spark cluster url to connect to. Use "local" to connect to a local instance of Spark installed via spark_install.

spark_home

The path to a Spark installation. Defaults to the path provided by the SPARK_HOME environment variable. If SPARK_HOME is defined, it will always be used unless the version parameter is specified to force the use of a locally installed version.

method

The method used to connect to Spark. The default, "shell", connects using spark-submit; use "livy" to perform remote connections over HTTP, or "databricks" when connecting to a Databricks cluster.

app_name

The application name to be used while running in the Spark cluster.

version

The version of Spark to use. Required for "local" Spark connections, optional otherwise.

config

Custom configuration for the generated Spark connection. See spark_config for details.

extensions

Extension packages to enable for this connection. By default, all packages enabled through the use of sparklyr::register_extension will be passed here.

...

Optional arguments; currently unused.

sc

A spark_connection.

file

Path to R source file to submit for batch execution.
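As a sketch of how the config argument might be customized before connecting (the memory values below are arbitrary placeholders, not recommendations):

conf <- spark_config()
conf$`sparklyr.shell.driver-memory` <- "4G"  # driver memory passed to spark-submit
conf$spark.executor.memory <- "2G"           # standard Spark executor setting

sc <- spark_connect(master = "local", config = conf)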

Details

When using method = "livy", it is recommended to specify the version parameter to improve performance by using precompiled code rather than uploading sources. By default, JARs are downloaded from GitHub, but the path to the correct sparklyr JAR can also be specified through the livy.jars setting.
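For example, a Livy connection with a pinned version might look like the following sketch; the server URL is a placeholder for your Livy endpoint, and the livy.jars path is hypothetical:

conf <- spark_config()
# Optionally point at a local sparklyr JAR instead of downloading from GitHub:
# conf$livy.jars <- "path/to/sparklyr-jar"   # hypothetical path

sc <- spark_connect(master = "http://livy-server:8998",
                    method = "livy",
                    version = "2.4.3",
                    config = conf)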

Aliases
  • spark-connections
  • spark_connect
  • spark_connection_is_open
  • spark_disconnect
  • spark_disconnect_all
  • spark_submit
Examples
# NOT RUN {
sc <- spark_connect(master = "spark://HOST:PORT")
spark_connection_is_open(sc)

spark_disconnect(sc)

# }
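spark_submit can likewise be used to submit an R script for batch execution; analysis.R below is a hypothetical script path:

# NOT RUN {
spark_submit(master = "local",
             file = "analysis.R",
             spark_home = Sys.getenv("SPARK_HOME"))
# }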
Documentation reproduced from package sparklyr, version 1.0.4, License: Apache License 2.0 | file LICENSE
