spark-connections
Manage Spark Connections
These routines allow you to manage your connections to Spark.
Usage
spark_connect(
  master,
  spark_home = Sys.getenv("SPARK_HOME"),
  method = c("shell", "livy", "databricks", "test", "qubole"),
  app_name = "sparklyr",
  version = NULL,
  config = spark_config(),
  extensions = sparklyr::registered_extensions(),
  packages = NULL,
  scala_version = NULL,
  ...
)

spark_connection_is_open(sc)
spark_disconnect(sc, ...)
spark_disconnect_all()
spark_submit(
  master,
  file,
  spark_home = Sys.getenv("SPARK_HOME"),
  app_name = "sparklyr",
  version = NULL,
  config = spark_config(),
  extensions = sparklyr::registered_extensions(),
  scala_version = NULL,
  ...
)
Arguments
- master
  Spark cluster URL to connect to. Use "local" to connect to a local instance of Spark installed via spark_install.
- spark_home
  The path to a Spark installation. Defaults to the path provided by the SPARK_HOME environment variable. If SPARK_HOME is defined, it will always be used unless the version parameter is specified to force the use of a locally installed version.
- method
  The method used to connect to Spark. The default, "shell", connects using spark-submit; use "livy" to perform remote connections over HTTP, or "databricks" when connecting to a Databricks cluster.
- app_name
  The application name to be used while running in the Spark cluster.
- version
  The version of Spark to use. Required for "local" Spark connections, optional otherwise.
- config
  Custom configuration for the generated Spark connection. See spark_config for details.
- extensions
  Extension R packages to enable for this connection. By default, all packages enabled through the use of sparklyr::register_extension will be passed here.
- packages
  A list of Spark packages to load. For example, "delta" or "kafka" to enable Delta Lake or Kafka. Full coordinates such as "io.delta:delta-core_2.11:0.4.0" are also supported. This is similar to adding packages through the sparklyr.shell.packages configuration option. Note that the version parameter is used to choose the correct package; otherwise the latest version is assumed. A usage sketch follows this list.
- scala_version
  Load the sparklyr jar file built with the specified version of Scala. This currently only matters for Spark 2.4: sparklyr assumes by default that Spark 2.4 on the current host is built with Scala 2.11, so scala_version = "2.12" is needed if sparklyr is connecting to Spark 2.4 built with Scala 2.12.
- ...
  Optional arguments; currently unused.
- sc
  A spark_connection.
- file
  Path to R source file to submit for batch execution.
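The version and packages arguments work together: when version is given, sparklyr resolves shorthand package names such as "delta" against that Spark release. The following is a minimal sketch of a local connection; the Spark version shown ("3.4") is an assumption, not a requirement.

library(sparklyr)

# Install a local Spark distribution if one is not already present
# (the version here is an assumption; use any supported release).
spark_install(version = "3.4")

# Connect locally; `version` lets sparklyr pick the matching "delta" package.
sc <- spark_connect(
  master   = "local",
  version  = "3.4",
  packages = "delta"
)

spark_connection_is_open(sc)
spark_disconnect(sc)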
Details
When using method = "livy"
, jars are downloaded from GitHub but the path
to a local sparklyr
JAR can also be specified through the livy.jars
setting.
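As a hedged illustration of this setting, the sketch below sets livy.jars on a Livy configuration; the Livy endpoint and JAR path are placeholders, and the exact JAR name depends on the sparklyr build in use.

library(sparklyr)

# Start from a Livy-specific configuration.
config <- livy_config()

# Point Livy at a locally available sparklyr JAR instead of downloading
# one from GitHub. The path below is a placeholder.
config[["livy.jars"]] <- "/opt/jars/sparklyr-2.12-x.y.z.jar"

sc <- spark_connect(
  master = "http://livy-host:8998",  # placeholder Livy endpoint
  method = "livy",
  config = config
)

spark_disconnect(sc)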
Examples
# NOT RUN {
sc <- spark_connect(master = "spark://HOST:PORT")
spark_connection_is_open(sc)
spark_disconnect(sc)
# }
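spark_submit() runs an R script as a batch job instead of opening an interactive connection. A minimal sketch, where both the master URL and the script path are placeholders:

library(sparklyr)

# Submit an R script for batch execution on the cluster;
# "analysis.R" is a placeholder path to the script to run.
spark_submit(
  master = "spark://HOST:PORT",
  file   = "analysis.R"
)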