These routines allow you to manage your connections to Spark.
spark_connect(master, spark_home = Sys.getenv("SPARK_HOME"),
method = c("shell", "livy", "databricks", "test", "qubole"),
app_name = "sparklyr", version = NULL, config = spark_config(),
extensions = sparklyr::registered_extensions(), packages = NULL, ...)

spark_connection_is_open(sc)
spark_disconnect(sc, ...)
spark_disconnect_all()
spark_submit(master, file, spark_home = Sys.getenv("SPARK_HOME"),
app_name = "sparklyr", version = NULL, config = spark_config(),
extensions = sparklyr::registered_extensions(), ...)
master
Spark cluster URL to connect to. Use "local" to connect to a local instance of Spark installed via spark_install.
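For instance, a minimal connection might look like the following (the standalone cluster URL is a placeholder):

library(sparklyr)

# Connect to a local Spark instance installed via spark_install()
sc <- spark_connect(master = "local")

# Or connect to a standalone cluster (HOST and PORT are placeholders)
# sc <- spark_connect(master = "spark://HOST:PORT")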
spark_home
The path to a Spark installation. Defaults to the path provided by the SPARK_HOME environment variable. If SPARK_HOME is defined, it will always be used unless the version parameter is specified to force the use of a locally installed version.
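As a sketch, version can be used to force a locally installed Spark even when SPARK_HOME is set (the version string is illustrative; use one installed via spark_install):

# Ignore SPARK_HOME and use a locally installed Spark (version is illustrative)
sc <- spark_connect(master = "local", version = "3.4")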
method
The method used to connect to Spark. The default, "shell", connects using spark-submit; use "livy" to perform remote connections over HTTP, or "databricks" when connecting to a Databricks cluster.
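For example, a remote connection through Livy might look like this (the Livy endpoint is a placeholder):

# Connect over HTTP through a Livy server (endpoint is a placeholder)
sc <- spark_connect(master = "http://livy-server:8998", method = "livy")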
app_name
The application name to be used while running in the Spark cluster.
version
The version of Spark to use. Required for "local" Spark connections, optional otherwise.
config
Custom configuration for the generated Spark connection. See spark_config for details.
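A minimal sketch, assuming spark.executor.memory is the setting you want to tune (both the setting and its value are illustrative):

conf <- spark_config()
conf$spark.executor.memory <- "4g"  # illustrative setting and value
sc <- spark_connect(master = "local", config = conf)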
extensions
Extension R packages to enable for this connection. By default, all packages enabled through the use of sparklyr::register_extension will be passed here.
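As a sketch, the default can be overridden by passing an explicit character vector of package names; the empty vector below disables all registered extensions for this connection:

# Connect without any registered extension packages
sc <- spark_connect(master = "local", extensions = character(0))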
packages
A list of Spark packages to load. For example, "delta" or "kafka" to enable Delta Lake or Kafka. Full coordinates such as "io.delta:delta-core_2.11:0.4.0" are also supported. This is similar to adding packages through the sparklyr.shell.packages configuration option. Note that the version parameter is used to choose the correct package; otherwise the latest version is assumed.
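For instance (the Spark version is illustrative; the full Delta coordinates are the ones quoted above):

# Shorthand package name; version helps resolve a compatible build
sc <- spark_connect(master = "local", version = "2.4", packages = "delta")

# Or pass the full coordinates explicitly
# sc <- spark_connect(master = "local",
#                     packages = "io.delta:delta-core_2.11:0.4.0")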
...
Optional arguments; currently unused.
sc
A spark_connection.
file
Path to R source file to submit for batch execution.
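A minimal sketch of a batch submission (the script path is a placeholder):

# Submit an R script for batch execution (file path is a placeholder)
spark_submit(master = "local", file = "batch-job.R")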
When using method = "livy", it is recommended to specify the version parameter to improve performance by using precompiled code rather than uploading sources. By default, jars are downloaded from GitHub, but the path to the correct sparklyr JAR can also be specified through the livy.jars setting.
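A sketch of such a connection, with the endpoint and version as placeholders:

# Pinning version lets Livy use a precompiled sparklyr JAR instead of
# uploading sources (endpoint and version are placeholders)
sc <- spark_connect(master = "http://livy-server:8998",
                    method = "livy",
                    version = "2.4.0")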
# NOT RUN {
sc <- spark_connect(master = "spark://HOST:PORT")
connection_is_open(sc)
spark_disconnect(sc)
# }