New Cluster

Usage

new_cluster(
  num_workers,
  spark_version,
  node_type_id,
  driver_node_type_id = NULL,
  autoscale = NULL,
  cloud_attrs = NULL,
  spark_conf = NULL,
  spark_env_vars = NULL,
  custom_tags = NULL,
  ssh_public_keys = NULL,
  log_conf = NULL,
  init_scripts = NULL,
  enable_elastic_disk = TRUE,
  driver_instance_pool_id = NULL,
  instance_pool_id = NULL
)
Arguments

num_workers
Number of worker nodes that this cluster should have. A cluster has one
Spark driver and num_workers executors, for a total of num_workers + 1
Spark nodes.
spark_version
The runtime version of the cluster. You can retrieve a list of available
runtime versions by using db_cluster_runtime_versions().
node_type_id
The node type for the worker nodes. db_cluster_list_node_types() can be
used to see available node types.
driver_node_type_id
The node type of the Spark driver. This field is optional; if unset, the
driver node type is set to the same value as node_type_id defined above.
db_cluster_list_node_types() can be used to see available node types.
autoscale
Instance of cluster_autoscale().
cloud_attrs
Attributes related to clusters running on a specific cloud provider.
Defaults to aws_attributes(). Must be one of aws_attributes(),
azure_attributes(), or gcp_attributes().
spark_conf
Named list. An object containing a set of optional, user-specified Spark
configuration key-value pairs. You can also pass in a string of extra JVM
options to the driver and the executors via spark.driver.extraJavaOptions
and spark.executor.extraJavaOptions respectively. E.g.
list("spark.speculation" = TRUE, "spark.streaming.ui.retainedBatches" = 5).
spark_env_vars
Named list. User-specified environment variable key-value pairs. In order
to specify an additional set of SPARK_DAEMON_JAVA_OPTS, we recommend
appending them to $SPARK_DAEMON_JAVA_OPTS as shown in the following
example. This ensures that all default Databricks-managed environment
variables are included as well. E.g.
list(SPARK_DAEMON_JAVA_OPTS = "$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true").
custom_tags
Named list. An object containing a set of tags for cluster resources.
Databricks tags all cluster resources with these tags in addition to
default_tags. Databricks allows at most 45 custom tags.
ssh_public_keys
List. SSH public key contents that will be added to each Spark node in
this cluster. The corresponding private keys can be used to log in with
the user name ubuntu on port 2200. Up to 10 keys can be specified.
log_conf
Instance of cluster_log_conf().

init_scripts
Instance of init_script_info().
enable_elastic_disk
When enabled, this cluster will dynamically acquire additional disk space
when its Spark workers are running low on disk space.
driver_instance_pool_id
Optional. ID of the instance pool to use for the driver node. You must
also specify instance_pool_id.

instance_pool_id
Optional. ID of the instance pool to use for cluster nodes. If
driver_instance_pool_id is present, instance_pool_id is used for worker
nodes only. Otherwise, it is used for both the driver and worker nodes.
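As a worked illustration, the sketch below builds a minimal fixed-size
cluster specification. The runtime version and node type strings are
placeholders, not real values; list valid ones with
db_cluster_runtime_versions() and db_cluster_list_node_types().

library(brickster)

# Three Spark nodes total: one driver plus two workers.
# "13.3.x-scala2.12" and "m5.large" are illustrative placeholders.
cluster <- new_cluster(
  num_workers   = 2,
  spark_version = "13.3.x-scala2.12",
  node_type_id  = "m5.large",
  spark_conf = list(
    "spark.speculation" = TRUE,
    "spark.streaming.ui.retainedBatches" = 5
  ),
  spark_env_vars = list(
    # Append to $SPARK_DAEMON_JAVA_OPTS so the default Databricks-managed
    # environment variables are preserved.
    SPARK_DAEMON_JAVA_OPTS = "$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true"
  ),
  custom_tags = list(team = "data-eng")
)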
See also

job_task()

Other Task Objects: email_notifications(), libraries(), notebook_task(),
pipeline_task(), python_wheel_task(), spark_jar_task(),
spark_python_task(), spark_submit_task()
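For context, a hedged sketch of how a specification built with
new_cluster() might be attached to one of the task objects listed above.
The cluster_autoscale() bounds and the job_task()/notebook_task()
arguments shown are assumptions inferred from their names, not verified
signatures; consult their own reference pages.

# Autoscaling cluster spec; num_workers is still required by the signature.
autoscaling_cluster <- new_cluster(
  num_workers   = 2,
  spark_version = "13.3.x-scala2.12",   # placeholder runtime version
  node_type_id  = "m5.large",           # placeholder node type
  autoscale     = cluster_autoscale(min_workers = 2, max_workers = 8),
  cloud_attrs   = aws_attributes()
)

# Hypothetical wiring into a job task; argument names below are assumed.
etl_task <- job_task(
  task_key    = "nightly-etl",
  new_cluster = autoscaling_cluster,
  task        = notebook_task(notebook_path = "/jobs/nightly-etl")
)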