
Run a custom R function on Spark workers to write a Spark DataFrame into file(s). If Spark's speculative execution feature is enabled (i.e., `spark.speculation` is true), then each write task may be executed more than once, and the user-defined writer function will need to ensure that no concurrent writes happen to the same file path (e.g., by appending a UUID to each file name; see the sketch in the examples below).
spark_write(x, writer, paths, packages = NULL)
x: A Spark DataFrame to be saved into file(s).
writer: A writer function with the signature function(partition, path), where partition is an R data frame containing all rows from one partition of the original Spark DataFrame x, and path is a string specifying the file to write partition to.
paths: A single destination path or a list of destination paths, each one specifying a location for a partition from x to be written to. If the number of partitions in x is not equal to length(paths), then x will be re-partitioned to contain length(paths) partition(s).
packages: Boolean to distribute .libPaths() packages to each node, a list of packages to distribute, or a package bundle created with spark_apply_bundle().
# NOT RUN {
library(sparklyr)
sc <- spark_connect(master = "local[3]")
# copy some test data into a Spark Dataframe
sdf <- sdf_copy_to(sc, iris, overwrite = TRUE)
# create a writer function
writer <- function(df, path) {
  write.csv(df, path)
}
spark_write(
  sdf,
  writer,
  # re-partition sdf into 3 partitions and write them to 3 separate files
  paths = list("file:///tmp/file1", "file:///tmp/file2", "file:///tmp/file3")
)
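# A minimal sketch (not part of the original example) of a writer that guards
# against Spark's speculative execution by appending a UUID to each file name,
# so re-executions of the same write task never target the same path; it assumes
# the uuid package is available on the workers (e.g., distributed via `packages`),
# and the paths below are hypothetical.
uuid_writer <- function(df, path) {
  write.csv(df, paste0(path, "-", uuid::UUIDgenerate()))
}
spark_write(
  sdf,
  uuid_writer,
  paths = list("file:///tmp/spec1", "file:///tmp/spec2", "file:///tmp/spec3")
)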
spark_write(
  sdf,
  writer,
  # save all rows into a single file
  paths = list("file:///tmp/all_rows")
)
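# A minimal sketch (an assumption, not part of the original example) of the
# `packages` argument: the writer above only uses base R, so distributing the
# local .libPaths() packages to the workers can be skipped.
spark_write(
  sdf,
  writer,
  paths = list("file:///tmp/no_packages"),
  packages = FALSE
)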
# }