spark_apply

An object (usually a <code>spark_tbl</code>) coercible to a Spark DataFrame.

A function that transforms a data frame partition into a data frame.
The function <code>f</code> has signature <code>f(df, group1, group2, ...)</code>, where
<code>df</code> is a data frame with the data to be processed and <code>group1</code> to
<code>groupN</code> contain the values of the <code>group_by</code> columns for that partition. When
<code>group_by</code> is not specified, <code>f</code> takes only one argument.
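The two shapes of <code>f</code> described above can be sketched as follows. This is a minimal illustration, not the package's own example: the function names, the sample data, and the wrapper <code>run_on_spark</code> are hypothetical, and the sketch assumes the group value reaches <code>f</code> as a single value. The local check runs on an ordinary data frame; the wrapper is defined but not called, so no Spark connection is needed to load the code.

```r
# Without group_by: f receives only the partition as a data frame.
scale_partition <- function(df) {
  df$x <- df$x * 10
  df
}

# With one group_by column: the group value arrives as a second argument
# (assumed here to be a single value per call).
mean_by_group <- function(df, group1) {
  data.frame(mean_x = mean(df$x))
}

# Local check on a plain data frame; this only mimics the per-group calls.
local_df <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 3))
scaled <- scale_partition(local_df)
means <- lapply(
  split(local_df, local_df$g),
  function(part) mean_by_group(part, part$g[1])
)

# Hypothetical wrapper: `sc` is assumed to be an open connection
# created with sparklyr::spark_connect().
run_on_spark <- function(sc) {
  tbl <- sparklyr::sdf_copy_to(sc, local_df, name = "local_df", overwrite = TRUE)
  sparklyr::spark_apply(tbl, mean_by_group, group_by = "g")
}
```

Testing <code>f</code> on a small local data frame first is a cheap way to catch errors before the function is shipped to the cluster.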

A vector of column names or a named vector of column types for
the transformed object. Defaults to the names from the original object and
adds indexed column names when not enough columns are specified.

columns
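As a rough sketch of the two forms <code>columns</code> can take, the snippet below builds a plain vector of names and a named vector of types. The partition function, the column names, and the wrapper <code>apply_with_columns</code> are hypothetical, and the type string <code>"double"</code> is an assumption about how types are spelled. The wrapper is defined but not called, so the code loads without a Spark connection.

```r
# Partition function returning two unnamed columns.
add_square <- function(df) {
  data.frame(df$x, df$x^2)
}

# Form 1: a vector of column names.
column_names <- c("x", "x_squared")

# Form 2: a named vector mapping column names to types
# (the "double" spelling is an assumption).
column_types <- c(x = "double", x_squared = "double")

# Hypothetical wrapper: `tbl` is assumed to be a Spark DataFrame
# with a numeric column named x.
apply_with_columns <- function(tbl, columns) {
  sparklyr::spark_apply(tbl, add_square, columns = columns)
}
```

Either <code>column_names</code> or <code>column_types</code> could then be passed as the second argument of the wrapper.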

Boolean; should the table be cached into memory?

memory

Column name used to group the data frame into partitions.

group_by

Boolean; distribute <code>.libPaths()</code> packages to nodes?

packages

Optional arguments; currently unused.

Applies an R function to a Spark object (typically, a Spark DataFrame).

R interface to Apache Spark, a fast and general engine for big data
processing, see <http://spark.apache.org>. This package supports connecting to
local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end,
and provides an interface to Spark's built-in machine learning algorithms.

spark_apply: Apply an R Function in Spark
