sdf_expand_grid

Each input variable can be either an R vector/factor or a Spark
dataframe. Unnamed inputs are given the default names 'Var1', 'Var2',
etc. in the result, similar to what `expand.grid` does for unnamed inputs.
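
To illustrate the default naming this refers to, here is a small sketch using base R's `expand.grid` itself rather than `sdf_expand_grid`. The input values are arbitrary and chosen only for illustration.

```r
# Base R only: unnamed inputs receive the default names Var1, Var2, ...
unnamed <- expand.grid(1:2, c("a", "b"))
names(unnamed)

# Named inputs keep the names supplied by the caller.
named <- expand.grid(id = 1:2, label = c("a", "b"))
names(named)

# Every combination appears exactly once, so the row count is the
# product of the input lengths.
nrow(unnamed)
```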

broadcast_vars

Indicates which input(s) should be broadcast to all
nodes of the Spark cluster during the join process (default: none).

memory

Boolean; whether the resulting Spark dataframe should be
cached in memory (default: TRUE).

repartition

Number of partitions the resulting Spark dataframe should
have.

partition_by

Vector of column names used for partitioning the
resulting Spark dataframe; only supported for Spark 2.0+.

Given one or more R vectors/factors or single-column Spark dataframes,
perform an `expand.grid` operation on all of them and store the result in
a Spark dataframe.

R interface to Apache Spark, a fast and general engine for big data
processing (see <http://spark.apache.org>). This package supports connecting to
local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end,
and provides an interface to Spark's built-in machine learning algorithms.

Yitao Li

sparklyr

R Interface to Apache Spark

Javier Luraschi

sdf_expand_grid: Create a Spark dataframe containing all combinations of inputs

Description

Usage

Arguments

Examples
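
A minimal sketch, assuming Spark is installed locally and reachable through `spark_connect(master = "local")`. The input values, the column names `x` and `y`, and the partition count are arbitrary illustrations, not recommendations.

```r
library(sparklyr)

# Assumption: a local Spark installation is available (e.g. via spark_install()).
sc <- spark_connect(master = "local")

# Unnamed inputs: the result columns get the default names Var1, Var2, Var3.
grid_sdf <- sdf_expand_grid(sc, seq(5), rnorm(10), letters)

# Named inputs, not cached in memory, with the result split into
# 4 partitions (an arbitrary choice for illustration).
named_sdf <- sdf_expand_grid(
  sc,
  x = seq(3),
  y = c("a", "b"),
  memory = FALSE,
  repartition = 4L
)

spark_disconnect(sc)
```

Because the grid size is the product of the input lengths, it may be worth keeping the inputs small when trying this out on a local connection.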