Bind multiple Spark DataFrames by row and column

sdf_bind_rows() and sdf_bind_cols() are implementations of the common pattern of do.call(rbind, sdfs) or do.call(cbind, sdfs) for binding many Spark DataFrames into one.

sdf_bind_rows(..., id = NULL)

sdf_bind_cols(...)

...: Spark tbls to combine. Each argument can either be a Spark DataFrame or a list of Spark DataFrames.

When row-binding, columns are matched by name, and any missing columns will be filled with NA.

When column-binding, rows are matched by position, so all data frames must have the same number of rows.
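A minimal sketch of the row-binding behavior. The connection setup and the column names (y, z) are illustrative; only sdf_bind_rows() itself comes from this page.

```r
library(sparklyr)

# Assumes a local Spark installation is available.
sc <- spark_connect(master = "local")

sdf_a <- sdf_copy_to(sc, data.frame(x = 1:2, y = c("a", "b")), "sdf_a")
sdf_b <- sdf_copy_to(sc, data.frame(x = 3:4, z = c(TRUE, FALSE)), "sdf_b")

# Columns are matched by name: the result has columns x, y, z,
# with y filled with NA for rows from sdf_b and z filled with NA
# for rows from sdf_a.
sdf_bind_rows(sdf_a, sdf_b)
```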


id: Data frame identifier.

When id is supplied, a new column of identifiers is created to link each row to its original Spark DataFrame. The labels are taken from the named arguments to sdf_bind_rows(). When a list of Spark DataFrames is supplied, the labels are taken from the names of the list. If no names are found a numeric sequence is used instead.


The output of sdf_bind_rows() will contain a column if that column appears in any of the inputs.
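A sketch of how the id argument labels rows by their source frame. The column name "source" and the input names are illustrative assumptions; the labeling rules (named arguments, list names, numeric fallback) are as described above.

```r
library(sparklyr)

# Assumes a local Spark installation is available.
sc <- spark_connect(master = "local")

sdf_a <- sdf_copy_to(sc, data.frame(x = 1:2), "sdf_a")
sdf_b <- sdf_copy_to(sc, data.frame(x = 3:4), "sdf_b")

# Labels come from the argument names: "first" and "second"
# appear in the new "source" column.
sdf_bind_rows(first = sdf_a, second = sdf_b, id = "source")

# With an unnamed list, a numeric sequence (1, 2, ...) is used instead.
sdf_bind_rows(list(sdf_a, sdf_b), id = "source")
```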


sdf_bind_rows() and sdf_bind_cols() return a tbl_spark.

  • sdf_bind
  • sdf_bind_rows
  • sdf_bind_cols
Documentation reproduced from package sparklyr, version 1.5.1, License: Apache License 2.0 | file LICENSE
