sdf_sample

An object coercible to a Spark DataFrame.

fraction

replacement

seed


Draw a random sample of rows (with or without replacement)
from a Spark DataFrame.
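
As a minimal sketch of how the arguments listed above fit together, the example below samples roughly half of a small table without replacement. It assumes a local Spark installation is available; the connection settings, the table name `mtcars_tbl`, and the chosen fraction and seed are illustrative only and do not come from this page.

```r
library(sparklyr)

# Assumption: Spark is installed locally (for example via spark_install()).
sc <- spark_connect(master = "local")

# Copy a built-in R data set into Spark so there is something to sample.
# The table name "mtcars_tbl" is arbitrary.
mtcars_tbl <- sdf_copy_to(sc, mtcars, name = "mtcars_tbl", overwrite = TRUE)

# Sample about 50% of the rows without replacement.
# Fixing the seed makes the draw repeatable for the same input.
sampled <- sdf_sample(
  mtcars_tbl,
  fraction = 0.5,
  replacement = FALSE,
  seed = 42
)

# Count the sampled rows; Spark treats the fraction as a per-row
# probability, so the count need not be exactly half of the input.
sdf_nrow(sampled)

spark_disconnect(sc)
```

Passing `replacement = TRUE` instead allows the same row to appear more than once in the result.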


Provision, connect, and interface to Apache Spark from within R.
This package supports connecting to local and remote Apache Spark clusters,
provides a dplyr-compatible back-end, and offers an interface to Spark's
built-in machine learning algorithms.

Javier Luraschi

sparklyr

R Interface to Apache Spark

sdf_sample function

The family of functions prefixed with <code>sdf_</code> generally access the Scala
Spark DataFrame API directly, as opposed to the <code>dplyr</code> interface, which
uses Spark SQL. These functions will 'force' any pending SQL in a
<code>dplyr</code> pipeline, so that the <code>tbl_spark</code> object
returned no longer has the 'lazy' SQL operations attached. Note that
the underlying Spark DataFrame <em>does</em> execute its operations lazily, so
even though the pending set of operations is (currently) not exposed at
the R level, these operations will only be executed when you explicitly
<code>collect()</code> the table.
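
The interaction between a <code>dplyr</code> pipeline and an <code>sdf_</code> function can be sketched as follows. This is an illustration under assumptions rather than a reference workflow: it presumes a local Spark installation, and the table name, filter condition, and sampling settings are made up for the example.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally.
sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars, "mtcars_tbl", overwrite = TRUE)

# Build a lazy dplyr query; at this point it is only pending Spark SQL.
lazy_query <- mtcars_tbl %>%
  filter(cyl > 4) %>%
  select(mpg, cyl, hp)

# Passing the query to an sdf_ function forces the pending SQL, so the
# returned tbl_spark no longer carries the lazy dplyr operations.
sampled <- sdf_sample(lazy_query, fraction = 0.5, replacement = FALSE, seed = 1)

# Spark itself still evaluates lazily; the rows are materialized in R
# only when the table is collected.
local_df <- collect(sampled)

spark_disconnect(sc)
```

After `collect()`, `local_df` is an ordinary local data frame and can be used without the Spark connection.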

sdf_sample: Randomly Sample Rows from a Spark DataFrame

Description

Usage

Arguments

Transforming Spark DataFrames

See Also