sdf_persist

Persist a Spark DataFrame

Persist a Spark DataFrame, forcing any pending computations and (optionally) serializing the results to disk.

Usage
sdf_persist(x, storage.level = "MEMORY_AND_DISK")
Arguments
x

A spark_connection, ml_pipeline, or a tbl_spark.

storage.level

The storage level to be used, given as a string. See the Spark documentation on RDD persistence for the accepted storage levels (for example, "MEMORY_ONLY", "MEMORY_AND_DISK", or "DISK_ONLY").

Details

Spark DataFrames invoke their operations lazily -- pending operations are deferred until their results are actually needed. Persisting a Spark DataFrame effectively 'forces' any pending computations, and then persists the generated Spark DataFrame as requested (to memory, to disk, or otherwise).
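As a minimal sketch of the behavior described above, the following builds a lazy dplyr pipeline and then persists its result. It assumes a local Spark installation reachable via spark_connect(master = "local"); the table name "mtcars_example" and the variable names are illustrative only.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation is available.
sc <- spark_connect(master = "local")

# Copy a small built-in data set to Spark (table name is hypothetical).
cars_tbl <- copy_to(sc, mtcars, "mtcars_example", overwrite = TRUE)

# This pipeline is lazy: it only describes the computation.
summary_tbl <- cars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE))

# Persist the result so that later operations can reuse it
# instead of recomputing the aggregation.
persisted_tbl <- sdf_persist(summary_tbl, storage.level = "MEMORY_AND_DISK")

# Further operations now build on the persisted DataFrame.
persisted_tbl %>%
  filter(mean_mpg > 20) %>%
  collect()

spark_disconnect(sc)
```

"MEMORY_AND_DISK" is the default shown under Usage; it is passed explicitly here only to make the choice visible.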

Users of Spark should take care to persist the results of any non-deterministic computation (for example, a column of random numbers). Otherwise the lazy query may be re-evaluated by later operations, and the values within a column can seem to 'change' as new operations are performed on that data set.
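The following sketch illustrates the non-deterministic case. It assumes a local Spark connection and uses rand(), which dplyr passes through to Spark SQL unchanged; the variable names are illustrative. No output is shown because the random values differ from run to run.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation is available.
sc <- spark_connect(master = "local")

# A lazy table with a random column. Each time this query is
# evaluated, Spark may draw new random values.
lazy_tbl <- sdf_len(sc, 5) %>%
  mutate(r = rand())

# Persist the result so that later operations reuse the stored
# values instead of re-running rand().
fixed_tbl <- sdf_persist(lazy_tbl, storage.level = "MEMORY_AND_DISK")

# Repeated reads of the persisted table should agree with each other.
first_read  <- collect(fixed_tbl)
second_read <- collect(fixed_tbl)

spark_disconnect(sc)
```

Collecting lazy_tbl twice in place of fixed_tbl is a quick way to check whether the values change between evaluations in a given setup.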

Aliases
  • sdf_persist
Documentation reproduced from package sparklyr, version 0.8.1-9001, License: Apache License 2.0 | file LICENSE
