sdf_quantile

Given a numeric column within a Spark DataFrame, compute
approximate quantiles.
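A minimal usage sketch, assuming the sparklyr package and a local Spark installation are available. The helper name `quartiles_sketch` and the table name `mtcars_quantile_demo` are illustrative choices, not part of sparklyr. The call itself uses only the arguments documented for `sdf_quantile`.

```r
# Sketch only: needs the sparklyr package and a working Spark connection `sc`.
# `quartiles_sketch` is a hypothetical helper, not a sparklyr function.
quartiles_sketch <- function(sc) {
  # Copy a small built-in data set into Spark so there is a numeric column to query.
  cars_tbl <- sparklyr::sdf_copy_to(
    sc, mtcars,
    name = "mtcars_quantile_demo",
    overwrite = TRUE
  )

  # Approximate quartiles of `mpg`, allowing a one-percentile error either way.
  sparklyr::sdf_quantile(
    cars_tbl,
    column = "mpg",
    probabilities = c(0.25, 0.5, 0.75),
    relative.error = 0.01
  )
}

# Usage (not run here, because it starts a local Spark session):
# sc <- sparklyr::spark_connect(master = "local")
# quartiles_sketch(sc)
# sparklyr::spark_disconnect(sc)
```

A larger `relative.error` generally trades accuracy for less work on the cluster, so it is worth choosing the loosest tolerance the analysis can accept.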

R interface to Apache Spark, a fast and general
engine for big data processing, see <https://spark.apache.org/>. This
package supports connecting to local and remote Apache Spark clusters,
provides a 'dplyr' compatible back-end, and provides an interface to
Spark's built-in machine learning algorithms.

Edgar Ruiz

sparklyr

R Interface to Apache Spark

Javier Luraschi

Kevin Kuo

Kevin Ushey

JJ Allaire

Samuel Macedo

Hossein Falaki

Lu Wang

Andy Zhang

Yitao Li

Jozef Hajnala

Maciej Szymkiewicz

Wil Davis

 RStudio

 The Apache Software Foundation

sdf_quantile function

<dl><dt>x</dt>
<dd>A <code>spark_connection</code>, <code>ml_pipeline</code>, or a <code>tbl_spark</code>.</dd>
<dt>column</dt>
<dd>The column(s) for which quantiles should be computed.
Multiple columns are only supported in Spark 2.0+.</dd>
<dt>probabilities</dt>
<dd>A numeric vector of probabilities, for
which quantiles should be computed.</dd>
<dt>relative.error</dt>
<dd>The maximum allowed difference between the actual
percentile of a result and its target percentile. For example, if
<code>relative.error</code> is 0.01 and <code>probabilities</code> is 0.95, then any value
between the 94th and 96th percentiles is considered an acceptable
approximation.</dd>
<dt>weight.column</dt>
<dd>If not <code>NULL</code>, then a generalized version of the
Greenwald-Khanna algorithm will be run to compute weighted percentiles, with each
sample from <code>column</code> having a relative weight specified by the corresponding
value in <code>weight.column</code>. The weights can be considered as relative
frequencies of sample data points.</dd></dl>
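The tolerance and weighting rules above can be illustrated in plain R, without Spark. This is only a sketch of the semantics described in the argument text: `acceptable_band` is a hypothetical helper, and the sample values and weights are made up for illustration.

```r
# Percentile band that counts as an acceptable answer for one target probability.
# `acceptable_band` is a hypothetical helper, not part of sparklyr.
acceptable_band <- function(probability, relative.error) {
  c(
    lower = max(0, probability - relative.error),
    upper = min(1, probability + relative.error)
  )
}

# With probabilities = 0.95 and relative.error = 0.01, any value whose true
# percentile lies between the 94th and 96th is acceptable.
band <- acceptable_band(0.95, 0.01)

# Integer weights act like repetition counts: giving the value 10 a weight
# of 3 is equivalent to observing 10 three times.
values   <- c(10, 20, 30)
weights  <- c(3, 1, 1)
expanded <- rep(values, times = weights)
```

Setting `relative.error` to 0 collapses the band to a single percentile, which corresponds to asking for the exact quantile.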

Arguments

Compute (Approximate) Quantiles with a Spark DataFrame — sdf_quantile


