ft_bucketizer

x
A <code>spark_connection</code>, <code>ml_pipeline</code>, or <code>tbl_spark</code>.

input_col
The name of the input column.

output_col
The name of the output column.

splits
A numeric vector of cutpoints, indicating the bucket boundaries.

input_cols
Names of input columns.

output_cols
Names of output columns.

splits_array
Splits for bucketizing multiple columns at once. Each element of this
list is a numeric vector of cutpoints that can be used to map continuous
features into buckets.

handle_invalid
(Spark 2.1.0+) How to handle invalid entries. Options are "skip" (filter
out rows with invalid values), "error" (throw an error), or "keep" (keep
invalid values in a special additional bucket). Default: "error".

uid
A character string used to uniquely identify the feature transformer.

...
Optional arguments; currently unused.

Similar to R's <code>cut</code> function, this transforms a numeric column
into a discretized column, with breaks specified through the <code>splits</code>
parameter.
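To make the comparison with <code>cut</code> concrete, the following base R sketch mimics the bucketing locally, with no Spark connection. It assumes Spark's documented Bucketizer semantics: n+1 splits define n buckets, each bucket is left-closed and right-open, the last bucket also includes its upper bound, and bucket indices start at 0. The values and cutpoints are illustrative only.

```r
# Illustrative cutpoints and values (not taken from the sparklyr docs).
splits <- c(0, 4.5, 5, 8)
values <- c(0.5, 4.5, 4.9, 5, 8)

# right = FALSE gives left-closed intervals [a, b);
# include.lowest = TRUE then closes the last interval on the right, [5, 8].
# labels = FALSE returns 1-based integer codes, so subtract 1
# to get 0-based bucket indices.
buckets <- cut(values, breaks = splits, right = FALSE,
               include.lowest = TRUE, labels = FALSE) - 1

print(data.frame(value = values, bucket = buckets))
```

This is only a local analogue for reasoning about where a value lands; it is not how sparklyr computes buckets, which happens inside Spark.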

R interface to Apache Spark, a fast and general engine for big data
processing, see <http://spark.apache.org>. This package supports connecting to
local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end,
and provides an interface to Spark's built-in machine learning algorithms.

Javier Luraschi

sparklyr

R Interface to Apache Spark

Kevin Kuo

Kevin Ushey

JJ Allaire

Samuel Macedo

RStudio

The Apache Software Foundation

ft_bucketizer: Feature Transformation -- Bucketizer (Transformer)

Description

Usage

Arguments

Value

See Also

Examples
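A minimal sketch of single-column usage, assuming a local Spark installation reachable via <code>master = "local"</code>. The table name <code>iris_tbl</code>, the output column name, and the cutpoints are illustrative choices rather than requirements of the function.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (for example via spark_install()).
sc <- spark_connect(master = "local")

# Copying iris to Spark replaces the dots in column names with
# underscores, so Sepal.Length becomes Sepal_Length.
iris_tbl <- sdf_copy_to(sc, iris, name = "iris_tbl", overwrite = TRUE)

# Bucketize Sepal_Length using three buckets defined by four cutpoints.
# The cutpoints must cover the whole range of the input column.
bucketed <- iris_tbl %>%
  ft_bucketizer(
    input_col = "Sepal_Length",
    output_col = "Sepal_Length_bucket",
    splits = c(0, 4.5, 5, 8)
  ) %>%
  select(Sepal_Length, Sepal_Length_bucket, Species) %>%
  collect()

print(head(bucketed))

spark_disconnect(sc)
```

Because <code>x</code> here is a <code>tbl_spark</code>, the transformer is applied immediately and a table is returned; the final <code>collect()</code> only brings the result into local R memory for inspection.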