ft_vector_indexer

x

A <code>spark_connection</code>, <code>ml_pipeline</code>, or a <code>tbl_spark</code>.

input_col

The name of the input column.

output_col

The name of the output column.

max_categories

Threshold for the number of values a categorical feature can take. If a feature is found to have more than <code>max_categories</code> distinct values, it is declared continuous. Must be greater than or equal to 2. Defaults to 20.

dataset

(Optional) A <code>tbl_spark</code>. If provided, the estimator is eagerly fit against <code>dataset</code>. See details.

uid

A character string used to uniquely identify the feature transformer.

...

Optional arguments; currently unused.
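The <code>max_categories</code> rule can be illustrated without a Spark connection. The following is a minimal base R sketch of the decision it describes, not Spark's VectorIndexer implementation; the helper <code>classify_features()</code> and the example columns are hypothetical names introduced here for illustration only.

```r
# Illustrative sketch only: mimics the max_categories decision rule in base R.
# classify_features() is a hypothetical helper, not part of sparklyr or Spark.
classify_features <- function(features, max_categories = 20) {
  # The threshold must be at least 2, as stated for max_categories.
  stopifnot(max_categories >= 2)
  vapply(
    as.data.frame(features),
    function(column) {
      # Count the distinct values observed in this feature.
      n_distinct <- length(unique(column))
      # More than max_categories distinct values means "continuous".
      if (n_distinct <= max_categories) "categorical" else "continuous"
    },
    character(1)
  )
}

# Hypothetical example data: one low-cardinality and one high-cardinality feature.
features <- data.frame(
  color_code = c(0, 1, 2, 1, 0, 2),
  weight = c(1.5, 2.25, 3.1, 4.8, 5.05, 6.7)
)

result <- classify_features(features, max_categories = 3)
print(result)
```

With a threshold of 3, <code>color_code</code> (three distinct values) is treated as categorical, while <code>weight</code> (six distinct values) is treated as continuous.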

Indexes categorical feature columns in a dataset of vectors.

R interface to Apache Spark, a fast and general engine for big data
processing, see <http://spark.apache.org>. This package supports connecting to
local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end,
and provides an interface to Spark's built-in machine learning algorithms.

ft_vector_indexer: Feature Transformation -- VectorIndexer (Estimator)
