One-hot encoding maps a column of label indices to a column of binary
vectors, each with at most a single one-value. This encoding allows algorithms
that expect continuous features, such as Logistic Regression, to use
categorical features. It is typically used with ft_string_indexer(), which
indexes the column first.
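The mapping itself can be sketched in base R without Spark. The helper one_hot() below is hypothetical and written only for illustration; it is not part of sparklyr. It assumes zero-based label indices, as a string indexer produces, and shows why a vector may contain no one-value at all: when the last category is dropped, that category is encoded as all zeros.

```r
# Hypothetical helper for illustration only; not part of sparklyr.
# index:        zero-based label index
# n_categories: total number of distinct categories
# drop_last:    if TRUE, the last category is encoded as all zeros
one_hot <- function(index, n_categories, drop_last = TRUE) {
  width <- if (drop_last) n_categories - 1L else n_categories
  vec <- numeric(width)
  # Set the single one-value, unless this is the dropped last category.
  if (index < width) vec[index + 1L] <- 1
  vec
}

# Encode three label indices; each row of the result is one encoded vector.
labels <- c(0, 1, 2)
encoded <- t(sapply(labels, one_hot, n_categories = 3))
print(encoded)
```

With three categories and the last one dropped, each vector has length two, and index 2 maps to a vector of zeros.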
ft_one_hot_encoder(x, input.col, output.col, drop.last = TRUE, ...)

x: An object (usually a spark_tbl) coercible to a Spark DataFrame.
input.col: The name of the input column(s).
output.col: The name of the output column.
drop.last: Boolean; drop the last category?
...: Optional arguments; currently unused.
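A typical call might look like the sketch below. It assumes a local Spark installation reachable through spark_connect(master = "local") and a sparklyr release that accepts the dotted argument names shown in the signature above. The output column names species_idx and species_vec are illustrative choices, not names required by the package.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally and can be started in local mode.
sc <- spark_connect(master = "local")

# Copy a small example data set into Spark.
iris_tbl <- copy_to(sc, iris, "iris", overwrite = TRUE)

# Index the string column first, then one-hot encode the resulting indices.
encoded_tbl <- iris_tbl %>%
  ft_string_indexer(input.col = "Species", output.col = "species_idx") %>%
  ft_one_hot_encoder(input.col = "species_idx", output.col = "species_vec")

spark_disconnect(sc)
```

Because drop.last defaults to TRUE, the last indexed category is represented by a vector with no one-value; pass drop.last = FALSE to keep a position for every category.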
See http://spark.apache.org/docs/latest/ml-features.html for more information on the set of transformations available for DataFrame columns in Spark.
Other feature transformation routines: ft_binarizer,
ft_bucketizer,
ft_count_vectorizer,
ft_discrete_cosine_transform,
ft_elementwise_product,
ft_index_to_string,
ft_quantile_discretizer,
ft_regex_tokenizer,
ft_stop_words_remover,
ft_string_indexer,
ft_tokenizer,
ft_vector_assembler,
sdf_mutate