sparklyr (version 0.5.6)

ft_tokenizer: Feature Transformation -- Tokenizer

Description

A tokenizer that converts the input string to lowercase and then splits it on whitespace.

Usage

ft_tokenizer(x, input.col = NULL, output.col = NULL, ...)

Arguments

x

An object (usually a spark_tbl) coercible to a Spark DataFrame.

input.col

The name of the input column.

output.col

The name of the output column.

...

Optional arguments; currently unused.

See Also

See http://spark.apache.org/docs/latest/ml-features.html for more information on the set of transformations available for DataFrame columns in Spark.

Other feature transformation routines: ft_binarizer, ft_bucketizer, ft_discrete_cosine_transform, ft_elementwise_product, ft_index_to_string, ft_one_hot_encoder, ft_quantile_discretizer, ft_regex_tokenizer, ft_sql_transformer, ft_string_indexer, ft_vector_assembler, sdf_mutate
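Examples

A minimal sketch of using ft_tokenizer against a local Spark connection. The sample sentences and the column names "text" and "words" are illustrative assumptions, not part of the function's documentation.

library(sparklyr)
library(dplyr)

# Connect to a local Spark instance (assumes Spark is installed locally)
sc <- spark_connect(master = "local")

# Copy a small illustrative data frame to Spark
sentences_tbl <- copy_to(sc, data.frame(
  text = c("Hello Spark", "Tokenizers split TEXT on whitespace"),
  stringsAsFactors = FALSE
))

# Lowercase the 'text' column and split it on whitespace,
# producing a new 'words' column of token arrays
tokenized_tbl <- ft_tokenizer(sentences_tbl,
                              input.col = "text",
                              output.col = "words")

spark_disconnect(sc)

For regular-expression-based splitting rather than plain whitespace, see ft_regex_tokenizer.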