ft_hashing_tf
Feature Transformation -- HashingTF (Transformer)
Maps a sequence of terms to their term frequencies using the hashing trick.
Usage
ft_hashing_tf(x, input_col = NULL, output_col = NULL, binary = FALSE,
num_features = 2^18, uid = random_string("hashing_tf_"), ...)
Arguments
- x
A
spark_connection
,ml_pipeline
, or atbl_spark
.- input_col
The name of the input column.
- output_col
The name of the output column.
- binary
Binary toggle to control term frequency counts. If true, all non-zero counts are set to 1. This is useful for discrete probabilistic models that model binary events rather than integer counts. (default =
FALSE
)- num_features
Number of features. Should be greater than 0. (default =
2^18
)- uid
A character string used to uniquely identify the feature transformer.
- ...
Optional arguments; currently unused.
Value
The object returned depends on the class of x
.
spark_connection
: Whenx
is aspark_connection
, the function returns aml_transformer
, aml_estimator
, or one of their subclasses. The object contains a pointer to a SparkTransformer
orEstimator
object and can be used to composePipeline
objects.ml_pipeline
: Whenx
is aml_pipeline
, the function returns aml_pipeline
with the transformer or estimator appended to the pipeline.tbl_spark
: Whenx
is atbl_spark
, a transformer is constructed then immediately applied to the inputtbl_spark
, returning atbl_spark
See Also
See http://spark.apache.org/docs/latest/ml-features.html for more information on the set of transformations available for DataFrame columns in Spark.
Other feature transformers: ft_binarizer
,
ft_bucketizer
,
ft_chisq_selector
,
ft_count_vectorizer
, ft_dct
,
ft_elementwise_product
,
ft_feature_hasher
, ft_idf
,
ft_imputer
,
ft_index_to_string
,
ft_interaction
, ft_lsh
,
ft_max_abs_scaler
,
ft_min_max_scaler
, ft_ngram
,
ft_normalizer
,
ft_one_hot_encoder_estimator
,
ft_one_hot_encoder
, ft_pca
,
ft_polynomial_expansion
,
ft_quantile_discretizer
,
ft_r_formula
,
ft_regex_tokenizer
,
ft_sql_transformer
,
ft_standard_scaler
,
ft_stop_words_remover
,
ft_string_indexer
,
ft_tokenizer
,
ft_vector_assembler
,
ft_vector_indexer
,
ft_vector_slicer
, ft_word2vec