sdf_separate_column
Separate a Vector Column into Scalar Columns
Given a vector column in a Spark DataFrame, split it into n separate columns, each holding one element of the vectors stored in column.
Usage
sdf_separate_column(x, column, into = NULL)
Arguments
- x
A spark_connection, ml_pipeline, or a tbl_spark.
- column
The name of a (vector-typed) column.
- into
A specification of the columns that should be generated from column. This can either be a vector of column names, or an R list mapping column names to the (1-based) index at which a particular vector element should be extracted.
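The two forms of into can be sketched as follows. This is a minimal illustration rather than an official example: it assumes a local Spark installation is available, and the table name example_tbl, the data, and the output column names f1, f2 and second are all hypothetical. The vector column is built with ft_vector_assembler, which is only one way to obtain one.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally and reachable via master = "local".
sc <- spark_connect(master = "local")

# Hypothetical data: two numeric columns assembled into one vector column.
df <- copy_to(sc, data.frame(a = c(1, 2), b = c(3, 4)), "example_tbl",
              overwrite = TRUE)
assembled <- ft_vector_assembler(df, input_cols = c("a", "b"),
                                 output_col = "features")

# into as a character vector: names are assigned to vector elements in order.
by_name <- sdf_separate_column(assembled, "features", into = c("f1", "f2"))

# into as a list: each name maps to the 1-based index of the element to extract.
by_index <- sdf_separate_column(assembled, "features", into = list(second = 2))

spark_disconnect(sc)
```

The list form is useful when only some elements of the vector are needed, since it avoids generating a column for every position.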
Community examples
This is generally used in combination with ft_regex_tokenizer, to split a column containing comma-separated values (or values separated by another pattern) into multiple columns.
```
mydf %>%
  ft_regex_tokenizer(input.col="mycolumn", output.col="mycolumnSplit", pattern=";") %>%
  sdf_separate_column("mycolumnSplit", into=c("column1", "column2"))
```