spark_pipeline_stage
Create a Pipeline Stage Object
Helper function to create pipeline stage objects with common parameter setters.
- Keywords
- internal
Usage
spark_pipeline_stage(sc, class, uid, features_col = NULL,
label_col = NULL, prediction_col = NULL, probability_col = NULL,
raw_prediction_col = NULL, k = NULL, max_iter = NULL,
seed = NULL, input_col = NULL, input_cols = NULL,
output_col = NULL, output_cols = NULL)
Arguments
- sc
A `spark_connection` object.
- class
Class name for the pipeline stage.
- uid
A character string used to uniquely identify the ML estimator.
- features_col
Features column name, as a length-one character vector. The column should be a single vector column of numeric values. Usually this column is output by `ft_r_formula`.
- label_col
Label column name. The column should be a numeric column. Usually this column is output by `ft_r_formula`.
- prediction_col
Prediction column name.
- probability_col
Column name for predicted class conditional probabilities.
- raw_prediction_col
Raw prediction (a.k.a. confidence) column name.
- k
The number of clusters to create.
- max_iter
The maximum number of iterations to use.
- seed
A random seed. Set this value if you need your results to be reproducible across repeated calls.
- input_col
The name of the input column.
- input_cols
Names of input columns.
- output_col
The name of the output column.
- thresholds
Thresholds in multi-class classification to adjust the probability of predicting each class. The array must have length equal to the number of classes, with values > 0, except that at most one value may be 0. The class with the largest value `p/t` is predicted, where `p` is the original probability of that class and `t` is the class's threshold.
- output_cols
Names of output columns.
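The `p/t` rule described for the thresholds argument can be illustrated in plain R, with no Spark connection. This is only a sketch of the arithmetic: `predict_with_thresholds` is a hypothetical helper that is not part of sparklyr, the probabilities and thresholds are made-up values, and Spark's own handling of ties and zero thresholds may differ from what `which.max` does here.

```r
# Hypothetical helper (not a sparklyr function): return the index of the
# class with the largest ratio p / t.
predict_with_thresholds <- function(probabilities, thresholds) {
  stopifnot(
    length(probabilities) == length(thresholds),  # one threshold per class
    all(thresholds >= 0),                         # thresholds are non-negative
    sum(thresholds == 0) <= 1                     # at most one threshold may be 0
  )
  scaled <- probabilities / thresholds
  which.max(scaled)  # first index of the maximum; NaN entries are ignored
}

# Made-up class probabilities and per-class thresholds for three classes.
probabilities <- c(0.5, 0.3, 0.2)
thresholds <- c(0.8, 0.3, 0.4)

# Ratios are 0.625, 1.0 and 0.5, so class 2 is chosen even though
# class 1 has the highest raw probability.
predicted <- predict_with_thresholds(probabilities, thresholds)
print(predicted)
```

With equal thresholds the ratios preserve the ordering of the probabilities, so the rule reduces to picking the most probable class.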