Evaluator for clustering results. The metric computes the Silhouette measure using the squared Euclidean distance. The Silhouette measures how consistent the points within each cluster are. It ranges from -1 to 1, where a value close to 1 means that the points in a cluster are close to the other points in the same cluster and far from the points of the other clusters.
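For each point i, let a(i) be the mean squared Euclidean distance to the other points in its own cluster and b(i) the smallest mean squared distance to the points of any other cluster; the point's silhouette is s(i) = (b(i) - a(i)) / max(a(i), b(i)), taken as 0 for a point alone in its cluster, and the reported metric is the mean of s(i) over all points. The base-R function below is a minimal local sketch of that definition, assuming small in-memory data. `silhouette_sq_euclidean()` is a hypothetical helper for illustration only; it is not part of sparklyr and is not how Spark computes the metric internally.

```r
# Hypothetical helper (not part of sparklyr): mean silhouette with squared
# Euclidean distance, computed naively from the full pairwise distance matrix.
silhouette_sq_euclidean <- function(features, clusters) {
  features <- as.matrix(features)
  clusters <- as.character(clusters)
  stopifnot(nrow(features) == length(clusters), length(unique(clusters)) >= 2)

  # Pairwise squared Euclidean distances between all points
  d <- as.matrix(dist(features, method = "euclidean"))^2
  s <- numeric(nrow(features))

  for (i in seq_along(s)) {
    same <- clusters == clusters[i]
    same[i] <- FALSE                      # exclude the point itself
    if (!any(same)) {                     # singleton cluster: silhouette is 0
      s[i] <- 0
      next
    }
    a <- mean(d[i, same])                 # mean distance within own cluster
    others <- setdiff(unique(clusters), clusters[i])
    # Smallest mean distance to the points of any other cluster
    b <- min(vapply(others, function(k) mean(d[i, clusters == k]), numeric(1)))
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)                                 # overall metric: average over points
}

pts <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11))
silhouette_sq_euclidean(pts, c(1, 1, 2, 2))  # close to 1: tight, well-separated clusters
```

Because this sketch materializes every pairwise distance, its memory use grows quadratically with the number of points; Spark's squared-Euclidean implementation avoids building that matrix, which is what makes the metric practical on distributed data.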
ml_clustering_evaluator(x, features_col = "features",
prediction_col = "prediction", metric_name = "silhouette",
uid = random_string("clustering_evaluator_"), ...)
x: A spark_connection object or a tbl_spark containing features and prediction columns. The latter should be the output of sdf_predict.
features_col: Name of the features column.
prediction_col: Name of the prediction column.
metric_name: The performance metric. Currently, only "silhouette" is supported.
uid: A character string used to uniquely identify the ML estimator.
...: Optional arguments; currently unused.
Value: The calculated performance metric.