Perform k-means clustering on a Spark DataFrame.
Usage:

    ml_kmeans(x, centers, iter.max = 100, features = dplyr::tbl_vars(x),
      compute.cost = TRUE, tolerance = 1e-04, ml.options = ml_options(), ...)

Arguments:

x: An object coercible to a Spark DataFrame (typically, a tbl_spark).
centers: The number of cluster centers to compute.

iter.max: The maximum number of iterations to use.

features: The names of the features (terms) to use in the model fit.

compute.cost: Whether to compute the cost for the k-means model using Spark's computeCost.

tolerance: The convergence tolerance for the iterative algorithm.

ml.options: Optional arguments, used to affect the model generated. See ml_options for more details.

...: Optional arguments; currently unused.
Value:

An ml_model object of class kmeans, with overloaded print, fitted, and predict functions.
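A minimal usage sketch, assuming a local Spark installation and the sparklyr package. Note that when iris is copied into Spark, dots in column names become underscores (e.g. Petal_Length):

```r
library(sparklyr)

# Connect to a local Spark instance (assumes Spark is installed locally)
sc <- spark_connect(master = "local")

# Copy the iris dataset into Spark
iris_tbl <- copy_to(sc, iris, overwrite = TRUE)

# Fit k-means with 3 centers on the petal measurements
model <- ml_kmeans(iris_tbl, centers = 3,
                   features = c("Petal_Length", "Petal_Width"))

# Inspect the cluster centers and assign clusters to the data
print(model)
predicted <- predict(model, iris_tbl)

spark_disconnect(sc)
```

The fitted model's print method reports the cluster centers; predict returns the cluster assignment for each row of the supplied Spark DataFrame.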
References:

Bahmani et al., Scalable K-Means++, VLDB 2012.

For information on how Spark k-means clustering is implemented, see http://spark.apache.org/docs/latest/mllib-clustering.html#k-means.
See also:

Other Spark ML routines: ml_als_factorization, ml_decision_tree, ml_generalized_linear_regression, ml_gradient_boosted_trees, ml_lda, ml_linear_regression, ml_logistic_regression, ml_multilayer_perceptron, ml_naive_bayes, ml_one_vs_rest, ml_pca, ml_random_forest, ml_survival_regression.