sparklyr (version 0.2.30)

ml_kmeans: Spark ML -- K-Means Clustering

Description

Perform k-means clustering on a tbl_spark.

Usage

ml_kmeans(x, centers, max.iter = 100, features = dplyr::tbl_vars(x), ...)

Arguments

x
An object coercible to a Spark DataFrame (typically, a tbl_spark).
centers
The number of cluster centers to compute.
max.iter
The maximum number of iterations to use.
features
The names of the features (terms) to use for the model fit. Defaults to all columns of x.
...
Optional arguments; currently unused.
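
Examples

The following is a minimal sketch of how the arguments above might be used, not an official example from the package. It assumes a local Spark installation reachable through spark_connect(master = "local"), and that copy_to() stores the iris columns with underscores in place of dots (for example, Petal_Width). The table name "iris" and the choice of columns are illustrative only.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (e.g. via spark_install()).
sc <- spark_connect(master = "local")

# Copy a small R data frame into Spark; the result is a tbl_spark.
# Assumption: dots in column names become underscores on the Spark side.
iris_tbl <- copy_to(sc, iris, "iris", overwrite = TRUE)

# Option 1: keep only the numeric columns of interest, so that the default
# features = dplyr::tbl_vars(x) covers exactly those columns.
kmeans_model <- iris_tbl %>%
  select(Petal_Width, Petal_Length) %>%
  ml_kmeans(centers = 3)

# Option 2: pass the full table and name the feature columns explicitly,
# also lowering the iteration cap from its default of 100.
kmeans_model_explicit <- ml_kmeans(
  iris_tbl,
  centers  = 3,
  max.iter = 50,
  features = c("Petal_Width", "Petal_Length")
)

# Inspect the fitted model.
print(kmeans_model)

spark_disconnect(sc)
```

Because features defaults to every column of x, a table that still contains a non-numeric column such as Species would likely need either the select() step or an explicit features vector, as shown in the two variants.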

See Also

For information on how Spark k-means clustering is implemented, please see http://spark.apache.org/docs/latest/mllib-clustering.html#k-means.

Other Spark ML routines: ml_decision_tree, ml_generalized_linear_regression, ml_gradient_boosted_trees, ml_lda, ml_linear_regression, ml_logistic_regression, ml_multilayer_perceptron, ml_naive_bayes, ml_one_vs_rest, ml_pca, ml_random_forest, ml_survival_regression