h2o (version 3.2.0.3)

h2o.kmeans: KMeans Model in H2O

Description

Performs k-means clustering on an H2O dataset.

Usage

h2o.kmeans(training_frame, x, k, model_id, max_iterations = 1000,
  standardize = TRUE, init = c("Furthest", "Random", "PlusPlus"), seed,
  nfolds = 0, fold_column = NULL, fold_assignment = c("AUTO", "Random",
  "Modulo"), keep_cross_validation_predictions = FALSE)

Arguments

training_frame
An H2OFrame object containing the variables in the model.
x
(Optional) A vector containing the data columns on which k-means operates.
k
The number of clusters. Must be between 1 and 1e7 inclusive. k may be omitted if the user specifies the initial centers in the init parameter. If k is supplied together with user-specified centers, it must equal the number of those centers.
model_id
(Optional) The unique id assigned to the resulting model. If none is given, an id will automatically be generated.
max_iterations
The maximum number of iterations allowed. Must be between 0
standardize
Logical, indicates whether the data should be standardized before running k-means.
init
A character string that selects the initial set of k cluster centers. Possible values are "Random" for random initialization, "PlusPlus" for k-means++ initialization, or "Furthest" for initialization at the furthest point from each successive center.
seed
(Optional) Random seed used to initialize the cluster centroids.
nfolds
(Optional) Number of folds for cross-validation. If nfolds >= 2, then validation must remain empty.
fold_column
(Optional) Column with the cross-validation fold index assignment for each observation.
fold_assignment
Cross-validation fold assignment scheme, used if fold_column is not specified. Must be "AUTO", "Random" or "Modulo".
keep_cross_validation_predictions
Logical, indicates whether to keep the predictions of the cross-validation models.
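
The effect of the standardize argument can be illustrated without an H2O cluster. The following is a minimal sketch in base R, assuming that standardization means centering each column to mean 0 and scaling it to unit standard deviation; stats::kmeans stands in for h2o.kmeans here, and the toy data and the agreement helper are hypothetical, made up for illustration only.

```r
# Base R only: stats::kmeans is a stand-in for h2o.kmeans in this sketch.
set.seed(42)

# Two well-separated groups along `small`; `large` is noise on a much bigger scale.
group <- rep(1:2, each = 50)
toy <- data.frame(
  small = c(rnorm(50, mean = 0, sd = 0.5), rnorm(50, mean = 5, sd = 0.5)),
  large = rnorm(100, mean = 0, sd = 1000)
)

# Standardize: center each column to mean 0 and rescale to unit standard deviation.
toy_std <- as.data.frame(scale(toy))

# Cluster the raw and the standardized data with the same settings.
fit_raw <- kmeans(toy, centers = 2, nstart = 10)
fit_std <- kmeans(toy_std, centers = 2, nstart = 10)

# Share of rows whose cluster matches the true group. Cluster labels are
# arbitrary, so take the better of the two possible label matchings.
agreement <- function(cluster, truth) {
  acc <- mean(cluster == truth)
  max(acc, 1 - acc)
}

print(agreement(fit_raw$cluster, group))  # raw data: distances dominated by `large`
print(agreement(fit_std$cluster, group))  # standardized data: both columns weigh equally
```

Without standardization, the column with the largest numeric range dominates the Euclidean distances, which is why standardize = TRUE is a sensible default when columns are measured in different units.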

Value

  • Returns an object of class H2OClusteringModel.

See Also

h2o.cluster_sizes, h2o.totss, h2o.num_iterations, h2o.betweenss, h2o.tot_withinss, h2o.withinss, h2o.centersSTD, h2o.centers

Examples

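library(h2o)
# Start (or connect to) a local H2O instance
localH2O <- h2o.init()
# Locate the prostate dataset shipped with the h2o package and upload it
prosPath <- system.file("extdata", "prostate.csv", package="h2o")
prostate.hex <- h2o.uploadFile(path = prosPath)
# Cluster the observations into k = 10 groups using four columns
h2o.kmeans(training_frame = prostate.hex, k = 10, x = c("AGE", "RACE", "VOL", "GLEASON"))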
