Usage
h2o.kmeans(training_frame, x, k, model_id, max_iterations = 1000,
standardize = TRUE, init = c("Furthest", "Random", "PlusPlus"), seed,
nfolds = 0, fold_column = NULL, fold_assignment = c("AUTO", "Random",
"Modulo"), keep_cross_validation_predictions = FALSE)
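A minimal sketch of a call that matches the signature above. It assumes the h2o package is installed and that h2o.init() can start or reach a local H2O cluster; the iris data set and the chosen argument values are illustrative only and are not part of this page.

```r
library(h2o)

# Start (or connect to) a local H2O cluster. Assumes Java and h2o are installed.
h2o.init()

# Upload R's built-in iris data as an H2O Frame (illustrative data only).
iris_hex <- as.h2o(iris)

# Cluster on the four numeric columns into k = 3 groups.
model <- h2o.kmeans(
  training_frame = iris_hex,
  x = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
  k = 3,
  init = "PlusPlus",
  standardize = TRUE,
  seed = 1234
)

print(model)
```

Setting seed makes the initial centers, and therefore the resulting clusters, repeatable across runs on the same cluster configuration.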
Arguments
training_frame
An H2O Frame object containing the
variables in the model.
x
(Optional) A vector containing the data columns on
which k-means operates.
k
The number of clusters. Must be between 1 and
1e7 inclusive. k may be omitted if the user specifies the
initial centers in the init parameter. If k is also given
in that case, it must equal the number of
user-specified centers.
model_id
(Optional) The unique id assigned to the resulting model. If
none is given, an id will automatically be generated.
max_iterations
The maximum number of iterations allowed. Must be between 0
standardize
Logical, indicates whether the data should be
standardized before running k-means.
init
A character string that selects the initial set of k cluster
centers. Possible values are "Random" for random initialization,
"PlusPlus" for k-means++ initialization, or "Furthest" for
initialization at the furthest point from each successive center.
seed
(Optional) Random seed used to initialize the cluster centroids.
nfolds
(Optional) Number of folds for cross-validation. If nfolds >= 2,
then validation must remain empty.
fold_column
(Optional) Column with the cross-validation fold index assignment for each observation.
fold_assignment
Cross-validation fold assignment scheme, used if fold_column is not specified.
Must be "AUTO", "Random" or "Modulo".
keep_cross_validation_predictions
Logical, indicates whether to keep the predictions of the cross-validation models.
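The "Furthest" option of init and the standardize flag described above can be illustrated outside H2O. The following is a minimal base R sketch, assuming squared Euclidean distance and a randomly chosen first center. The function name furthest_init is hypothetical, and this is not H2O's internal implementation.

```r
# Illustrative farthest-point initialization (hypothetical helper, base R only).
furthest_init <- function(data, k, seed = 42) {
  set.seed(seed)
  data <- as.matrix(data)
  # First center: one randomly chosen row.
  centers <- data[sample(nrow(data), 1), , drop = FALSE]
  while (nrow(centers) < k) {
    # Squared distance from every row to its nearest already-chosen center.
    d2 <- apply(data, 1, function(pt) {
      min(colSums((t(centers) - pt)^2))
    })
    # Next center: the row farthest from all centers chosen so far.
    centers <- rbind(centers, data[which.max(d2), , drop = FALSE])
  }
  centers
}

# Standardize the columns first (analogous to standardize = TRUE),
# then pick 3 initial centers and hand them to base R's kmeans().
x <- scale(iris[, 1:4])
init_centers <- furthest_init(x, k = 3)
fit <- kmeans(x, centers = init_centers, iter.max = 1000)
```

Because each new center is the point farthest from the centers already chosen, the initial centers are spread across the data, at the cost of being sensitive to outliers.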