Usage
## Default method:
h2o.kmeans(data, centers, cols = "", iter.max = 10, normalize = FALSE,
           init = "none", seed = 0, dropNACols, version = 2)
## Run on a ValueArray object (version = 1):
h2o.kmeans.VA(data, centers, cols = "", iter.max = 10, normalize = FALSE,
              init = "none", seed = 0)
## Run on a FluidVecs object (version = 2):
h2o.kmeans.FV(data, centers, cols = "", iter.max = 10, normalize = FALSE,
              init = "none", seed = 0, dropNACols = FALSE)
Arguments
data
An H2OParsedDataVA (version = 1) or H2OParsedData (version = 2) object containing the variables in the model.
centers
The number of clusters k.
cols
(Optional) A vector containing the names of the data columns on which k-means runs. If left blank (the default, ""), k-means clustering is run on all columns of the data set.
iter.max
(Optional) The maximum number of iterations allowed.
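To make concrete what iter.max caps, here is a minimal base-R sketch of Lloyd's algorithm, the textbook k-means iteration. It is not H2O's implementation, and the helper `lloyd_kmeans` and its `init_centers` argument are hypothetical. Each pass assigns every row to its nearest centroid and then moves each centroid to the mean of its rows; the loop stops when the centroids stop moving or after iter.max passes, whichever comes first.

```r
# Hypothetical helper, not H2O's implementation: plain Lloyd's k-means that
# stops at convergence or after iter.max passes, whichever comes first.
# init_centers is a k x p matrix of starting centroids (unlike h2o.kmeans,
# where centers is just the number k).
lloyd_kmeans <- function(x, init_centers, iter.max = 10) {
  x <- as.matrix(x)
  centers <- as.matrix(init_centers)
  iter <- 0
  repeat {
    # Squared Euclidean distance from every row to every centroid (n x k).
    d2 <- outer(rowSums(x^2), rowSums(centers^2), "+") - 2 * x %*% t(centers)
    cluster <- max.col(-d2, ties.method = "first")
    # Move each centroid to the mean of its rows; keep it if its cluster is empty.
    new_centers <- centers
    for (j in seq_len(nrow(centers))) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) new_centers[j, ] <- colMeans(members)
    }
    iter <- iter + 1
    converged <- isTRUE(all.equal(new_centers, centers))
    centers <- new_centers
    if (converged || iter >= iter.max) break
  }
  list(cluster = cluster, centers = centers, iter = iter)
}

x <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11))
fit <- lloyd_kmeans(x, init_centers = x[c(1, 3), ], iter.max = 10)
fit$cluster  # 1 1 2 2
fit$iter     # 2: the centroids stop moving on the second pass
```

With iter.max = 1 the same call would stop after the first pass even though the centroids were still moving, which is the trade-off the argument controls.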
normalize
(Optional) A logical value indicating whether the data should be normalized before running k-means.
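Normalization matters because k-means compares rows by Euclidean distance, so a column measured on a large scale can swamp the others. The sketch below assumes z-score standardization with base R's `scale()` (each column centered to mean 0 and divided by its standard deviation); this page does not say which scheme normalize = TRUE applies. The toy data are made up, and clustering uses `stats::kmeans` rather than H2O.

```r
# Made-up toy data: income in dollars dwarfs age in years, so unscaled
# Euclidean distances are driven almost entirely by income.
df <- data.frame(age    = c(20:29, 60:69),
                 income = 30000 + 1000 * c(0:9, 0:9))

# Assumption: z-score standardization (mean 0, sd 1 per column).
scaled <- scale(df)

d_raw    <- as.matrix(dist(df))
d_scaled <- as.matrix(dist(scaled))
d_raw[1, 11] < d_raw[1, 2]        # TRUE: a 40-year age gap "looks" smaller than $1000
d_scaled[1, 11] < d_scaled[1, 2]  # FALSE once both columns are on the same scale

# Deterministic start (rows 1 and 11) so the result does not depend on the RNG.
fit <- kmeans(scaled, centers = scaled[c(1, 11), ])
fit$cluster  # rows 1-10 in cluster 1, rows 11-20 in cluster 2
```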
init
(Optional) Method by which to select the k initial cluster centroids. Possible values are "none" for random initialization, "plusplus" for k-means++ initialization, and "furthest" for furthest-point initialization.
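The two non-random init options can be sketched in base R in their textbook form; H2O's own implementation may differ in details, and `init_centroids` is a hypothetical helper. Both start from one uniformly random row. k-means++ then draws each further centroid with probability proportional to D(x)^2, the squared distance from a row to its nearest already-chosen centroid, while furthest-point initialization always takes the row with the largest such distance.

```r
# Hypothetical helper (textbook versions, not H2O's code): choose k starting
# centroids from the rows of x.
init_centroids <- function(x, k, method = c("plusplus", "furthest")) {
  method <- match.arg(method)
  x <- as.matrix(x)
  n <- nrow(x)
  idx <- sample.int(n, 1)                 # first centroid: a uniformly random row
  # D(x)^2: squared distance from each row to its nearest chosen centroid.
  d2 <- rowSums(sweep(x, 2, x[idx, ])^2)
  while (length(idx) < k) {
    nxt <- if (method == "plusplus") {
      sample.int(n, 1, prob = d2)         # random, weighted by D(x)^2
    } else {
      which.max(d2)                       # deterministic: the furthest row
    }
    idx <- c(idx, nxt)
    # Already-chosen rows have d2 = 0, so neither method picks them again.
    d2 <- pmin(d2, rowSums(sweep(x, 2, x[nxt, ])^2))
  }
  x[idx, , drop = FALSE]
}

set.seed(1)
x <- rbind(c(0, 0), c(0.1, 0), c(10, 10), c(10.1, 10), c(-10, 10), c(-10.1, 10))
init_centroids(x, 3, "furthest")  # one row from each of the three blobs
init_centroids(x, 3, "plusplus")  # random, but biased toward far-apart rows
```

Furthest-point seeding is fully determined once the first row is drawn, which makes it sensitive to outliers; the k-means++ weighting keeps some randomness while still favoring well-spread starts.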
seed
(Optional) Random seed used to initialize the cluster centroids.
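The effect of fixing a seed can be illustrated with base R's `stats::kmeans`, which picks random starting centroids from R's own generator. This is an analogy, assuming the seed argument here plays the same role for H2O's random number generator: the same seed yields the same starting centroids and therefore the same final clustering.

```r
# Base-R analogy, not an h2o call: with the RNG state fixed, kmeans draws
# the same random starting centroids and returns the same clustering.
x <- matrix(c(1, 1.2, 0.8, 5, 5.3, 4.9, 9, 9.1, 8.7), ncol = 1)

set.seed(0)
fit_a <- kmeans(x, centers = 3)
set.seed(0)
fit_b <- kmeans(x, centers = 3)

identical(fit_a$cluster, fit_b$cluster)  # TRUE
identical(fit_a$centers, fit_b$centers)  # TRUE
```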
dropNACols
(Optional) A logical value indicating whether to drop columns in which more than 10% of the entries are NA.
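The column-dropping rule can be sketched on an ordinary R data frame. `drop_na_cols` is a hypothetical helper, not part of the h2o package; it only mirrors the 10% threshold described above.

```r
# Hypothetical helper (not part of the h2o package): keep only the columns in
# which at most max_na_frac of the entries are NA.
drop_na_cols <- function(df, max_na_frac = 0.10) {
  na_frac <- colMeans(is.na(df))          # fraction of NA entries per column
  df[, na_frac <= max_na_frac, drop = FALSE]
}

df <- data.frame(a = 1:20,                    #  0% NA: kept
                 b = c(NA, 2:20),             #  5% NA: kept
                 c = c(rep(NA, 4), 5:20))     # 20% NA: dropped
names(drop_na_cols(df))  # "a" "b"
```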
version
(Optional) The version of k-means clustering to run. Setting version = 1 runs the more stable ValueArray implementation, while version = 2 (the default) selects the faster, but still beta-stage, FluidVecs implementation.