h2o (version 2.8.4.4)

h2o.kmeans: H2O: K-Means Clustering

Description

Performs k-means clustering on a data set.

Usage

h2o.kmeans(data, centers, cols = "", key = "", iter.max = 10, 
  normalize = FALSE, init = "none", seed = 0, dropNACols = FALSE)

Arguments

data
An H2OParsedData object containing the variables in the model.
centers
The number of clusters k.
cols
(Optional) A vector containing the names of the data columns on which k-means runs. If blank, k-means clustering will be run on the entire data set.
key
(Optional) The unique hex key assigned to the resulting model. If none is given, a key will automatically be generated.
iter.max
(Optional) The maximum number of iterations allowed.
normalize
(Optional) A logical value indicating whether the data should be normalized before running k-means.
init
(Optional) Method by which to select the k initial cluster centroids. Possible values are "none" for random initialization, "plusplus" for k-means++ initialization, and "furthest" for initialization at the furthest p
seed
(Optional) Random seed used to initialize the cluster centroids.
dropNACols
(Optional) A logical value indicating whether to drop columns in which more than 10% of the entries are NA.
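
For readers unfamiliar with the "plusplus" option of init, the following is a minimal sketch of the general k-means++ seeding idea in base R. It is not H2O's implementation; the function name kmeanspp_init and the use of the built-in iris data are assumptions made purely for illustration. Each new center is sampled with probability proportional to its squared distance from the nearest center already chosen.

```r
# Hypothetical helper (not part of h2o): k-means++ style seeding in base R.
kmeanspp_init <- function(x, k) {
  x <- as.matrix(x)
  n <- nrow(x)
  idx <- integer(k)
  idx[1] <- sample.int(n, 1)          # first center: uniform at random
  d2 <- rep(Inf, n)                   # squared distance to nearest chosen center
  for (j in seq_len(k - 1)) {
    diff <- sweep(x, 2, x[idx[j], ])  # subtract the most recent center from every row
    d2 <- pmin(d2, rowSums(diff^2))
    # next center: sampled with probability proportional to d2
    idx[j + 1] <- sample.int(n, 1, prob = d2)
  }
  x[idx, , drop = FALSE]
}

set.seed(1)
x <- as.matrix(iris[, 1:4])           # illustrative data, not from this page
init_centers <- kmeanspp_init(x, 3)
dim(init_centers)
```

Points that are already centers have a squared distance of zero, so they cannot be drawn again.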

Value

An object of class H2OKMeansModel with slots key, data, and model, where the last is a list of the following components:

  • centers: A matrix of cluster centers.
  • cluster: An H2OParsedData object containing the vector of integers (from 1 to k) that indicate the cluster to which each point is allocated.
  • size: The number of points in each cluster.
  • withinss: Vector of within-cluster sums of squares, with one component per cluster.
  • tot.withinss: Total within-cluster sum of squares, i.e., sum(withinss).
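
The components listed above share their names with those returned by base R's stats::kmeans, so their meaning can be sketched without a running H2O cluster. The example below uses stats::kmeans on the built-in iris data as a stand-in; it is an assumption-laden illustration of what the quantities measure, not output from h2o.kmeans.

```r
# Base R stand-in (stats::kmeans), used only to illustrate the components.
set.seed(42)
x <- as.matrix(iris[, 1:4])     # illustrative data, not from this page
fit <- kmeans(x, centers = 3)

fit$centers        # one row per cluster, one column per variable
fit$size           # number of points in each cluster
fit$withinss       # within-cluster sum of squares, one value per cluster
fit$tot.withinss   # total within-cluster sum of squares

# Recompute withinss by hand: squared distances of points to their own center.
manual <- sapply(1:3, function(j) {
  pts <- x[fit$cluster == j, , drop = FALSE]
  sum(sweep(pts, 2, fit$centers[j, ])^2)
})

stopifnot(isTRUE(all.equal(manual, as.numeric(fit$withinss), tolerance = 1e-6)))
stopifnot(isTRUE(all.equal(fit$tot.withinss, sum(fit$withinss))))
```

For an H2OKMeansModel, the corresponding values live in the model slot described above rather than at the top level of the returned object.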

See Also

h2o.importFile, h2o.importFolder, h2o.importHDFS, h2o.importURL, h2o.uploadFile

Examples

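library(h2o)
# Start (or connect to) a local H2O instance
localH2O = h2o.init()
# Import the prostate data set shipped with the package
prosPath = system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex = h2o.importFile(localH2O, path = prosPath)
# Cluster on four columns with k = 10 and inspect the cluster centers
prostate.km = h2o.kmeans(data = prostate.hex, centers = 10, cols = c("AGE", "RACE", "VOL", "GLEASON"))
prostate.km@model$centers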
