h2o (version 2.4.3.11)

h2o.kmeans: H2O: K-Means Clustering

Description

Performs k-means clustering on a data set.

Usage

## Default method:
h2o.kmeans(data, centers, cols = "", iter.max = 10, normalize = FALSE, 
  init = "none", seed = 0, dropNACols, version = 2)

## Import to a ValueArray object:
h2o.kmeans.VA(data, centers, cols = "", iter.max = 10, normalize = FALSE, 
  init = "none", seed = 0)

## Import to a FluidVecs object:
h2o.kmeans.FV(data, centers, cols = "", iter.max = 10, normalize = FALSE, 
  init = "none", seed = 0, dropNACols = FALSE)

Arguments

data
An H2OParsedDataVA (version = 1) or H2OParsedData (version = 2) object containing the variables in the model.
centers
The number of clusters k.
cols
(Optional) A vector containing the names of the data columns on which k-means runs. If blank, k-means clustering will be run on the entire data set.
iter.max
(Optional) The maximum number of iterations allowed.
normalize
(Optional) A logical value indicating whether the data should be normalized before running k-means.
init
(Optional) Method by which to select the k initial cluster centroids. Possible values are "none" for random initialization, "plusplus" for k-means++ initialization, and "furthest" for initialization at the furthest p
seed
(Optional) Random seed used to initialize the cluster centroids.
dropNACols
(Optional) A logical value indicating whether to drop columns in which more than 10% of the entries are NA.
version
(Optional) The version of k-means clustering to run. version = 1 runs the more stable ValueArray implementation, while version = 2 selects the faster, but still beta-stage, FluidVecs implementation.
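
To make the init = "plusplus" option more concrete, here is a minimal base R sketch of k-means++ seeding. It is an illustration only and not H2O's implementation: the helper kmeanspp_init is hypothetical, the iris data set stands in for an H2O data object, and stats::kmeans is used in place of h2o.kmeans.

```r
# Illustrative k-means++ seeding in base R; NOT H2O's implementation.
# kmeanspp_init is a hypothetical helper written for this sketch.
kmeanspp_init <- function(x, k) {
  x <- as.matrix(x)
  n <- nrow(x)
  idx <- integer(k)
  # First centroid: chosen uniformly at random among the rows.
  idx[1] <- sample.int(n, 1)
  # Squared distance from every point to its nearest chosen centroid.
  d2 <- rowSums(sweep(x, 2, x[idx[1], ])^2)
  for (j in seq_len(k)[-1]) {
    # Next centroid: sampled with probability proportional to d2.
    idx[j] <- sample.int(n, 1, prob = d2)
    d2 <- pmin(d2, rowSums(sweep(x, 2, x[idx[j], ])^2))
  }
  x[idx, , drop = FALSE]
}

set.seed(42)                               # plays the role of the seed argument
x <- as.matrix(iris[, 1:4])
init <- kmeanspp_init(x, k = 3)            # k plays the role of centers
fit <- kmeans(x, centers = init, iter.max = 10)
fit$size
```

Because a point that is already a centroid has squared distance zero, it cannot be drawn again, so the initial centroids are always distinct rows.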

Value

An object of class H2OKMeansModelVA (version = 1) or H2OKMeansModel (version = 2) with slots key, data, and model, where the last is a list of the following components:
  • centers: A matrix of cluster centers.
  • cluster: An H2OParsedDataVA (version = 1) or H2OParsedData (version = 2) object containing the vector of integers (from 1 to k) that indicate the cluster to which each point is allocated.
  • size: The number of points in each cluster.
  • withinss: Vector of within-cluster sums of squares, with one component per cluster.
  • tot.withinss: Total within-cluster sum of squares, i.e., sum(withinss).
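
The components listed above share their names with those returned by base R's stats::kmeans, so their relationships can be checked without an H2O cluster. The sketch below uses stats::kmeans on the iris data set as a stand-in; it is not the H2O implementation. With an H2O model object, the description above suggests the same components would be reached through the model slot (for example fit@model$centers), which is an assumption not verified here.

```r
# Base R analogue of the returned components; stats::kmeans is NOT h2o.kmeans.
set.seed(1)
x <- as.matrix(iris[, 1:4])
fit <- kmeans(x, centers = 3, iter.max = 10)

fit$centers         # matrix of cluster centers, one row per cluster
head(fit$cluster)   # integer labels from 1 to k, one per data point
fit$size            # number of points in each cluster
fit$withinss        # within-cluster sum of squares, one value per cluster

# tot.withinss is the sum of the per-cluster values.
all.equal(fit$tot.withinss, sum(fit$withinss))
```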

Details

IMPORTANT: Currently, to run k-means with version = 1, you must import data to a ValueArray object using h2o.importFile.VA, h2o.importFolder.VA, or one of their variants. To run with version = 2, you must import data to a FluidVecs object using h2o.importFile.FV, h2o.importFolder.FV, or one of their variants.

See Also

h2o.importFile, h2o.importFolder, h2o.importHDFS, h2o.importURL, h2o.uploadFile

Examples

library(h2o)
localH2O = h2o.init(ip = "localhost", port = 54321, startH2O = TRUE)
prosPath = system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex = h2o.importFile(localH2O, path = prosPath)
h2o.kmeans(data = prostate.hex, centers = 10, cols = c("AGE", "RACE", "VOL", "GLEASON"))
