h2o.kmeans: H2O: K-Means Clustering

Description

Performs k-means clustering on a data set.

Usage

## Default method:
h2o.kmeans(data, centers, cols = "", iter.max = 10, normalize = FALSE, 
  init = "none", seed = 0, dropNACols, version = 2)

## ValueArray implementation (version = 1):
h2o.kmeans.VA(data, centers, cols = "", iter.max = 10, normalize = FALSE, 
  init = "none", seed = 0)

## FluidVecs implementation (version = 2):
h2o.kmeans.FV(data, centers, cols = "", iter.max = 10, normalize = FALSE, 
  init = "none", seed = 0, dropNACols = FALSE)

Arguments

data

An H2OParsedDataVA (version = 1) or H2OParsedData (version = 2) object containing the variables in the model.

centers

The number of clusters k.

cols

(Optional) A vector containing the names of the data columns on which k-means runs. If blank, k-means clustering will be run on the entire data set.

iter.max

(Optional) The maximum number of iterations allowed.

normalize

(Optional) A logical value indicating whether the data should be normalized before running k-means.

init

(Optional) Method by which to select the k initial cluster centroids. Possible values are "none" for random initialization, "plusplus" for k-means++ initialization, and "furthest" for initialization at the furthest p

seed

(Optional) Random seed used to initialize the cluster centroids.

dropNACols

(Optional) A logical value indicating whether to drop columns in which more than 10% of the entries are NA.

version

(Optional) The version of k-means clustering to run. version = 1 runs the more stable ValueArray implementation, while version = 2 selects the faster, but still beta-stage, FluidVecs implementation.
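
The "plusplus" value of init refers to k-means++ initialization, and seed controls the random draw. As a rough illustration of that idea in base R (not H2O's implementation), the sketch below picks the first center uniformly at random and each later center with probability proportional to its squared distance from the nearest center chosen so far. The function name kmeanspp_init and the toy data are hypothetical, and how H2O itself interprets seed = 0 is not covered here.

```r
# Hypothetical helper: k-means++ seeding on a numeric matrix.
kmeanspp_init <- function(x, k, seed = 0) {
  set.seed(seed)                        # fixed seed makes the draw repeatable
  x <- as.matrix(x)
  n <- nrow(x)
  idx <- integer(k)
  idx[1] <- sample.int(n, 1)            # first center: uniform at random
  # Squared distance from every row to the nearest chosen center.
  d2 <- rowSums(sweep(x, 2, x[idx[1], ])^2)
  for (j in seq_len(k)[-1]) {
    # Rows far from all current centers are more likely to be picked.
    idx[j] <- sample.int(n, 1, prob = d2)
    d2 <- pmin(d2, rowSums(sweep(x, 2, x[idx[j], ])^2))
  }
  x[idx, , drop = FALSE]
}

# Toy data (made up): three well-separated pairs of points.
pts <- cbind(c(0, 0.1, 5, 5.1, 10, 10.1), c(0, 0.1, 5, 5.1, 10, 10.1))
c1 <- kmeanspp_init(pts, k = 3, seed = 42)
c2 <- kmeanspp_init(pts, k = 3, seed = 42)
```

Calling the helper twice with the same seed returns the same starting centers, which is the reason to set seed when a clustering run needs to be reproducible.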

Value

An object of class H2OKMeansModelVA (version = 1) or H2OKMeansModel (version = 2) with slots key, data, and model, where the last is a list of the following components:

centers

A matrix of cluster centers.

cluster

An H2OParsedDataVA (version = 1) or H2OParsedData (version = 2) object containing the vector of integers (from 1 to k) that indicate the cluster to which each point is allocated.

size

The number of points in each cluster.

withinss

Vector of within-cluster sums of squares, with one component per cluster.

tot.withinss

Total within-cluster sum of squares, i.e., sum(withinss).
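
To make the relationship between these components concrete, here is a small sketch using base R's stats::kmeans, which returns components with the same names. It is only an analogy: it runs locally on an ordinary matrix rather than on an H2O object, and the simulated data are made up for illustration.

```r
# Simulated data: two groups of 20 points in two dimensions.
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))

# Base R k-means (not H2O); nstart = 5 tries several random starts.
fit <- kmeans(x, centers = 2, nstart = 5)

# Recompute withinss by hand: for each cluster, sum the squared
# distances from its points to its center.
withinss_manual <- sapply(seq_len(nrow(fit$centers)), function(j) {
  pts <- x[fit$cluster == j, , drop = FALSE]
  sum(sweep(pts, 2, fit$centers[j, ])^2)
})

# Recompute size by counting the points assigned to each cluster.
size_manual <- as.vector(table(fit$cluster))
```

Here withinss_manual matches fit$withinss, size_manual matches fit$size, and sum(fit$withinss) equals fit$tot.withinss, mirroring the definitions listed above.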

Details

IMPORTANT: Currently, to run k-means with version = 1, you must import data to a ValueArray object using h2o.importFile.VA, h2o.importFolder.VA, or one of their variants. To run with version = 2, you must import data to a FluidVecs object using h2o.importFile.FV, h2o.importFolder.FV, or one of their variants.

Examples

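# Load the package and start (or connect to) a local H2O instance.
library(h2o)
localH2O = h2o.init(ip = "localhost", port = 54321, startH2O = TRUE)
# Import the prostate data set bundled with the h2o package.
prosPath = system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex = h2o.importFile(localH2O, path = prosPath)
# Cluster the rows into 10 groups using four of the columns.
h2o.kmeans(data = prostate.hex, centers = 10, cols = c("AGE", "RACE", "VOL", "GLEASON"))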
