broom (version 0.4.1)

kmeans_tidiers: Tidying methods for kmeans objects

Description

These methods summarize the results of k-means clustering into three tidy forms. tidy describes the center and size of each cluster, augment adds the cluster assignments to the original data, and glance summarizes the total within and between sum of squares of the clustering.

Usage

## S3 method for class 'kmeans'
tidy(x, col.names = paste0("x", 1:ncol(x$centers)), ...)
## S3 method for class 'kmeans'
augment(x, data, ...)
## S3 method for class 'kmeans'
glance(x, ...)

Arguments

x
kmeans object
col.names
Names to give each dimension of the data in the output of tidy. Defaults to x1, x2, ...
...
extra arguments, not used
data
Original data (required for augment)
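
As a minimal sketch of how the col.names default plays out, the snippet below fits kmeans to made-up toy data (the toy matrix and the names "len" and "wid" are illustrative assumptions, not part of broom) and evaluates the documented default expression by hand. The call to tidy is guarded so that the base R part still runs when broom is not installed.

```r
# Toy data: two well-separated groups in two dimensions (illustrative only)
set.seed(1)
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 6), ncol = 2))
k <- kmeans(toy, centers = 2)

# The documented default for col.names: one name per dimension of the centers
default_names <- paste0("x", 1:ncol(k$centers))
print(default_names)

# Supplying explicit names instead of the default (needs broom installed)
if (requireNamespace("broom", quietly = TRUE)) {
  print(broom::tidy(k, col.names = c("len", "wid")))
}
```

The vector passed as col.names should have one entry per column of x$centers.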

Value

All tidying methods return a data.frame without rownames. The structure depends on the method chosen.
tidy returns one row per cluster, with one column for each dimension in the data describing the center, followed by
size
The size of each cluster
withinss
The within-cluster sum of squares
cluster
A factor describing the cluster from 1:k
augment returns the original data with one extra column:
.cluster
The cluster assigned by the k-means algorithm
glance returns a one-row data.frame with the columns
totss
The total sum of squares
tot.withinss
The total within-cluster sum of squares
betweenss
The total between-cluster sum of squares
iter
The number of (outer) iterations
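
To make the three return shapes concrete, here is a rough base R sketch that rebuilds analogous data frames directly from the components of a kmeans fit. It is an illustration under assumptions, not broom's actual implementation: the toy data and the object names centers_df, assigned and summary_df are hypothetical. It also checks that the total sum of squares splits into the within-cluster and between-cluster parts reported by glance.

```r
# Toy data: two groups of 20 points each (illustrative only)
set.seed(1)
toy <- data.frame(x1 = c(rnorm(20, 0), rnorm(20, 6)),
                  x2 = c(rnorm(20, 0), rnorm(20, 6)))
k <- kmeans(toy, centers = 2)

# Analogue of tidy(k): one row per cluster (centers, size, withinss, cluster)
centers_df <- data.frame(k$centers,
                         size = k$size,
                         withinss = k$withinss,
                         cluster = factor(seq_len(nrow(k$centers))))

# Analogue of augment(k, toy): the original data plus a .cluster column
assigned <- data.frame(toy, .cluster = factor(k$cluster))

# Analogue of glance(k): a one-row summary of the fit
summary_df <- data.frame(totss = k$totss,
                         tot.withinss = k$tot.withinss,
                         betweenss = k$betweenss,
                         iter = k$iter)

# The total sum of squares is the within-cluster plus between-cluster parts
print(all.equal(k$totss, k$tot.withinss + k$betweenss))
```

The ratio betweenss / totss from the one-row summary is a common quick measure of how much of the variation the clustering accounts for.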

See Also

kmeans

Examples


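library(broom)
library(dplyr)
library(ggplot2)

# Simulate three Gaussian clusters with known centers and sizes
set.seed(2014)
centers <- data.frame(cluster=factor(1:3), size=c(100, 150, 50),
                      x1=c(5, 0, -3), x2=c(-1, 1, -2))
# ungroup() so that select() below does not carry the grouping column along
points <- centers %>% group_by(cluster) %>%
 do(data.frame(x1=rnorm(.$size[1], .$x1[1]),
               x2=rnorm(.$size[1], .$x2[1]))) %>%
 ungroup()

# Cluster on the two coordinates only
k <- kmeans(points %>% dplyr::select(x1, x2), 3)
tidy(k)                    # one row per cluster
head(augment(k, points))   # original rows plus .cluster
glance(k)                  # one-row summary of the fit

# Plot the points by assigned cluster and label each fitted center
ggplot(augment(k, points), aes(x1, x2)) +
    geom_point(aes(color = .cluster)) +
    geom_text(aes(label = cluster), data = tidy(k), size = 10)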
