Ckmeans.1d.dp-package: Optimal `k`-Means Clustering in One-dimension by Dynamic Programming

Description

The Ckmeans.1d.dp algorithm clusters 1-D data given by a numeric vector $x$ into $k$ groups by dynamic programming (Wang and Song, 2011). It guarantees the optimality of clustering -- the sum of within-cluster sum of squares is always the minimum. In contrast, heuristic $k$-means algorithms may be non-optimal or inconsistent from run to run.

Since version 3.4.0, the runtime of this dynamic programming algorithm has been greatly reduced after two changes to trim down the search space. These improvements, not discussed in (Wang and Song, 2011), will be described in future publications. While runtime was at least two times faster for all tested examples, the most notable improvement occurred when $k=2$ or $k$ is very large. First, an upper bound for the sum of within cluster square distances is checked to reduce the dynamic programming search space. Second, the unnecessary calculation of $n-1$ elements in the dynamic programming matrix that do not influence the final result has been eliminated. The asymptotic runtime on $n$ points is improved to $O( (k-1) n^2 + n \lg n)$. When $k=2$, it is $O(n\lg n)$, leading to enormous reduction in runtime: assigning one million points into two clusters took half a second, down from about a week, on iMac with a 2.93 GHz Intel Core i7 processor. Overall, the function now runs fast for all tested input, and the improvements are especially pronounced for $k=2$ and large $k$. The space complexity is $O(kn)$.

Richard Bellman (1973) first described a general dynamic programming strategy for solving univariate clustering problems with additive optimality measures. The strategy, however, did not address any specific characteristics of the $k$-means problem and its implied general algorithm will have a time complexity of $O(kn^3)$ and space complexity of $O(kn)$.

Arguments

Details

ll{ Package: Ckmeans.1d.dp Type: Package Version: 3.4.0-1 Initial version: 1.0 Initial date: 2010-10-26 License: LGPL (>= 3) }

References

Wang, H. and Song, M. (2011) Ckmeans.1d.dp: optimal k-means clustering in one dimension by dynamic programming. The R Journal 3(2), 29--33. Retrieved from http://journal.r-project.org/archive/2011-2/RJournal_2011-2_Wang+Song.pdf

Bellman, R. (1973) A note on cluster analysis and dynamic programming. Mathematical Biosciences 18(3), 311--312.

Description

Arguments

Details

References

See Also