streamMOA (version 1.2-1)

DSC_StreamKM_MOA: streamKM++

Description

This is an interface to the MOA implementation of streamKM++.

Usage

DSC_StreamKM(sizeCoreset = 10000, numClusters = 5, length = 100000L)
DSC_StreamKM_MOA(sizeCoreset = 10000, numClusters = 5, length = 100000L)

Arguments

sizeCoreset

Size of the coreset, i.e., the number of weighted sample points kept to summarize the stream

numClusters

Number of clusters to compute

length

Length of the data stream, i.e., the number of data points the stream is expected to contain

Details

streamKM++ uses a tree-based sampling strategy to obtain a small weighted sample of the stream, called a coreset. Upon reclustering, the algorithm applies k-means++ to the coreset to find the requested number of centres (numClusters).
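
The seeding step that k-means++ performs on a weighted sample can be sketched in base R. This is only an illustration of the general idea, not the MOA code: the function kmeanspp_seed and the toy coreset and weights objects are hypothetical names introduced here, the coreset tree itself is not shown, and MOA's implementation may differ in detail.

```r
# Weighted k-means++ seeding on a toy "coreset" (base R only).
# kmeanspp_seed, coreset and weights are illustrative names,
# not part of streamMOA or MOA.
kmeanspp_seed <- function(points, weights, k) {
  n <- nrow(points)
  centres <- integer(k)
  # first centre: sample a point with probability proportional to its weight
  centres[1] <- sample.int(n, 1, prob = weights)
  # squared distance from every point to its nearest chosen centre
  d2 <- rowSums(sweep(points, 2, points[centres[1], ])^2)
  for (i in seq_len(k - 1) + 1) {
    # next centre: probability proportional to weight * squared distance
    centres[i] <- sample.int(n, 1, prob = weights * d2)
    d2 <- pmin(d2, rowSums(sweep(points, 2, points[centres[i], ])^2))
  }
  points[centres, , drop = FALSE]
}

# toy coreset: three tight groups of 20 points each, all with weight 1
set.seed(42)
coreset <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.1), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.1), ncol = 2),
  matrix(rnorm(40, mean = 10, sd = 0.1), ncol = 2)
)
weights <- rep(1, nrow(coreset))

seeds <- kmeanspp_seed(coreset, weights, k = 3)
seeds
```

Because a point that has already been chosen has squared distance zero, it cannot be drawn again, so the seeds are always distinct coreset points. In a full k-means++ run these seeds would then be refined by weighted Lloyd iterations.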

Note: This implementation currently does not return micro-clusters.

References

Marcel R. Ackermann, Christiane Lammersen, Marcus Maertens, Christoph Raupach, Christian Sohler, Kamil Swierkot. "StreamKM++: A Clustering Algorithm for Data Streams." In: Proceedings of the 12th Workshop on Algorithm Engineering and Experiments (ALENEX '10), 2010

Examples

# NOT RUN {
# load the packages (DSD_Gaussians comes from stream)
library(stream)
library(streamMOA)

# data with 3 clusters
stream <- DSD_Gaussians(k=3, d=2)

# cluster with streamKM++
streamkm <- DSC_StreamKM(sizeCoreset=10000, numClusters=3, length=10000)
update(streamkm, stream, 10000)
streamkm

# plot macro-clusters
plot(streamkm, stream, type="macro")

# }