
streamMOA (version 0.1-0)

DSC_CluStream: CluStream Data Stream Clusterer

Description

This class implements the CluStream clustering algorithm for data streams.

Usage

DSC_CluStream(m = 100, horizon = 1000, t = 2, k = NULL)

Arguments

m
Maximum number of micro-clusters maintained by CluStream.
horizon
Size of the time window (horizon) used by CluStream.
t
Maximal boundary factor (kernel radius factor). When deciding whether to add a new data point to a micro-cluster, the maximum boundary is defined as t times the RMS deviation of the micro-cluster's data points from its centroid.
k
Number of macro-clusters to produce using weighted k-means. NULL disables automatic reclustering.
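To make the role of t concrete, here is a base R sketch of the boundary test described for the t argument. within_boundary() is a hypothetical helper, not part of streamMOA. It computes the RMS deviation directly from the raw points, whereas CluStream keeps only summary statistics per micro-cluster, and it ignores the special case of single-point micro-clusters, for which the original paper uses the distance to the closest other micro-cluster instead.

```r
# Hypothetical helper (not part of streamMOA): does point x fall inside the
# maximum boundary of a micro-cluster whose absorbed points are the rows of X?
# The boundary is t times the RMS deviation of those points from the centroid.
within_boundary <- function(x, X, t = 2) {
  centroid <- colMeans(X)
  # RMS deviation: square root of the mean squared Euclidean distance to the centroid
  rms <- sqrt(mean(rowSums(sweep(X, 2, centroid)^2)))
  # Euclidean distance of the new point to the centroid
  dist <- sqrt(sum((x - centroid)^2))
  dist <= t * rms
}

# Four points with centroid (0.5, 0.5) and RMS deviation sqrt(0.5)
X <- rbind(c(0, 0), c(1, 0), c(0, 1), c(1, 1))
within_boundary(c(0.9, 0.9), X, t = 2)   # TRUE: close to the centroid
within_boundary(c(3, 3), X, t = 2)       # FALSE: outside 2 * RMS
within_boundary(c(3, 3), X, t = 10)      # TRUE: a larger t widens the boundary
```

A larger t lets each micro-cluster absorb points farther from its centroid, so fewer new micro-clusters are created; a smaller t makes the summary finer-grained.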

Value

  • An object of class DSC_CluStream (subclass of DSC_Micro, DSC_MOA and DSC).

Details

This is an interface to the MOA implementation of CluStream.

CluStream applies a weighted k-means algorithm for reclustering (see the Examples section below).
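Weighted k-means treats each micro-cluster center as a point whose weight (for example, the number of data points the micro-cluster summarizes) scales its pull on the macro-cluster centers. The base R sketch below illustrates the idea with a plain Lloyd iteration. weighted_kmeans() is a hypothetical helper, not the implementation used by streamMOA or stream, and its random initialization from the micro-cluster centers is a simplifying assumption; within streamMOA, the documented routes are DSC_Kmeans(weighted=TRUE) with recluster(), or the k argument of DSC_CluStream.

```r
# Hypothetical weighted k-means (Lloyd's algorithm) on micro-cluster centers.
# centers: matrix of micro-cluster centers (one per row); weights: their weights.
weighted_kmeans <- function(centers, weights, k, iter.max = 100) {
  # Simplifying assumption: start from k distinct micro-cluster centers
  mu <- centers[sample(nrow(centers), k), , drop = FALSE]
  assign <- integer(nrow(centers))
  for (i in seq_len(iter.max)) {
    # Squared distance from every micro-cluster center to every macro-center
    d <- sapply(seq_len(k), function(j) rowSums(sweep(centers, 2, mu[j, ])^2))
    new_assign <- max.col(-d, ties.method = "first")
    if (all(new_assign == assign)) break  # assignments stable: converged
    assign <- new_assign
    # Each macro-center becomes the weighted mean of its assigned centers
    for (j in seq_len(k)) {
      members <- assign == j
      if (any(members)) {
        mu[j, ] <- colSums(centers[members, , drop = FALSE] * weights[members]) /
          sum(weights[members])
      }
    }
  }
  list(centers = mu, cluster = assign)
}

set.seed(1)
centers <- rbind(c(0, 0), c(0.2, 0.1), c(5, 5), c(5.1, 4.9))
weights <- c(10, 1, 3, 3)
res <- weighted_kmeans(centers, weights, k = 2)
res$cluster
res$centers
```

The heavy micro-cluster at (0, 0) dominates its macro-cluster, so that macro-center lands at (0.2/11, 0.1/11) rather than at the unweighted midpoint (0.1, 0.05).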

References

Aggarwal CC, Han J, Wang J, Yu PS (2003). "A Framework for Clustering Evolving Data Streams." In Proceedings of the International Conference on Very Large Data Bases (VLDB '03), pp. 81-92.

Bifet A, Holmes G, Pfahringer B, Kranen P, Kremer H, Jansen T, Seidl T (2010). "MOA: Massive Online Analysis, a Framework for Stream Classification and Clustering." In Journal of Machine Learning Research (JMLR).

See Also

DSC, DSC_Micro, DSC_MOA

Examples

library(streamMOA)
set.seed(0)

# 3 Gaussian clusters with 5% noise
dsd <- DSD_Gaussians(k=3, noise=.05)

# cluster with CluStream
dsc <- DSC_CluStream(m=50)
cluster(dsc, dsd, 500)
dsc

# plot micro-clusters
plot(dsc, dsd)

# reclustering: use weighted k-means on the CluStream micro-clusters
km <- DSC_Kmeans(k=3, weighted=TRUE)
recluster(km, dsc)
plot(km, dsd, type="both")

# use weighted k-means automatically (k = 3 macro-clusters)
dsc <- DSC_CluStream(m=50, k=3)
cluster(dsc, dsd, 500)
dsc

plot(dsc, dsd, type="both")
