streamMOA (version 1.2-1)

DSC_BICO_MOA: BICO - Fast computation of k-means coresets in a data stream

Description

This is an interface to the MOA implementation of BICO. The original BICO implementation by Fichtenberger et al. is also available as DSC_BICO.

Usage

DSC_BICO_MOA(Cluster = 5, Dimensions, MaxClusterFeatures = 1000,
  Projections = 10, k = NULL, space = NULL, p = NULL)

Arguments

Cluster, k

Number of desired centers

Dimensions

Number of dimensions of the input points (stream). This must be specified in advance.

MaxClusterFeatures, space

Maximum size of the coreset

Projections, p

Number of random projections used for the nearest neighbour search
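To give some intuition for why random projections help a nearest neighbour search, here is a minimal base R sketch. It is not taken from the MOA source and does not claim to reproduce BICO's internal procedure; the names `centers`, `query`, `dirs` and `lower` are hypothetical. It relies only on the fact that, for a unit-length direction, the difference between two projected values never exceeds the Euclidean distance between the points, so the projections give a cheap lower bound that lets many candidates be skipped.

```r
# Illustration only: generic projection-based pruning, not BICO's actual code.
set.seed(1)
d <- 5                                    # dimensionality
centers <- matrix(rnorm(200 * d), ncol = d)  # 200 hypothetical candidate centers
query <- rnorm(d)                         # point whose nearest center we want

p <- 10                                   # number of random projections
dirs <- matrix(rnorm(p * d), nrow = p)
dirs <- dirs / sqrt(rowSums(dirs^2))      # scale each row to unit length

proj_centers <- centers %*% t(dirs)       # 200 x p projected coordinates
proj_query <- as.vector(dirs %*% query)   # p projected coordinates

# Lower bound on the true distance: largest absolute projected difference
lower <- apply(abs(sweep(proj_centers, 2, proj_query)), 1, max)

# Visit candidates by increasing lower bound; stop once the bound
# can no longer beat the best exact distance found so far.
best <- Inf
best_i <- NA
checked <- 0
for (i in order(lower)) {
  if (lower[i] >= best) break
  checked <- checked + 1
  dist_i <- sqrt(sum((centers[i, ] - query)^2))
  if (dist_i < best) {
    best <- dist_i
    best_i <- i
  }
}

# Brute-force reference for comparison
brute_i <- which.min(sqrt(rowSums(sweep(centers, 2, query)^2)))
```

In this sketch `checked` counts how many exact distance computations were needed. More projections tighten the lower bound but cost more to compute per point, which is the trade-off a parameter such as Projections would control.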

Details

BICO maintains a tree inspired by the clustering tree of BIRCH, a clustering algorithm that won the SIGMOD Test of Time award. Each node in the tree represents a subset of the input points. Instead of storing all points as individual objects, only the number of points, their sum, and their squared sum are stored as key features of each subset. Each point is inserted into exactly one node.
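The three stored quantities are enough to recover a subset's centroid and its k-means cost without keeping the points themselves. The following base R sketch illustrates this. The helpers `cf_new`, `cf_insert`, `cf_centroid` and `cf_cost` are hypothetical and not part of streamMOA, and the squared sum is assumed to be the scalar sum of squared point norms.

```r
# Illustration only: a minimal clustering feature (n, linear sum, squared sum).
# These helpers are hypothetical and not part of streamMOA or MOA.
cf_new <- function(d) list(n = 0, ls = numeric(d), ss = 0)

# Absorb one point x into the feature
cf_insert <- function(cf, x) {
  cf$n  <- cf$n + 1
  cf$ls <- cf$ls + x          # per-dimension sum of the points
  cf$ss <- cf$ss + sum(x^2)   # assumed: scalar sum of squared norms
  cf
}

# Centroid of the summarized points
cf_centroid <- function(cf) cf$ls / cf$n

# Sum of squared distances to the centroid: SS - ||LS||^2 / n
cf_cost <- function(cf) cf$ss - sum(cf$ls^2) / cf$n

pts <- rbind(c(1, 2), c(3, 4), c(5, 0))
cf <- cf_new(2)
for (i in seq_len(nrow(pts))) cf <- cf_insert(cf, pts[i, ])

cf_centroid(cf)  # 3 2
cf_cost(cf)      # 16
```

Because each update only adds to three running totals, the memory needed per node does not depend on how many points it has absorbed.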

References

Hendrik Fichtenberger, Marc Gille, Melanie Schmidt, Chris Schwiegelshohn, Christian Sohler: BICO: BIRCH Meets Coresets for k-Means Clustering. ESA 2013: 481-492

Examples

# NOT RUN {
# load the package (also attaches stream, which provides DSD_Gaussians)
library(streamMOA)

# data with 3 clusters and 2 dimensions
stream <- DSD_Gaussians(k=3, d=2)

# cluster with BICO
bico <- DSC_BICO_MOA(Cluster=3, Dimensions=2)
update(bico, stream, 10000)
bico

# plot micro and macro-clusters
plot(bico, stream, type="both")

# }
