stream (version 1.2-3)

DSC_Reachability: Reachability Micro-Cluster Reclusterer

Description

Implementation of reachability clustering (based on DBSCAN's concept of reachability) to recluster a set of micro-clusters. Two micro-clusters are directly reachable if they are within each other's epsilon-neighborhood (i.e., the distance between their centers is less than epsilon). Two micro-clusters are reachable if they are connected by a chain of pairwise directly reachable micro-clusters. All mutually reachable micro-clusters are put in the same cluster.
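The reachability definition above can be sketched in a few lines of base R. This is a minimal illustration, not the stream package's code: the helper reachability_clusters() is hypothetical, and it assumes Euclidean distance between micro-cluster centers stored as the rows of a matrix. It links centers whose distance is less than epsilon and gives each connected chain one cluster id.

```r
# Hypothetical helper (not part of the stream package): assign each
# micro-cluster center to a reachability cluster.
reachability_clusters <- function(centers, epsilon) {
  centers <- as.matrix(centers)
  n <- nrow(centers)
  # Directly reachable: Euclidean distance between centers < epsilon
  adj <- as.matrix(dist(centers)) < epsilon
  labels <- integer(n)  # 0 = not yet assigned to a cluster
  current <- 0L
  for (start in seq_len(n)) {
    if (labels[start] != 0L) next
    current <- current + 1L
    # Breadth-first search along chains of directly reachable centers
    queue <- start
    labels[start] <- current
    while (length(queue) > 0) {
      i <- queue[1]
      queue <- queue[-1]
      nbrs <- which(adj[i, ] & labels == 0L)
      labels[nbrs] <- current
      queue <- c(queue, nbrs)
    }
  }
  labels
}

centers <- rbind(c(0, 0), c(0.2, 0), c(0.4, 0), c(3, 3), c(3.1, 3))
reachability_clusters(centers, epsilon = 0.3)
# [1] 1 1 1 2 2
```

The first and third centers are 0.4 apart, so they are not directly reachable at epsilon = 0.3, yet they share a cluster because both are directly reachable from the second center.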

Usage

DSC_Reachability(epsilon, min_weight=NULL, description=NULL)

Arguments

epsilon
radius of the epsilon-neighborhood.
min_weight
micro-clusters with a weight less than this will be ignored for reclustering.
description
optional character string to describe the clustering method.
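The effect of min_weight can be pictured as a filter applied before reclustering. The sketch below uses a hypothetical helper, filter_by_weight(), and assumes centers are the rows of a matrix with a parallel vector of weights; how DSC_Reachability treats the ignored micro-clusters afterwards is not shown here.

```r
# Hypothetical helper (not part of stream): drop micro-clusters whose
# weight is below min_weight; min_weight = NULL keeps all of them.
filter_by_weight <- function(centers, weights, min_weight = NULL) {
  centers <- as.matrix(centers)
  if (is.null(min_weight)) return(centers)
  keep <- weights >= min_weight  # weight < min_weight is ignored
  centers[keep, , drop = FALSE]
}

centers <- rbind(c(0, 0), c(1, 1), c(2, 2))
weights <- c(5, 0.5, 3)
filter_by_weight(centers, weights, min_weight = 1)  # keeps rows 1 and 3
```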

Value

An object of class DSC_Reachability. The object contains the following items:

Details

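Internally, reachability clustering uses DSC_Hierarchical with single linkage. Single linkage fits this task because cutting a single-link dendrogram at height epsilon merges exactly those micro-clusters that are joined by a chain of small pairwise distances.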
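The single-linkage connection can be checked with base R's hclust() and cutree(). This sketch assumes Euclidean distance and a cut height equal to epsilon; it mirrors the idea and is not necessarily the exact call DSC_Hierarchical makes.

```r
centers <- rbind(c(0, 0), c(0.2, 0), c(0.4, 0), c(3, 3), c(3.1, 3))
epsilon <- 0.3

# Single-linkage dendrogram on the pairwise center distances
hc <- hclust(dist(centers), method = "single")

# Cutting at height epsilon keeps every merge at height <= epsilon,
# which groups centers linked by a chain of short distances
cl <- cutree(hc, h = epsilon)
cl
```

One subtlety: cutree() applies merges at heights less than or equal to h, so two centers exactly epsilon apart may be joined here even though the definition above uses a strict "less than".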

Note that this clustering cannot be updated incrementally: every time it is used for (re)clustering, the previous clustering is discarded.

References

Martin Ester, Hans-Peter Kriegel, Joerg Sander, Xiaowei Xu (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Evangelos Simoudis, Jiawei Han, Usama M. Fayyad (eds.). Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). AAAI Press. pp. 226-231.

See Also

DSC, DSC_Macro

Examples

library(stream)

stream <- DSD_mlbenchGenerator("cassini")

# Recluster micro-clusters from DSC_Sample with reachability
sample <- DSC_Sample(k = 200)
update(sample, stream, 1000)

reach <- DSC_Reachability(epsilon=0.3)
recluster(reach, sample)

plot(reach, stream, type="both")

# For comparison, we use reachability clustering directly on the data points.
# Note: reachability clustering is not a data stream clustering algorithm;
# it takes O(n^2) time and space.
reach <- DSC_Reachability(epsilon=0.2)
update(reach, stream, 500)
reach
plot(reach, stream)
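The O(n^2) note follows from the pairwise distance matrix that single-link clustering works on. A quick base R check, using uniformly random 2-d points as a stand-in for the 500 cassini points passed to update() above (an assumption; only the number of points matters here):

```r
n <- 500
x <- matrix(runif(n * 2), ncol = 2)  # random 2-d stand-in for the data
d <- dist(x)                         # all pairwise distances
length(d)                            # n * (n - 1) / 2 entries
# [1] 124750
object.size(d)                       # about 1 MB of doubles
```

Doubling n to 1000 roughly quadruples the count (499500 distances), which is why this approach does not scale to unbounded streams.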
