Reachability Micro-Cluster Reclusterer
Implementation of reachability clustering (based on DBSCAN's concept of reachability) to recluster a set of micro-clusters. Two micro-clusters are directly reachable if they are within each other's epsilon-neighborhood (i.e., the distance between their centers is less than epsilon). Two micro-clusters are reachable if they are connected by a chain of pairwise directly reachable micro-clusters. All mutually reachable micro-clusters are put in the same cluster.
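The definition above amounts to finding connected components in a graph whose edges join directly reachable micro-clusters. The following is a minimal base-R sketch of that idea, not the package's implementation; the helper name `reachability_labels` and the example centers are hypothetical and chosen only for illustration.

```r
# Hypothetical helper (not part of the stream package): label each center
# with the index of the connected component it belongs to.
reachability_labels <- function(centers, epsilon) {
  d <- as.matrix(dist(centers))   # pairwise Euclidean distances between centers
  adj <- d < epsilon              # TRUE where two centers are directly reachable
  n <- nrow(d)
  labels <- rep(NA_integer_, n)
  current <- 0L
  for (start in seq_len(n)) {
    if (!is.na(labels[start])) next      # already assigned to a cluster
    current <- current + 1L
    labels[start] <- current
    queue <- start
    while (length(queue) > 0) {          # breadth-first search along the chain
      i <- queue[1]
      queue <- queue[-1]
      nb <- which(adj[i, ] & is.na(labels))
      labels[nb] <- current
      queue <- c(queue, nb)
    }
  }
  labels
}

# Centers 1 and 3 are 0.4 apart (not directly reachable for epsilon = 0.3),
# but both are within 0.2 of center 2, so all three are mutually reachable.
centers <- rbind(c(0, 0), c(0.2, 0), c(0.4, 0), c(2, 2), c(2.1, 2))
labels <- reachability_labels(centers, epsilon = 0.3)
labels
```

The first three centers end up in one cluster through the chain via center 2, and the last two form a second cluster.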
DSC_Reachability(epsilon, min_weight=NULL, description=NULL)
- epsilon: radius of the epsilon-neighborhood.
- min_weight: micro-clusters with a weight less than this will be ignored for reclustering.
- description: optional character string to describe the clustering method.
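To illustrate how a weight threshold can interact with the neighborhood radius, here is a small base-R sketch. It is only an illustration of the idea described for `min_weight`, not the package's code; the `centers` and `weights` values are made up.

```r
# Hypothetical micro-cluster centers and weights (illustration only).
centers <- rbind(c(0, 0), c(0.2, 0), c(0.4, 0), c(2, 2))
weights <- c(5, 1, 4, 3)
min_weight <- 2
epsilon <- 0.3

# Ignore micro-clusters whose weight is less than min_weight.
keep <- weights >= min_weight
kept_centers <- centers[keep, , drop = FALSE]

# Check which of the remaining centers are still directly reachable
# (distance below epsilon, excluding each center's distance to itself).
d <- as.matrix(dist(kept_centers))
direct <- d < epsilon & row(d) != col(d)
any(direct)
```

In this example the light micro-cluster at (0.2, 0) was the only link between its two neighbors, so once it is ignored no pair of remaining centers is directly reachable.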
Reachability internally uses DSC_Hierarchical with single link.
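The connection to single-link hierarchical clustering can be sketched with base R's `hclust()` and `cutree()`: cutting a single-link dendrogram at height epsilon groups together all points joined by a chain of links no longer than epsilon. This is an illustrative sketch using randomly generated centers, not the package's internal code, and the treatment of distances exactly equal to epsilon may differ from the strict "less than" in the definition.

```r
set.seed(1)
centers <- matrix(runif(40), ncol = 2)   # 20 hypothetical micro-cluster centers
epsilon <- 0.2

# Single-link hierarchical clustering on the center-to-center distances.
hc <- hclust(dist(centers), method = "single")

# Cut the dendrogram at height epsilon to obtain the cluster assignment.
labels <- cutree(hc, h = epsilon)
table(labels)

# Sanity check: every directly reachable pair shares a cluster label.
d <- as.matrix(dist(centers))
close_pairs <- which(d < epsilon, arr.ind = TRUE)
all(labels[close_pairs[, 1]] == labels[close_pairs[, 2]])
```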
Note that this clustering cannot be updated iteratively and every time it is used for (re)clustering, the old clustering is deleted.
An object of class DSC_Reachability. The object contains the following items:
Martin Ester, Hans-Peter Kriegel, Joerg Sander, Xiaowei Xu (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Evangelos Simoudis, Jiawei Han, Usama M. Fayyad. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). AAAI Press. pp. 226-231.
library(stream)

stream <- DSD_mlbenchGenerator("cassini")

# Recluster micro-clusters from DSC_Sample with reachability
sample <- DSC_Sample(k = 200)
update(sample, stream, 1000)
reach <- DSC_Reachability(epsilon = 0.3)
recluster(reach, sample)
plot(reach, stream, type = "both")

# For comparison, we use reachability clustering directly on data points
# Note: reachability is not a data stream clustering algorithm; it takes
# O(n^2) time and space.
reach <- DSC_Reachability(epsilon = 0.2)
update(reach, stream, 500)
reach
plot(reach, stream)