DBSTREAM clustering algorithm
Implements a simple density-based stream clustering algorithm that assigns data points to micro-clusters with a given radius and implements shared-density-based reclustering.
DSC_DBSTREAM(r, lambda = 0.001, gaptime = 1000L, Cm = 3, metric = "Euclidean", shared_density = FALSE, alpha = 0.1, k = 0, minweight = 0)
get_shared_density(x, use_alpha = TRUE)
change_alpha(x, alpha)
get_cluster_assignments(x)
- r: The radius of micro-clusters.
- lambda: The lambda used in the fading function.
- gaptime: Weak micro-clusters (and weak shared density entries) are removed every gaptime data points.
- Cm: Minimum weight for a micro-cluster.
- metric: Metric used to calculate distances.
- shared_density: Record shared density information. If set to TRUE, then shared density is used for reclustering; otherwise reachability is used (overlapping clusters with less than r * (1 - alpha) distance are clustered together).
- k: The number of macro-clusters to be returned if macro is true.
- alpha: For shared density, the minimum proportion of shared points between two clusters to warrant combining them (a suitable value for 2D data is .3). For reachability clustering, it is a distance factor.
- minweight: The proportion of the total weight a macro-cluster needs to have to not be noise (between 0 and 1).
- x: A DSC_DBSTREAM object to get the shared density information from.
- use_alpha: Only return shared density if it exceeds alpha.
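The fading function controlled by lambda can be illustrated in a few lines of base R. This is a minimal sketch, assuming exponential fading of the form w * 2^(-lambda * dt); the helpers fade() and cleanup() and the removal threshold w_min = 1 are hypothetical and only show how a periodic, gaptime-based cleanup interacts with fading, not how the package implements it.

```r
# Fading: a weight w that has not been updated for dt time steps
# decays to w * 2^(-lambda * dt)
fade <- function(w, dt, lambda = 0.001) w * 2^(-lambda * dt)

# With the default lambda = 0.001, weights halve every 1 / lambda = 1000 points
fade(8, 1000)  # 4

# Hypothetical cleanup pass (illustration only, not the package code): every
# gaptime points, drop micro-clusters whose faded weight is below w_min
cleanup <- function(weights, last_update, now, lambda = 0.001, w_min = 1) {
  faded <- fade(weights, now - last_update, lambda)
  keep <- which(faded >= w_min)
  list(weights = faded[keep], keep = keep)
}

res <- cleanup(weights = c(1, 10, 3), last_update = c(0, 0, 900), now = 1000)
res$keep  # 2 3
```

The first micro-cluster faded from 1 to 0.5 and falls below the hypothetical threshold, while the heavier and the recently updated micro-clusters survive.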
For each new data point in the incoming stream, DBSTREAM checks whether the point lies within the dissimilarity threshold r of any existing micro-cluster center. If so, the point is merged into those micro-clusters; otherwise, a new micro-cluster is created to accommodate the new data point.
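The update step described above can be sketched in base R. This is a simplified illustration, not the package's implementation: dbstream_update() is a hypothetical helper, only Euclidean distance is supported, and the Gaussian neighborhood function used to move centers (with width sigma = r / 3) is an assumption. Shared density bookkeeping and the removal of weak micro-clusters are omitted.

```r
# Hypothetical sketch of one DBSTREAM-style update step (Euclidean only).
# centers: matrix with one micro-cluster (MC) per row
# weights, last: MC weights and the time of each MC's last update
dbstream_update <- function(x, centers, weights, last, t, r, lambda = 0.001) {
  # distances from the new point x to all MC centers
  d <- sqrt(rowSums(sweep(centers, 2, x)^2))
  nn <- which(d < r)  # MCs whose radius r covers x
  if (length(nn) == 0) {
    # no MC covers x: create a new MC centered at x with weight 1
    centers <- rbind(centers, x, deparse.level = 0)
    weights <- c(weights, 1)
    last <- c(last, t)
  } else {
    sigma <- r / 3  # assumed width of the Gaussian neighborhood function
    for (i in nn) {
      # fade the old weight, then count the new point
      weights[i] <- weights[i] * 2^(-lambda * (t - last[i])) + 1
      # move the center toward x, less so the farther away x is
      h <- exp(-d[i]^2 / (2 * sigma^2))
      centers[i, ] <- centers[i, ] + h * (x - centers[i, ])
      last[i] <- t
    }
  }
  list(centers = centers, weights = weights, last = last)
}

# Usage: start with no MCs, then feed three 2D points one at a time
s <- list(centers = matrix(numeric(0), ncol = 2),
          weights = numeric(0), last = numeric(0))
pts <- list(c(0, 0), c(0.01, 0), c(1, 1))
for (step in seq_along(pts)) {
  s <- dbstream_update(pts[[step]], s$centers, s$weights, s$last,
                       t = step, r = 0.05)
}
nrow(s$centers)  # 2
```

The second point falls within r = 0.05 of the first micro-cluster and is merged into it; the third point is far away and starts a new micro-cluster.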
Although DSC_DBSTREAM is a micro-clustering algorithm, macro-clusters and their weights are also available.
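One way to form macro-clusters from shared density is to connect micro-clusters whose shared density is large relative to their weights and then take connected components. The sketch below assumes a connectivity of S[i, j] / ((w[i] + w[j]) / 2) compared against alpha, and treats micro-clusters lighter than Cm as noise; recluster_shared_density() is hypothetical and is not the package's reclustering code.

```r
# Hypothetical sketch: macro-clusters from micro-cluster (MC) weights w and a
# symmetric shared-density matrix S (S[i, j] = weight shared by MCs i and j)
recluster_shared_density <- function(w, S, alpha = 0.1, Cm = 3) {
  # connectivity: shared density relative to the average weight of the pair
  conn <- S / outer(w, w, function(a, b) (a + b) / 2)
  strong <- w >= Cm  # assumption: MCs lighter than Cm are treated as noise
  adj <- conn >= alpha & outer(strong, strong, "&")
  diag(adj) <- FALSE
  macro <- rep(NA_integer_, length(w))  # NA marks noise / unassigned MCs
  id <- 0L
  for (s in which(strong)) {
    if (!is.na(macro[s])) next
    # start a new macro-cluster and grow it by breadth-first search
    id <- id + 1L
    macro[s] <- id
    queue <- s
    while (length(queue) > 0) {
      i <- queue[1]
      queue <- queue[-1]
      for (j in which(adj[i, ] & is.na(macro))) {
        macro[j] <- id
        queue <- c(queue, j)
      }
    }
  }
  macro
}

w <- c(10, 8, 12, 1)
S <- matrix(0, 4, 4)
S[1, 2] <- S[2, 1] <- 5    # connectivity 5 / 9, about 0.56
S[2, 3] <- S[3, 2] <- 0.5  # connectivity 0.5 / 10 = 0.05
S[3, 4] <- S[4, 3] <- 1    # MC 4 is lighter than Cm
recluster_shared_density(w, S, alpha = 0.3)  # 1 1 2 NA
```

Lowering alpha merges more micro-clusters: with alpha = 0.01 the weak link between micro-clusters 2 and 3 also counts, and micro-clusters 1 to 3 form a single macro-cluster.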
get_cluster_assignments() can be used to extract the micro-cluster (MC) assignment for each data point clustered during the last update operation (note: update() needs to be called with assignments = TRUE, and the block size needs to be large enough). The function returns the MC index (in the current set of MCs obtained via get_centers()) and, as an attribute, the permanent MC ids.
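Because the MC index refers to the current set of centers while the MC id is permanent, the two can diverge once weak micro-clusters are removed. The base R illustration below uses made-up id vectors (not output of the package) and match() to convert permanent ids into indices of the current set of MCs.

```r
# Hypothetical illustration: permanent MC ids stay fixed, while indices into
# the current set of centers shift when an MC is removed
ids_before <- c(1L, 2L, 5L, 7L)  # permanent ids of the current MCs
ids_after <- c(2L, 5L, 7L)       # the MC with id 1 has been removed

perm <- c(5L, 7L, 2L, 5L)        # permanent MC id for four data points
match(perm, ids_before)          # indices before removal: 3 4 2 3
match(perm, ids_after)           # indices after removal:  2 3 1 2
```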
plot() for DSC_DBSTREAM has two extra logical parameters, assignment and shared_density, which show the assignment area and the shared density graph, respectively.
An object of class
library(stream)
set.seed(0)
stream <- DSD_Gaussians(k = 3, noise = 0.05)
# create clusterer with r = 0.05
dbstream <- DSC_DBSTREAM(r = .05)
update(dbstream, stream, 1000)
dbstream
# check micro-clusters
nclusters(dbstream)
head(get_centers(dbstream))
plot(dbstream, stream)
# plot macro-clusters
plot(dbstream, stream, type = "both")
# plot micro-clusters with assignment area
plot(dbstream, stream, type = "both", assignment = TRUE)
# DBSTREAM with shared density
dbstream <- DSC_DBSTREAM(r = .05, shared_density = TRUE, Cm = 5)
update(dbstream, stream, 1000)
dbstream
plot(dbstream, stream, type = "both")
# plot the shared density graph (several options)
plot(dbstream, stream, type = "both", shared_density = TRUE)
plot(dbstream, stream, type = "micro", shared_density = TRUE)
plot(dbstream, stream, type = "micro", shared_density = TRUE, assignment = TRUE)
plot(dbstream, stream, type = "none", shared_density = TRUE, assignment = TRUE)
# see how micro- and macro-clusters relate:
# each micro-cluster has an entry with the macro-cluster id
# (note: unassigned micro-clusters (noise) have an NA)
microToMacro(dbstream)
# do some evaluation
evaluate(dbstream, stream, measure = "purity")
evaluate(dbstream, stream, measure = "cRand", type = "macro")
# use DBSTREAM for conventional clustering (with assignments = TRUE so we can
# later retrieve the cluster assignments for each point)
data("iris")
dbstream <- DSC_DBSTREAM(r = 1)
update(dbstream, iris[, -5], assignments = TRUE)
dbstream
cl <- get_cluster_assignments(dbstream)
cl
# micro-clusters
plot(iris[, -5], col = cl, pch = cl)
# macro-clusters
plot(iris[, -5], col = microToMacro(dbstream, cl))