stream (version 1.3-0)

DSC_DBSTREAM: DBSTREAM clustering algorithm

Description

Implements a simple density-based stream clustering algorithm that assigns data points to micro-clusters with a given radius and implements shared-density-based reclustering.

Usage

DSC_DBSTREAM(r, lambda = 0.001, gaptime = 1000L,
  Cm = 3, metric = "Euclidean", shared_density = FALSE,
  alpha = 0.1, k = 0, minweight = 0)
get_shared_density(x, use_alpha = TRUE)
change_alpha(x, alpha)
get_cluster_assignments(x)

Arguments

r

The radius of micro-clusters.

lambda

The lambda used in the fading function.

gaptime

Weak micro-clusters (and weak shared density entries) are removed every gaptime points.

Cm

Minimum weight for a micro-cluster.

metric

Metric used to calculate distances.

shared_density

Record shared density information. If set to TRUE then shared density is used for reclustering, otherwise reachability is used (overlapping clusters with less than \(r*(1-alpha)\) distance are clustered together).

k

The number of macro-clusters to be returned if macro is true.

alpha

For shared density: the minimum proportion of shared points between two clusters to warrant combining them (a suitable value for 2D data is 0.3). For reachability clustering it is a distance factor.

minweight

The proportion of the total weight a macro-cluster needs to have not to be noise (between 0 and 1).

x

A DSC_DBSTREAM object to get the shared density information from.

use_alpha

Only return shared density if it exceeds alpha.
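To get a feel for how lambda controls forgetting, the following base-R sketch assumes an exponential fading function of the form 2^(-lambda * t), as described in the paper listed under References. The helper name fade_weight is hypothetical and not part of the stream package.

```r
# Hypothetical helper (not part of stream): weight of a single point
# t time steps after it arrived, ASSUMING fading of the form 2^(-lambda * t).
fade_weight <- function(t, lambda = 0.001) {
  2^(-lambda * t)
}

# Under this assumption the weight halves every 1 / lambda time steps,
# i.e., every 1000 points for the default lambda = 0.001.
fade_weight(c(0, 1000, 2000))

# A larger lambda forgets old data faster.
fade_weight(1000, lambda = 0.01)
```

Under the same assumption, choosing lambda amounts to choosing a half-life of 1 / lambda points for the influence of old data.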

Value

An object of class DSC_DBSTREAM (subclass of DSC, DSC_R, DSC_Micro).

Details

For each new data point in the incoming stream, the DBSTREAM algorithm checks whether the point's dissimilarity to any existing micro-cluster is below a threshold (the micro-cluster radius r). If so, the point is merged into that micro-cluster. Otherwise, a new micro-cluster is created to accommodate the new data point.
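The assignment step described above can be sketched in a few lines of base R. This is a deliberately simplified illustration, not the package's implementation: it ignores fading, does not move centers, records no shared density, and adds each point only to its closest micro-cluster. The function name assign_points is hypothetical.

```r
# Simplified, hypothetical sketch of the assignment step (NOT the stream
# implementation): no fading, no center movement, no shared density.
assign_points <- function(points, r) {
  points <- as.matrix(points)
  centers <- points[1, , drop = FALSE]  # the first point starts the first micro-cluster
  weights <- 1

  for (i in seq_len(nrow(points))[-1]) {
    p <- points[i, ]
    # Euclidean distance from p to every existing center
    diffs <- centers - matrix(p, nrow(centers), ncol(centers), byrow = TRUE)
    d <- sqrt(rowSums(diffs^2))
    if (min(d) < r) {
      # p lies within radius r of a micro-cluster: add its weight to the closest one
      j <- which.min(d)
      weights[j] <- weights[j] + 1
    } else {
      # no micro-cluster is close enough: p becomes a new center
      centers <- rbind(centers, p)
      weights <- c(weights, 1)
    }
  }
  rownames(centers) <- NULL
  list(centers = centers, weights = weights)
}

pts <- rbind(c(0, 0), c(0.01, 0), c(1, 1), c(1.02, 1), c(0, 0.02))
res <- assign_points(pts, r = 0.05)
res$centers
res$weights
```

In this toy input the points near (0, 0) and the points near (1, 1) end up in two separate micro-clusters because the two groups are much farther apart than r.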

Although DSC_DBSTREAM is a micro-clustering algorithm, macro-clusters and their weights are also available.
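One way to picture reachability-based reclustering of micro-clusters into macro-clusters is as finding connected groups of centers that lie within a distance threshold of each other. The sketch below does this with single-linkage clustering from base R (hclust() and cutree()). The helper reach_macro, the toy centers, and the threshold h are illustrative assumptions; the description of the shared_density argument gives \(r*(1-alpha)\) as the distance used by the package, and how the package applies it internally is not shown here.

```r
# Hypothetical helper (not part of stream): link centers whose distance is
# at most h and return one macro-cluster id per micro-cluster.
reach_macro <- function(centers, h) {
  if (nrow(centers) < 2) return(rep(1L, nrow(centers)))
  # Cutting a single-linkage tree at height h yields the connected components
  # of the graph that links all pairs of centers with distance <= h.
  cutree(hclust(dist(centers), method = "single"), h = h)
}

# Toy micro-cluster centers: three form a chain, one is isolated.
centers <- rbind(c(0, 0), c(0.04, 0), c(0.08, 0), c(1, 1))
macro <- reach_macro(centers, h = 0.045)
macro
```

The first three centers are chained together even though the first and third are farther apart than h, which is the behavior that distinguishes reachability from a simple pairwise distance cutoff.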

get_cluster_assignments() can be used to extract the micro-cluster (MC) assignment for each data point clustered during the last update operation (note: update needs to be called with assignments = TRUE and the block size needs to be large enough). The function returns the MC index (in the current set of MCs obtained with, e.g., get_centers()) and, as an attribute, the permanent MC ids.

plot() for DSC_DBSTREAM has two extra logical parameters called assignment and shared_density which show the assignment area and the shared density graph, respectively.

References

Michael Hahsler and Matthew Bolanos. Clustering data streams based on shared density between micro-clusters. IEEE Transactions on Knowledge and Data Engineering, 28(6):1449-1461, June 2016.

See Also

DSC, DSC_Micro

Examples

# NOT RUN {
set.seed(0)
stream <- DSD_Gaussians(k = 3, noise = 0.05)

# create clusterer with r = 0.05
dbstream <- DSC_DBSTREAM(r = .05)
update(dbstream, stream, 1000)
dbstream

# check micro-clusters
nclusters(dbstream)
head(get_centers(dbstream))
plot(dbstream, stream)

# plot macro-clusters
plot(dbstream, stream, type = "both")

# plot micro-clusters with assignment area
plot(dbstream, stream, type = "both", assignment = TRUE)


# DBSTREAM with shared density
dbstream <- DSC_DBSTREAM(r = .05, shared_density = TRUE, Cm = 5)
update(dbstream, stream, 1000)
dbstream
plot(dbstream, stream, type = "both")
# plot the shared density graph (several options)
plot(dbstream, stream, type = "both", shared_density = TRUE)
plot(dbstream, stream, type = "micro", shared_density = TRUE)
plot(dbstream, stream, type = "micro", shared_density = TRUE, assignment = TRUE)
plot(dbstream, stream, type = "none", shared_density = TRUE, assignment = TRUE)

# see how micro and macro-clusters relate
# each micro-cluster has an entry with the macro-cluster id
# Note: unassigned micro-clusters (noise) have an NA
microToMacro(dbstream)

# do some evaluation
evaluate(dbstream, stream, measure = "purity")
evaluate(dbstream, stream, measure = "cRand", type = "macro")

# use DBSTREAM for conventional clustering (with assignments = TRUE so we can
# later retrieve the cluster assignments for each point)
data("iris")
dbstream <- DSC_DBSTREAM(r = 1)
update(dbstream, iris[,-5], assignments = TRUE)
dbstream

cl <- get_cluster_assignments(dbstream)
cl

# micro-clusters
plot(iris[,-5], col = cl, pch = cl)

# macro-clusters
plot(iris[,-5], col = microToMacro(dbstream, cl))
# }
