subspaceMOA (version 0.6.0)

DSC_HDDStream: Density-based Projected Clustering over High-Dimensional Data

Description

This function creates a DSC object that represents an instance of the HDDStream algorithm and can be used for stream clustering.

Usage

DSC_HDDStream(epsilonN = 0.1, beta = 0.5, mu = 10, lambda = 0.5,
  initPoints = 2000, pi = 30, kappa = 10, delta = 0.001, offline = 2,
  speed = 100)

Arguments

epsilonN
radius of each neighborhood
beta
controls the effect of mu
mu
minimum number of points desired to be in a microcluster
lambda
decaying parameter
initPoints
number of points to use for initialization
pi
maximal subspace dimensionality
kappa
parameter used to define the preference weighted vector
delta
defines the threshold for the variance
offline
offline multiplier for epsilon
speed
number of incoming points per time unit
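
As a rough illustration of how several of these parameters could interact, the base R sketch below computes a decayed microcluster weight and a preference weighted vector. It is not code from subspaceMOA. It assumes the exponential fading function 2^(-lambda * age) and the mu and beta * mu weight thresholds known from DenStream-style algorithms, plus a PreDeCon-style preference vector; the helper names (fade, preference_vector, pi_max) and the sample data are hypothetical.

```r
# Illustrative sketch only; none of this is subspaceMOA code.
lambda <- 0.5; mu <- 10; beta <- 0.5       # defaults from the Usage section
pi_max <- 30; kappa <- 10; delta <- 0.001  # `pi_max` stands in for the `pi` argument

# Assumed fading function: a point seen `age` time units ago counts this much.
fade <- function(age, lambda) 2^(-lambda * age)

# Decayed weight of a microcluster that absorbed one point at each arrival time.
arrival_times <- c(0, 1, 2, 3)
now <- 4
weight <- sum(fade(now - arrival_times, lambda))

# Assumed DenStream-style thresholds: mu for core, beta * mu for potential core.
is_core <- weight >= mu
is_potential_core <- weight >= beta * mu

# Assumed PreDeCon-style preference vector: low-variance dimensions get weight kappa.
preference_vector <- function(points, delta, kappa) {
  centered <- sweep(points, 2, colMeans(points))
  variances <- colMeans(centered^2)  # per-dimension variance around the centroid
  ifelse(variances <= delta, kappa, 1)
}

# Hypothetical data: tight in dimension 1, spread out in dimension 2.
pts <- cbind(c(0.50, 0.51, 0.49, 0.50), c(0.1, 0.9, 0.4, 0.6))
w <- preference_vector(pts, delta, kappa)
subspace_dim <- sum(w == kappa)        # number of preferred dimensions
within_limit <- subspace_dim <= pi_max # compared against the maximal dimensionality
```

With these hypothetical values, four recent points are far from enough to reach the core threshold mu = 10, and only the first dimension is treated as preferred.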

Details

HDDStream is an algorithm for the density-based projected clustering of high-dimensional data streams. The algorithm is initialized by buffering the first initPoints points that arrive and then applying the PreDeCon algorithm to these points. After initialization, microclusters are maintained online: each new point is added to its closest core microcluster if and only if doing so does not increase the projected radius of that microcluster beyond epsilonN. If a point cannot be added to a core microcluster, an attempt is made to add it to the closest outlier microcluster, using the same criterion as for core microclusters. If both attempts fail, the point starts its own microcluster. Microclusters are aged according to the decaying parameter lambda. Macroclustering is performed on demand, using the PreDeCon algorithm.
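
The online insertion step described above can be sketched in base R as follows. This is a simplified illustration, not the subspaceMOA implementation: microclusters are stored as plain matrices of points rather than decayed summary statistics, an ordinary root-mean-square radius stands in for the projected radius, and a point that fits nowhere is assumed to start a new outlier microcluster. The function names (radius, try_insert, insert_point) are hypothetical.

```r
# Illustrative sketch only; decay and the projected radius are not reproduced here.

# Stand-in for the projected radius: RMS Euclidean distance to the centroid.
radius <- function(points) {
  centered <- sweep(points, 2, colMeans(points))
  sqrt(mean(rowSums(centered^2)))
}

# Try to add `p` to the closest microcluster in `mcs` (a list of point matrices).
# Returns the updated list, or NULL if the list is empty or the radius test fails.
try_insert <- function(mcs, p, epsilon) {
  if (length(mcs) == 0) return(NULL)
  dists <- vapply(mcs, function(m) sqrt(sum((colMeans(m) - p)^2)), numeric(1))
  i <- which.min(dists)
  candidate <- rbind(mcs[[i]], p)
  if (radius(candidate) > epsilon) return(NULL)
  mcs[[i]] <- candidate
  mcs
}

# Core microclusters first, then outlier microclusters, else a new microcluster.
insert_point <- function(state, p, epsilon) {
  updated <- try_insert(state$core, p, epsilon)
  if (!is.null(updated)) { state$core <- updated; return(state) }
  updated <- try_insert(state$outlier, p, epsilon)
  if (!is.null(updated)) { state$outlier <- updated; return(state) }
  # Assumption: the new microcluster starts out as an outlier microcluster.
  state$outlier <- c(state$outlier, list(matrix(p, nrow = 1)))
  state
}

# Hypothetical usage: one core microcluster near the origin, no outliers yet.
state <- list(
  core = list(matrix(c(0, 0, 0.02, 0.02), ncol = 2, byrow = TRUE)),
  outlier = list()
)
state <- insert_point(state, c(0.01, 0.01), epsilon = 0.1)  # joins the core microcluster
state <- insert_point(state, c(5, 5), epsilon = 0.1)        # starts a new microcluster
```

Trying the candidate insertion first and keeping it only if the radius test passes mirrors the "add only if the radius stays within epsilonN" criterion in the description.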

Examples

library(subspaceMOA)
dsc <- DSC_HDDStream()
dsd <- DSD_RandomRBFSubspaceGeneratorEvents()
# Feed 1000 points from the stream to the clusterer
update(dsc, dsd, 1000)