
Stream clustering algorithm based on evolutionary optimization.
The online component uses a simplified version of DBSTREAM
to generate micro-clusters.
The micro-clusters are then incrementally reclustered using an evolutionary algorithm.
Evolutionary algorithms create slight variations by combining and randomly modifying existing solutions.
By iteratively selecting better solutions, an evolutionary pressure is created which improves the clustering over time.
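The loop described above (combine, mutate, select) can be sketched in base R. This is a toy illustration under stated assumptions, not the package's implementation: the `fitness` and `evolve` helpers, the tournament selection, the uniform crossover, and the toy rates are all hypothetical choices made for the example.

```r
# Toy evolutionary reclustering of weighted micro-clusters (illustrative only).
set.seed(42)

# Hypothetical micro-clusters: 2-d centers around (0, 0) and (1, 1).
micro <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.05), ncol = 2),
  matrix(rnorm(40, mean = 1, sd = 0.05), ncol = 2)
)
weights <- rep(1, nrow(micro))
k <- 2

# Fitness: negative weighted sum of squared distances to the nearest center.
fitness <- function(centers) {
  d <- sapply(seq_len(nrow(centers)), function(j) {
    ref <- matrix(centers[j, ], nrow(micro), ncol(micro), byrow = TRUE)
    rowSums((micro - ref)^2)
  })
  -sum(weights * apply(d, 1, min))
}

# Each solution is a k x 2 matrix of candidate macro-cluster centers.
population <- lapply(1:10, function(i) {
  micro[sample(nrow(micro), k), , drop = FALSE]
})

# One generation: create one child and let it compete with the worst solution.
evolve <- function(population, crossoverRate = 0.8, mutationRate = 0.1) {
  scores <- vapply(population, fitness, numeric(1))
  # Binary tournament: the better of two random solutions becomes a parent.
  pick <- function() {
    cand <- sample(length(population), 2)
    population[[cand[which.max(scores[cand])]]]
  }
  child <- pick()
  other <- pick()
  # Uniform crossover: take each coordinate from either parent.
  if (runif(1) < crossoverRate) {
    swap <- matrix(runif(length(child)) < 0.5, nrow(child))
    child[swap] <- other[swap]
  }
  # Mutation: perturb randomly chosen coordinates with small Gaussian noise.
  hit <- matrix(runif(length(child)) < mutationRate, nrow(child))
  child[hit] <- child[hit] + rnorm(sum(hit), sd = 0.1)
  # Selection: the child replaces the worst solution only if it is better.
  worst <- which.min(scores)
  if (fitness(child) > scores[worst]) population[[worst]] <- child
  population
}

best_before <- max(vapply(population, fitness, numeric(1)))
for (g in 1:200) population <- evolve(population)
best_after <- max(vapply(population, fitness, numeric(1)))
```

Because a child only ever replaces the worst solution, and only when it scores higher, the best fitness in the population cannot decrease from one generation to the next. This is also why such a loop can be stopped and resumed at any point.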
Since the evolutionary algorithm is incremental, it is possible to apply it between observations, e.g. in the idle time of the stream.
Whenever there is idle time, we can call the recluster
function of the reference class to improve the macro-clusters (see example).
The evolutionary algorithm can also be applied as a traditional reclustering step, or both approaches can be combined.
In addition, this implementation can evaluate a fixed number of generations after each observation.
DSC_evoStream(r, lambda = 0.001, tgap = 100, k = 2,
crossoverRate = 0.8, mutationRate = 0.001, populationSize = 100,
initializeAfter = 2 * k, incrementalGenerations = 1,
reclusterGenerations = 1000)
r: radius threshold for micro-cluster assignment
lambda: decay rate
tgap: time-interval between outlier detection and clean-up
k: number of macro-clusters
crossoverRate: cross-over rate for the evolutionary algorithm
mutationRate: mutation rate for the evolutionary algorithm
populationSize: number of solutions that the evolutionary algorithm maintains
initializeAfter: number of micro-clusters required for the initialization of the evolutionary algorithm
incrementalGenerations: number of EA generations performed after each observation
reclusterGenerations: number of EA generations performed during reclustering
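To give a feel for the decay rate (lambda) and the clean-up interval (tgap), here is a minimal base R sketch. It assumes a DBSTREAM-style exponential fading of the form 2^(-lambda * dt); the exact fading function and clean-up rule used by the package are not stated on this page, so the `fade` helper and the `threshold` value are hypothetical.

```r
# Illustrative fading of micro-cluster weights (assumed form, not package code).
lambda <- 0.001  # decay rate, default value from the usage above
tgap <- 100      # clean-up interval, default value from the usage above

# Assumed fading function: a weight halves every 1 / lambda time steps.
fade <- function(weight, dt, lambda) weight * 2^(-lambda * dt)

# Weight of a unit-weight micro-cluster left untouched for one clean-up interval.
w_after_gap <- fade(1, dt = tgap, lambda = lambda)

# Time after which an untouched weight has halved.
half_life <- 1 / lambda

# Hypothetical clean-up at time `now`: drop micro-clusters whose faded
# weight is below an arbitrary threshold chosen for this example.
weights <- c(5, 1, 0.2)
last_update <- c(990, 400, 10)
now <- 1000
threshold <- 0.5
faded <- fade(weights, now - last_update, lambda)
keep <- faded >= threshold
```

Under this assumption, a larger lambda makes old micro-clusters fade faster, and a smaller tgap makes the clean-up run more often.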
Carnein M. and Trautmann H. (2018), "evoStream - Evolutionary Stream Clustering Utilizing Idle Times", Big Data Research.
# NOT RUN {
library(stream)
## create a data stream with 500 stored observations
stream <- DSD_Memory(DSD_Gaussians(k = 3, d = 2), 500)
## init evoStream
evoStream <- DSC_evoStream(r = 0.05, k = 3,
incrementalGenerations = 1, reclusterGenerations = 500)
## insert observations
update(evoStream, stream, n = 500)
## micro clusters
get_centers(evoStream, type = "micro")
## micro weights
get_weights(evoStream, type = "micro")
## macro clusters
get_centers(evoStream, type = "macro")
## macro weights
get_weights(evoStream, type = "macro")
## plot result
reset_stream(stream)
plot(evoStream, stream, type = "both")
## if we have time, evaluate additional generations.
## This can be called at any time, also between observations.
## by default, 1 generation is evaluated after each observation and
## 1000 generations during reclustering but we set it here to 500
evoStream$RObj$recluster(500)
## plot improved result
reset_stream(stream)
plot(evoStream, stream, type = "both")
## get assignment of micro to macro clusters
microToMacro(evoStream)
# }