Simulating Data to Study Performance of Clustering Algorithms
Description
MixSim allows simulating mixtures of Gaussian
distributions with different levels of overlap between mixture
components. Pairwise overlap, defined as the sum of two
misclassification probabilities, measures the degree of
interaction between components and can be readily employed to
control the complexity of datasets simulated from mixtures.
These datasets can then be used for systematic performance
investigation of clustering and finite mixture modeling
algorithms. Other capabilities of MixSim include
computing the exact overlap for Gaussian mixtures, simulating
Gaussian and non-Gaussian data, simulating outliers and noise
variables, calculating various measures of agreement between
two partitionings, and constructing parallel distribution plots
for the graphical display of finite mixture models.
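
To make the definition of pairwise overlap concrete, the following is a minimal sketch in Python rather than MixSim code. It assumes two univariate Gaussian components and estimates each misclassification probability by Monte Carlo, whereas MixSim computes the overlap exactly. The function names `misclassification_prob` and `pairwise_overlap`, the mixing weights, and the component parameters are all hypothetical choices made for illustration.

```python
from statistics import NormalDist


def misclassification_prob(pi_i, comp_i, pi_j, comp_j, n=50_000, seed=0):
    """Monte Carlo estimate of the probability that a draw from component i
    has a larger weighted density under component j (i.e. is misclassified)."""
    draws = comp_i.samples(n, seed=seed)
    wrong = sum(pi_i * comp_i.pdf(x) < pi_j * comp_j.pdf(x) for x in draws)
    return wrong / n


def pairwise_overlap(pi_i, comp_i, pi_j, comp_j, n=50_000, seed=0):
    """Pairwise overlap: the sum of the two misclassification probabilities."""
    return (misclassification_prob(pi_i, comp_i, pi_j, comp_j, n, seed)
            + misclassification_prob(pi_j, comp_j, pi_i, comp_i, n, seed + 1))


# Illustrative (assumed) components: equal weights, unit variances, means 0 and 2.
comp_a = NormalDist(0.0, 1.0)
comp_b = NormalDist(2.0, 1.0)
estimate = pairwise_overlap(0.5, comp_a, 0.5, comp_b)

# For equal weights and equal variances the decision boundary is the midpoint
# of the means, so the overlap has the closed form 2 * Phi(-|mu_a - mu_b| / (2 * sigma)).
closed_form = 2 * NormalDist().cdf(-1.0)

print(f"Monte Carlo estimate: {estimate:.4f}")
print(f"Closed form:          {closed_form:.4f}")
```

Moving the two means closer together raises the overlap and moving them apart lowers it, which is the sense in which overlap controls the complexity of a simulated dataset.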