random_clustering: Randomly cluster a data set into K clusters.
Description
For each observation (row) in 'x', one of K labels is
randomly generated. By default, the probabilities of
selecting each clustering label are equal, but this can
be altered by specifying 'prob', a vector of
probabilities for each cluster.
Usage
random_clustering(x, K, prob = NULL)
Arguments
x
a matrix containing the data to cluster. The
rows are the sample observations, and the columns are the
features.
K
the number of clusters
prob
a vector of probabilities, one for each of the K
cluster labels, used to generate the labels. If NULL, each
cluster label has an equal chance of being selected.
Value
a vector of clustering labels for each observation in
'x'.
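As a rough illustration of the behavior described above, the labels can be drawn with base R's sample(). This is a minimal sketch under that assumption, not the package's source; the name random_clustering_sketch and the simulated matrix are hypothetical.

```r
# Hypothetical re-implementation for illustration only; not the package's code.
random_clustering_sketch <- function(x, K, prob = NULL) {
  # One label per row of x, drawn with replacement from 1..K.
  # sample() uses equal probabilities when prob is NULL.
  sample(seq_len(K), size = nrow(x), replace = TRUE, prob = prob)
}

set.seed(42)
x <- matrix(rnorm(40), nrow = 20, ncol = 2)  # 20 observations, 2 features

# Equal label probabilities (the default).
labels_equal <- random_clustering_sketch(x, K = 3)

# Unequal label probabilities, one per cluster.
labels_skewed <- random_clustering_sketch(x, K = 3, prob = c(0.6, 0.3, 0.1))

table(labels_equal)
table(labels_skewed)
```

With the real function, the equivalent calls would presumably be random_clustering(x, K = 3) and random_clustering(x, K = 3, prob = c(0.6, 0.3, 0.1)).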
Details
Random clustering is often used as a baseline against
which clustering algorithms employed to identify structure
within the data are compared. Typically, comparisons are
made in terms of clustering assessment and evaluation
methods as well as clustering similarity measures. For the
former, a
specified clustering evaluation method is computed for
the considered clustering algorithms as well as random
clustering. If the clusters determined by a considered
clustering algorithm do not differ significantly from the
random clustering, we might conclude that the found
clusters are no better than naively choosing clustering
labels for each observation at random. Likewise, a
similarity measure can be computed between the clustering
from a considered clustering algorithm and a random
clustering: if the two clusterings are significantly
similar, we might again conclude that the clusters found
by the considered algorithm do not differ significantly
from those found at random. In either case, the clusters
are unlikely to yield meaningful results from which the
user can better understand the inherent structure within
the data.
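The evaluation-based comparison described above might be sketched as follows. The choices here are assumptions made for illustration: kmeans() stands in for the considered clustering algorithm, total within-cluster sum of squares stands in for the evaluation method, within_ss is a hypothetical helper, and the random labels are drawn with sample() as a stand-in for random_clustering(x, K = 2).

```r
# Hypothetical helper: total within-cluster sum of squares for given labels.
within_ss <- function(x, labels) {
  total <- 0
  for (k in unique(labels)) {
    xk <- x[labels == k, , drop = FALSE]
    centered <- sweep(xk, 2, colMeans(xk))  # subtract the cluster centroid
    total <- total + sum(centered^2)
  }
  total
}

set.seed(1)
# Simulated data with two well-separated groups (an assumption for the demo).
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))

# Considered clustering algorithm: k-means with K = 2.
km <- kmeans(x, centers = 2, nstart = 10)
ss_kmeans <- within_ss(x, km$cluster)

# Baseline: the same criterion under repeated random clusterings.
ss_random <- replicate(200, {
  random_labels <- sample(seq_len(2), size = nrow(x), replace = TRUE)
  within_ss(x, random_labels)
})

# Proportion of random clusterings that score at least as well as k-means.
mean(ss_random <= ss_kmeans)
```

A proportion near zero suggests the algorithm's clusters differ from random labeling on this criterion, while a large proportion suggests they are no better than the baseline. The same pattern applies to the similarity-based comparison by replacing within_ss with a similarity measure between the two label vectors.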