kmeans function. It creates several partitions forming a cascade from a small
to a large number of groups.

cascadeKM(data, inf.gr, sup.gr, iter = 100, criterion = "calinski")
cIndexKM(y, x, index = "all")
## S3 method for class 'cascadeKM':
plot(x, min.g, max.g, grpmts.plot = TRUE,
     sortg = FALSE, gridcol = NA, ...)
"kmeans"
returned by a clustering algorithm such as kmeans
plot
"calinski"
and "ssi". Type "all" to obtain both indices.
Abbreviations of these names are also accepted.

TRUE or FALSE).

x, although not in the graph.

NA, which is the default value, removes the grid lines.

cascadeKM returns an object of class cascadeKM with items:

inf.gr to $K$ = sup.gr.

cIndex
returns a vector with the index values. The
maximum value of these indices
is supposed to indicate the best partition. These indices work best with
groups of equal sizes. When the groups are not of equal sizes, one should
not put too much faith in the maximum of these indices, and also explore the
groups corresponding to other values of $K$.

kmeans. Most of the work is performed by function cIndex, which is
based on the clustIndex function.
Some of the criteria were removed from this version because computation
errors were generated when only one object was found in a group.
The default value is "calinski",
which refers to the well-known Calinski-Harabasz (1974) criterion.
The other available index is the simple structure index "ssi".
In the case of groups of equal sizes, "calinski" is generally a good
criterion to indicate the correct number of groups. Users should not
take its indications literally when the groups are not equal in size.
Type "all" to obtain both indices. The indices are defined as:
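For "calinski", the criterion is commonly written in the following form. The symbols are notation chosen here for illustration, not taken from the package: $n$ is the number of objects, $K$ the number of groups, $SSB_K$ the between-group sum of squares and $SSW_K$ the pooled within-group sum of squares of the partition into $K$ groups.

$$\mathrm{CH}(K) = \frac{SSB_K / (K - 1)}{SSW_K / (n - K)}$$

Larger values indicate groups that are well separated relative to their internal dispersion, which is why the maximum over $K$ is read as the suggested number of groups.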
Weingessel, A., Dimitriadou, A. and Dolnicar, S.
An Examination Of Indexes For Determining The Number
Of Clusters In Binary Data Sets,
kmeans
library(vegan)  # cascadeKM() and its plot method are provided by vegan

# Partitioning a (10 x 10) data matrix of random numbers
mat <- matrix(runif(100), 10, 10)
res <- cascadeKM(mat, 2, 5, iter = 25, criterion = "calinski")
toto <- plot(res)

# Partitioning an autocorrelated time series
vec <- sort(matrix(runif(30), 30, 1))
res <- cascadeKM(vec, 2, 5, iter = 25, criterion = "calinski")
toto <- plot(res)

# Partitioning a large autocorrelated time series
# Note that we remove the grid lines
vec <- sort(matrix(runif(1000), 1000, 1))
res <- cascadeKM(vec, 2, 7, iter = 10, criterion = "calinski")
toto <- plot(res, gridcol = NA)
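
To make the cascade idea concrete, the following is a minimal sketch that recomputes a Calinski-Harabasz value by hand for each number of groups, using only kmeans from base R. The helper name calinski_cascade is hypothetical and not part of any package, and treating iter as the number of random starts passed to kmeans is an assumption of this sketch, so its values need not match those returned by cascadeKM.

```r
# Hypothetical helper (not part of any package): run kmeans for every K
# between inf.gr and sup.gr and compute the Calinski-Harabasz criterion
# from the sums of squares that kmeans reports.
calinski_cascade <- function(data, inf.gr, sup.gr, iter = 25) {
  data <- as.matrix(data)
  n <- nrow(data)
  ks <- inf.gr:sup.gr
  ch <- sapply(ks, function(k) {
    # Assumption: iter is used as the number of random starts
    km <- kmeans(data, centers = k, nstart = iter)
    # (between-group SS / (K - 1)) / (within-group SS / (n - K))
    (km$betweenss / (k - 1)) / (km$tot.withinss / (n - k))
  })
  names(ch) <- ks
  ch
}

set.seed(42)  # make the random data and starts reproducible
mat <- matrix(runif(100), 10, 10)
ch <- calinski_cascade(mat, 2, 5)

# The K with the largest criterion value is the suggested partition
best_k <- as.integer(names(ch)[which.max(ch)])
```

As the text above cautions, the maximum should be read as a suggestion only when group sizes are unequal; inspecting the whole vector ch is usually more informative than best_k alone.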