This function facilitates the selection of the appropriate number of clusters and dimensions for joint dimension reduction and clustering methods.
tuneclus(data, nclusrange = 3:4, ndimrange = 2:3,
method = c("RKM","FKM","clusCA","iFCB","MCAk"),
criterion = "asw", dst = "full", alpha = NULL, alphak = NULL,
center = TRUE, scale = TRUE, rotation = "none", nstart = 100,
smartStart = NULL, seed = 1234)
# S3 method for tuneclus
print(x, …)
# S3 method for tuneclus
summary(object, …)
# S3 method for tuneclus
fitted(object, mth = c("centers", "classes"), …)
data: Continuous or categorical dataset
nclusrange: An integer vector with the range of numbers of clusters to be compared by the cluster validity criteria. Note: the number of clusters should be greater than one
ndimrange: An integer vector with the range of numbers of dimensions to be compared by the cluster validity criteria
method: Specifies the method. Options are "RKM" for reduced K-means, "FKM" for factorial K-means, "MCAk" for MCA K-means, "iFCB" for Iterative Factorial Clustering of Binary variables, and "clusCA" for Cluster Correspondence Analysis
criterion: One of "asw", "ch" or "crit". Determines whether the average silhouette width, the Calinski-Harabasz index, or the objective value of the selected method is used (default = "asw")
dst: Specifies the data used to compute the distances between objects. Options are "full" for the original data (after possible scaling) and "low" for the object scores in the low-dimensional space (default = "full")
alpha: Adjusts for the relative importance of RKM and FKM in the objective function; alpha = 1 reduces to PCA, alpha = 0.5 to reduced K-means, and alpha = 0 to factorial K-means
alphak: Non-negative scalar to adjust for the relative importance of MCA (alphak = 1) and K-means (alphak = 0) in the solution (default = 0.5). Works only in combination with method = "MCAk"
center: A logical value indicating whether the variables should be shifted to be zero centered (default = TRUE)
scale: A logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place (default = TRUE)
rotation: Specifies the method used to rotate the factors. Options are "none" for no rotation, "varimax" for varimax rotation with Kaiser normalization, and "promax" for promax rotation (default = "none")
nstart: Number of starts (default = 100)
smartStart: If NULL, a random cluster membership vector is generated. Alternatively, a cluster membership vector can be provided as a starting solution
seed: An integer passed to set.seed() to seed the random number generator when smartStart = NULL (default = 1234)
x: For the print method, an object of class tuneclus
object: For the summary method, an object of class tuneclus
mth: For the fitted method, a character string that specifies the type of fitted value to return: "centers" for the center vector of each observation's cluster, or "classes" for each observation's cluster membership
…: Not used
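To make the criterion and dst options concrete, the following base R sketch computes the average silhouette width and the Calinski-Harabasz index for a single partition, once on the standardized data (the analogue of dst = "full") and once on the first two principal component scores (the analogue of dst = "low"). It uses the built-in iris measurements and a tandem PCA plus k-means fit purely for illustration. The helper names asw() and ch() are hypothetical and are not part of the package, whose internal computations may differ.

```r
# Hypothetical helpers for illustration only (not part of the package)

# Average silhouette width from a distance object and a membership vector
asw <- function(d, cl) {
  d <- as.matrix(d)
  ks <- sort(unique(cl))
  s <- numeric(nrow(d))
  for (i in seq_len(nrow(d))) {
    own <- cl == cl[i]
    own[i] <- FALSE
    if (!any(own)) next                      # singleton cluster: width stays 0
    a <- mean(d[i, own])                     # mean distance within own cluster
    b <- min(sapply(setdiff(ks, cl[i]),      # nearest other cluster
                    function(k) mean(d[i, cl == k])))
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)
}

# Calinski-Harabasz index from a data matrix and a membership vector
ch <- function(x, cl) {
  x <- as.matrix(x)
  n <- nrow(x)
  k <- length(unique(cl))
  tss <- sum(sweep(x, 2, colMeans(x))^2)     # total sum of squares
  wss <- sum(sapply(unique(cl), function(g) {
    xg <- x[cl == g, , drop = FALSE]
    sum(sweep(xg, 2, colMeans(xg))^2)        # within-cluster sum of squares
  }))
  ((tss - wss) / (k - 1)) / (wss / (n - k))
}

x_full <- scale(iris[, 1:4])                 # centered and scaled data
x_low  <- prcomp(x_full)$x[, 1:2]            # scores in two dimensions

set.seed(1234)
cl <- kmeans(x_low, centers = 3, nstart = 10)$cluster

res <- c(asw_full = asw(dist(x_full), cl),
         asw_low  = asw(dist(x_low), cl),
         ch_full  = ch(x_full, cl),
         ch_low   = ch(x_low, cl))
print(round(res, 3))
```

The same partition can score quite differently under the two settings, which is why dst should be reported alongside the criterion value.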
clusobjbest: The output of the optimal run of cluspca() or clusmca()
nclusbest: The optimal number of clusters
ndimbest: The optimal number of dimensions
critbest: The optimal criterion value for nclusbest clusters and ndimbest dimensions
critgrid: Matrix of size nclusrange x ndimrange with the criterion values for the specified ranges of clusters and dimensions (values are calculated only when the number of clusters is greater than the number of dimensions; otherwise values in the grid are left blank)
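The layout of the criterion grid can be sketched in base R. The loop below fills a matrix with one row per candidate number of clusters and one column per candidate number of dimensions, leaves a cell as NA when the number of clusters does not exceed the number of dimensions, and then reads off the best cell. It uses tandem PCA followed by k-means as a stand-in for the joint methods, with the Calinski-Harabasz index in the reduced space as the criterion, so the numbers are illustrative only. All variable names here are chosen for the example.

```r
x <- scale(iris[, 1:4])
scores <- prcomp(x)$x                 # stand-in for the joint low-dimensional scores
n <- nrow(x)

nclusrange <- 2:4
ndimrange  <- 2:3
critgrid <- matrix(NA_real_, length(nclusrange), length(ndimrange),
                   dimnames = list(nclusrange, ndimrange))

set.seed(1234)
for (k in nclusrange) {
  for (d in ndimrange) {
    if (k <= d) next                  # evaluated only when clusters > dimensions
    km <- kmeans(scores[, seq_len(d)], centers = k, nstart = 10)
    critgrid[as.character(k), as.character(d)] <-
      (km$betweenss / (k - 1)) / (km$tot.withinss / (n - k))
  }
}

# Locate the cell with the largest criterion value
best <- which(critgrid == max(critgrid, na.rm = TRUE), arr.ind = TRUE)[1, ]
nclusbest <- nclusrange[best["row"]]
ndimbest  <- ndimrange[best["col"]]

print(critgrid)
cat("best:", nclusbest, "clusters,", ndimbest, "dimensions\n")
```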
For the K-means part, the algorithm of Hartigan-Wong is used by default.
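As a point of reference, base R's kmeans() also defaults to the Hartigan-Wong algorithm and exposes an nstart argument for multiple random starts. The sketch below, on the built-in iris data, shows how fixing the seed before a multi-start run makes the result reproducible. Whether the package delegates to kmeans() internally is not stated here, so treat this as an analogy rather than a description of its implementation; the run() wrapper is a hypothetical helper.

```r
x <- scale(iris[, 1:4])

# Hypothetical wrapper: seed the generator, then run a multi-start k-means
run <- function(seed, nstart) {
  set.seed(seed)
  kmeans(x, centers = 3, nstart = nstart, algorithm = "Hartigan-Wong")
}

a <- run(1234, nstart = 10)
b <- run(1234, nstart = 10)

identical(a$cluster, b$cluster)   # same seed and settings give the same partition
a$tot.withinss                    # objective value of the best start
```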
The hidden print and summary methods print out some key components of an object of class tuneclus.
The hidden fitted method returns cluster fitted values. If mth is "classes", this is a vector of cluster membership (the cluster component of the "tuneclus" object). If mth is "centers", this is a matrix where each row is the cluster center for the observation. The rownames of the matrix are the cluster membership values.
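The two return types mirror the fitted() method for kmeans objects in base R, which makes for a convenient self-contained illustration. In the sketch below, the method argument of fitted.kmeans plays the role that mth plays for tuneclus objects; this parallel is drawn from the description above and is not a statement about how the package implements it.

```r
set.seed(1234)
km <- kmeans(scale(iris[, 1:4]), centers = 3, nstart = 10)

classes <- fitted(km, method = "classes")   # vector of cluster memberships
centers <- fitted(km, method = "centers")   # one center row per observation

length(classes)
dim(centers)
head(rownames(centers))                     # rownames carry the cluster labels
```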
Calinski, T., and Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics, 3, 1-27.
Kaufman, L., and Rousseeuw, P.J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.
# NOT RUN {
# Reduced K-means for a range of clusters and dimensions
library(clustrd)
data(macro)
# Cluster quality assessment based on the average silhouette width
# in the low dimensional space
bestRKM = tuneclus(macro, 3:4, 2:3, method = "RKM", criterion = "asw", dst = "low", nstart = 10)
bestRKM
plot(bestRKM)
# Cluster Correspondence Analysis for a range of clusters and dimensions
data(underwear)
# Cluster quality assessment based on the average silhouette width
# in the full dimensional space
bestclusCA = tuneclus(underwear, 3:4, 2:3, method = "clusCA", criterion = "asw", nstart = 10)
bestclusCA
plot(bestclusCA)
# }