doublekm: Double k-means Clustering

Description

Performs simultaneous k-means partitioning on units and variables (rows and columns of the data matrix).

Usage

doublekm(Xs, K, Q, Rndstart, verbose, maxiter, tol, prep, print)

Value

Returns a list of estimates and descriptive quantities of the final results.

U: Units x clusters membership matrix (binary and row-stochastic). Each row is an indicator vector showing the unit-cluster to which that unit has been assigned.
V: Variables x clusters membership matrix (binary and row-stochastic). Each row is an indicator vector showing the variable-cluster to which that variable has been assigned.
centers: K x Q matrix of centers. Entry (k, q) is the mean of unit-cluster k over the variables in variable-cluster q.
totss: The total sum of squares (scalar).
withinss: Vector of within-row-cluster sum of squares, one component per cluster.
columnwise_withinss: Vector of within-column-cluster sum of squares, one component per cluster.
betweenss: Amount of deviance captured by the model (scalar).
K-size: Number of units assigned to each row-cluster (vector).
Q-size: Number of variables assigned to each column-cluster (vector).
pseudoF: Calinski-Harabasz index of the resulting (row-) partition (scalar).
loop: The index of the (best) run from which the results have been chosen.
it: The number of iterations performed during the (best) run.
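
The relationship between U, V and centers can be sketched in base R. This is a minimal illustration on made-up data, not the internals of doublekm: the toy matrix, the cluster labels and the names row_cl, col_cl, K_size and Q_size are all assumptions introduced here. It builds the two membership matrices from label vectors and computes each center as the mean of its (unit-cluster, variable-cluster) block.

```r
# Toy data: 6 units x 4 variables (values are made up for illustration)
X <- matrix(c(1, 2, 9, 8,
              2, 1, 8, 9,
              1, 1, 9, 9,
              7, 8, 3, 2,
              8, 7, 2, 3,
              8, 8, 2, 2), nrow = 6, byrow = TRUE)

# Hypothetical cluster labels (chosen by hand, not produced by doublekm)
row_cl <- c(1, 1, 1, 2, 2, 2)   # K = 2 unit-clusters
col_cl <- c(1, 1, 2, 2)         # Q = 2 variable-clusters

# Binary, row-stochastic membership matrices: one 1 per row
U <- diag(2)[row_cl, ]          # 6 x 2
V <- diag(2)[col_cl, ]          # 4 x 2

# K x Q centers: mean of each (unit-cluster, variable-cluster) block
centers <- solve(t(U) %*% U) %*% t(U) %*% X %*% V %*% solve(t(V) %*% V)

# Cluster sizes are the column sums of the membership matrices
K_size <- colSums(U)
Q_size <- colSums(V)

# Residual sum of squares of the block reconstruction U %*% centers %*% t(V)
rss <- sum((X - U %*% centers %*% t(V))^2)
```

Here every row of U and V sums to 1, and centers[1, 1] is the mean of the six values in rows 1 to 3 and columns 1 to 2 of X.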

Arguments

Xs: Units x variables numeric data matrix.
K: Number of clusters for the units.
Q: Number of clusters for the variables.
Rndstart: Number of runs to be performed (default is 20).
verbose: Outputs basic summary statistics for each run (1 = enabled; 0 = disabled, default option).
maxiter: Maximum number of iterations allowed if convergence is not reached earlier (default is 100).
tol: Tolerance threshold. It is the maximum difference between the values of the objective function of two consecutive iterations such that convergence is assumed (default is 1e-6).
prep: Pre-processing of the data. 1 performs the z-score transform (default choice); 2 performs the min-max transform; 0 leaves the data un-pre-processed.
print: Prints summary statistics of the results (1 = enabled; 0 = disabled, default option).
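
The two transforms named under prep can be approximated in base R as follows. This is a sketch of the standard z-score and min-max definitions, not the package's own code; in particular, scale() divides by the n - 1 standard deviation, and whether doublekm uses the same divisor is an assumption. The names Xz, rng and Xmm are introduced here for illustration.

```r
# Numeric part of the iris data (units x variables)
X <- as.matrix(datasets::iris[, -5])

# Analogue of prep = 1: z-score each column
# (scale() centers on the column mean and divides by the n - 1 standard deviation)
Xz <- scale(X)

# Analogue of prep = 2: min-max transform each column onto [0, 1]
rng <- apply(X, 2, range)                      # row 1 = minima, row 2 = maxima
Xmm <- sweep(sweep(X, 2, rng[1, ]), 2, rng[2, ] - rng[1, ], "/")

# Analogue of prep = 0: use X unchanged
```

Passing an already transformed matrix together with prep = 0 would be one way to control the pre-processing explicitly.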

Author

Ionel Prunila, Maurizio Vichi

References

Vichi M. (2001) "Double k-means Clustering for Simultaneous Classification of Objects and Variables" <doi:10.1007/978-3-642-59471-7_6>

Examples

# Iris data
# Load the package that provides doublekm() (assumed here to be drclust)
library(drclust)

# Keep only the numeric variables of the iris data
iris <- as.matrix(iris[, -5])

# double k-means with 3 unit-clusters and 2 variable-clusters
out <- doublekm(iris, K = 3, Q = 2)
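
The roles of Rndstart, maxiter and the returned loop index can be illustrated with a multi-start loop. The sketch below uses ordinary stats::kmeans on the units only as a stand-in, since it runs without the package; it is not double k-means, and the rule of keeping the run with the lowest objective is an assumption about how the best run is chosen. The names n_runs, runs, objective, best_run and best are hypothetical.

```r
# Standardized numeric iris data
X <- scale(as.matrix(datasets::iris[, -5]))

set.seed(1)      # make the random starts reproducible
n_runs <- 20     # plays the role of Rndstart

# One random start per run, at most 100 iterations each (plays the role of maxiter)
runs <- lapply(seq_len(n_runs), function(i) {
  kmeans(X, centers = 3, nstart = 1, iter.max = 100)
})

# Objective value of each run: total within-cluster sum of squares
objective <- vapply(runs, function(r) r$tot.withinss, numeric(1))

# Index of the best run (analogous to the loop element) and its result
best_run <- which.min(objective)
best <- runs[[best_run]]
```

Increasing the number of runs lowers the chance of reporting a poor local optimum, at the cost of proportionally more computation.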
