cluspcamix: Joint dimension reduction and clustering of mixed-type data.

Description

This function implements joint clustering and dimension reduction for mixed-type variables, i.e., categorical and metric (see Yamamoto & Hwang, 2014; van de Velden, Iodice D'Enza, & Markos, 2019; Vichi, Vicari, & Kiers, 2019). The framework includes Mixed Reduced K-means and Mixed Factorial K-means, as well as a compromise between these two methods. The methods combine Principal Component Analysis of mixed data (PCAMIX) for dimension reduction with K-means for clustering.
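The simplest way to combine dimension reduction with K-means is the sequential "tandem" approach (reduce first, then cluster the scores). The base R sketch below illustrates that idea on toy mixed data. It is a simplification, not the cluspcamix implementation: ordinary PCA on standardized 0-1 indicator and metric columns stands in for PCAMIX, and the data frame df is invented for illustration.

```r
# Simplified tandem sketch (dimension reduction, then K-means) on toy mixed data.
# Assumption: ordinary PCA on standardized indicator and metric columns stands in
# for PCAMIX; this is not the cluspcamix implementation.
set.seed(1)
df <- data.frame(
  size  = factor(sample(c("S", "M", "L"), 60, replace = TRUE)),
  color = factor(sample(c("red", "blue"), 60, replace = TRUE)),
  price = rnorm(60, mean = 100, sd = 15),
  carat = runif(60, min = 0.2, max = 2)
)
is_fac <- sapply(df, is.factor)

# One 0-1 indicator column per category level, plus the metric columns
X <- model.matrix(~ . - 1, data = df,
                  contrasts.arg = lapply(df[is_fac], contrasts, contrasts = FALSE))
Xs <- scale(X)                                   # center and scale every column

pc <- prcomp(Xs, center = FALSE, scale. = FALSE)
scores <- pc$x[, 1:2]                            # ndim = 2
km <- kmeans(scores, centers = 3, nstart = 10)   # nclus = 3, 10 random starts
table(km$cluster)
```

The joint methods (Mixed RKM, Mixed FKM) differ from this sketch in that the subspace and the partition are estimated together rather than one after the other.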

Usage

cluspcamix(data, nclus, ndim, method=c("mixedRKM", "mixedFKM"), 
center = TRUE, scale = TRUE, alpha=NULL, rotation="none", 
nstart = 100, smartStart=NULL, seed=NULL, binary = FALSE)
# S3 method for cluspcamix
print(x, ...)
# S3 method for cluspcamix
summary(object, ...)
# S3 method for cluspcamix
fitted(object, mth = c("centers", "classes"), ...)

Arguments

data

Dataset with categorical and metric variables

nclus

Number of clusters (nclus = 1 returns the PCAMIX solution)

ndim

Dimensionality of the solution

method

Specifies the method. Options are "mixedRKM" for mixed reduced K-means and "mixedFKM" for mixed factorial K-means (default = "mixedRKM")

center

A logical value indicating whether the variables should be shifted to be zero centered (default = TRUE)

scale

A logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place (default = TRUE)
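To make center = TRUE and scale = TRUE concrete, the base R sketch below standardizes two metric variables with scale(). It assumes the same column-wise standardization as scale(), which divides by the sample standard deviation (denominator n - 1); whether cluspcamix uses n or n - 1 internally is not stated here. The variables price and carat are invented for illustration.

```r
# Base R illustration of center = TRUE and scale = TRUE on two metric variables.
# Assumption: column-wise standardization as done by scale(), which divides by
# the sample standard deviation (denominator n - 1).
M <- cbind(price = c(320, 410, 505, 290, 680),
           carat = c(0.3, 0.5, 0.7, 0.3, 1.1))
Z <- scale(M, center = TRUE, scale = TRUE)
round(colMeans(Z), 10)      # column means are (numerically) zero
apply(Z, 2, sd)             # column standard deviations are (numerically) one
```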

alpha

Adjusts the relative importance of Mixed RKM and Mixed FKM in the objective function; alpha = 0.5 leads to mixed reduced K-means, alpha = 0 to mixed factorial K-means, and alpha = 1 to the tandem approach (PCAMIX followed by K-means)
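The role of alpha can be illustrated with a sketch that assumes the criterion has the form alpha * ||X - XAA'||^2 + (1 - alpha) * ||XA - ZY||^2, where A holds orthonormal loadings, Z is the cluster indicator matrix and Y the centroids in the reduced space. This form is an assumption, chosen because it matches the special cases listed above: alpha = 0.5 gives half the reduced K-means loss ||X - ZYA'||^2, alpha = 0 keeps only the factorial K-means term, and alpha = 1 keeps only the PCA term. The helper mixed_criterion() is hypothetical, and X here is a plain numeric matrix rather than mixed data.

```r
# Hypothetical helper: alpha-weighted criterion, assuming the form
#   alpha * ||X - X A A'||^2 + (1 - alpha) * ||X A - Z Y||^2
mixed_criterion <- function(X, A, cluster, alpha) {
  Z <- outer(cluster, sort(unique(cluster)), "==") * 1   # n x K indicator matrix
  S <- X %*% A                                           # object scores in the subspace
  Y <- solve(crossprod(Z), crossprod(Z, S))              # centroids = cluster means of S
  alpha * sum((X - S %*% t(A))^2) + (1 - alpha) * sum((S - Z %*% Y)^2)
}

set.seed(42)
X  <- scale(matrix(rnorm(50 * 4), 50, 4))
A  <- prcomp(X)$rotation[, 1:2]                 # orthonormal loadings, ndim = 2
cl <- kmeans(X %*% A, centers = 3, nstart = 5)$cluster
crit <- sapply(c(0, 0.5, 1), function(a) mixed_criterion(X, A, cl, a))
crit
```

The identity behind the alpha = 0.5 case is that, for orthonormal A, ||X - ZYA'||^2 splits exactly into ||X - XAA'||^2 + ||XA - ZY||^2.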

rotation

Specifies the method used to rotate the factors. Options are "none" for no rotation, "varimax" for varimax rotation with Kaiser normalization, and "promax" for promax rotation (default = "none")
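Base R provides both rotations in the stats package, so their meaning can be sketched on an ordinary PCA loading matrix. This is an assumption about what rotation = "varimax" and rotation = "promax" correspond to, not a description of how cluspcamix applies them internally. Varimax is orthogonal, so each variable's sum of squared loadings is unchanged; promax is oblique.

```r
# Varimax (Kaiser normalization) and promax rotation of PCA loadings with stats.
# Assumption: this mirrors the meaning of the rotation argument, not its internals.
set.seed(7)
X  <- scale(matrix(rnorm(100 * 5), 100, 5))
L  <- prcomp(X)$rotation[, 1:2]             # unrotated loadings, ndim = 2
vm <- varimax(L, normalize = TRUE)          # orthogonal rotation
pm <- promax(L)                             # oblique rotation
Lvm <- unclass(vm$loadings)
Lpm <- unclass(pm$loadings)
```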

nstart

Number of random starts (default = 100)

smartStart

If NULL then a random cluster membership vector is generated. Alternatively, a cluster membership vector can be provided as a starting solution

seed

An integer passed to set.seed() to seed the random number generator when smartStart = NULL (default = NULL)
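The interplay of smartStart and seed can be sketched in base R. The helper random_start() is hypothetical and only shows one plausible way to draw a random membership vector reproducibly; how cluspcamix draws its random starts is not documented here. A user-supplied start is just an integer vector of cluster labels, one per observation.

```r
# Hypothetical helper: a reproducible random starting partition.
random_start <- function(n, nclus, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # same seed -> same partition
  sample(rep_len(seq_len(nclus), n))   # every label appears when n >= nclus
}

s1 <- random_start(20, 3, seed = 1234)
s2 <- random_start(20, 3, seed = 1234)
identical(s1, s2)

# A user-supplied start could come from any preliminary clustering, e.g.
# start <- kmeans(metric_part, 3)$cluster        # metric_part is hypothetical
# cluspcamix(data, 3, 2, smartStart = start)
```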

binary

If TRUE, all categorical variables are taken to be 0-1 (dummy) variables (default = FALSE)

x
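For reference, 0-1 (dummy) coding turns one categorical variable into one indicator column per category. The base R sketch below uses model.matrix() without an intercept; it assumes that binary = TRUE refers to categorical columns already coded this way, which the description above implies but does not spell out.

```r
# 0-1 (dummy) coding of a categorical variable in base R.
f <- factor(c("low", "high", "mid", "low", "high"))
D <- model.matrix(~ f - 1)   # one 0-1 column per level, no intercept column
D
```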

For the print method, an object of class cluspcamix

object

For the summary and fitted methods, an object of class cluspcamix

mth

For the fitted method, a character string that specifies the type of fitted value to return: "centers" for each observation's cluster center, or "classes" for each observation's cluster membership

...

Not used

Value

obscoord

Object scores

attcoord

Variable scores

centroid

Cluster centroids

cluster

Cluster membership

criterion

Optimal value of the objective criterion

size

The number of objects in each cluster

scale

A copy of scale in the return object

center

A copy of center in the return object

nstart

A copy of nstart in the return object

odata

A copy of data in the return object
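The size and centroid components are both determined by the cluster membership vector. The sketch below derives them from hypothetical object scores, assuming the centroids are within-cluster means of the object scores as in reduced K-means; the variable names only mirror the Value fields and are not read from a real cluspcamix object.

```r
# Deriving cluster sizes and reduced-space centroids from memberships.
# Assumption: centroids are within-cluster means of the object scores.
set.seed(3)
obscoord <- matrix(rnorm(12 * 2), 12, 2)   # hypothetical object scores, ndim = 2
cluster  <- rep(1:3, each = 4)             # hypothetical memberships
size     <- as.vector(table(cluster))      # objects per cluster
centroid <- rowsum(obscoord, cluster) / size
centroid
```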

Details

For the K-means step, the Hartigan-Wong algorithm is used by default.
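In base R, Hartigan-Wong is also the default algorithm of stats::kmeans(). The sketch below runs it explicitly on two-dimensional toy scores with multiple random starts; treating the K-means step of cluspcamix as equivalent to this call is an assumption made only for illustration.

```r
# K-means with the Hartigan-Wong algorithm on toy reduced-space scores.
set.seed(10)
scores <- rbind(matrix(rnorm(40, mean = 0), 20, 2),
                matrix(rnorm(40, mean = 4), 20, 2))
km <- kmeans(scores, centers = 2, nstart = 10, algorithm = "Hartigan-Wong")
km$size
```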

The hidden print and summary methods print out some key components of an object of class cluspcamix.

The hidden fitted method returns cluster fitted values. If mth is "classes", this is a vector of cluster memberships (the cluster component of the "cluspcamix" object). If mth is "centers", this is a matrix in which each row is the center of the cluster to which the corresponding observation belongs. The rownames of the matrix are the cluster membership values.
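The "centers" output described above can be reproduced from a centroid matrix and a membership vector by indexing rows. The sketch uses small hypothetical values rather than a fitted object; it only illustrates the shape and rownames of the result.

```r
# Shape of fitted(object, mth = "centers"), built from hypothetical inputs.
centroid <- matrix(c(-1, 0, 2, 0.5, -0.5, 1), nrow = 3)   # K = 3 clusters, ndim = 2
cluster  <- c(2, 1, 3, 2, 1)                              # memberships of 5 observations
centers  <- centroid[cluster, , drop = FALSE]             # one row per observation
rownames(centers) <- cluster                              # rownames = memberships
centers
```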

When nclus = 1 the function returns the solution of PCAMIX and plot(object) shows the corresponding biplot.

References

van de Velden, M., Iodice D'Enza, A., & Markos, A. (2019). Distance-based clustering of mixed data. Wiley Interdisciplinary Reviews: Computational Statistics, e1456.

Vichi, M., Vicari, D., & Kiers, H.A.L. (2019). Clustering and dimension reduction for mixed variables. Behaviormetrika. doi:10.1007/s41237-018-0068-6.

Yamamoto, M., & Hwang, H. (2014). A general formulation of cluster analysis with dimension reduction and subspace separation. Behaviormetrika, 41, 115-129.

Examples


# NOT RUN {
data(diamond)
#Mixed Reduced K-means solution with 3 clusters in 2 dimensions 
#after 10 random starts
outmixedRKM = cluspcamix(diamond, 3, 2, method = "mixedRKM", nstart = 10, seed = 1234)
outmixedRKM 
#Scatterplot (dimensions 1 and 2)
plot(outmixedRKM)

#Tandem analysis: PCAMIX followed by K-means solution 
#with 3 clusters in 2 dimensions after 10 random starts 
outTandem = cluspcamix(diamond, 3, 2, alpha = 1, nstart = 10, seed = 1234)
outTandem
#Scatterplot (dimensions 1 and 2)
plot(outTandem)

#nclus = 1 just gives the PCAMIX solution
#outPCAMIX = cluspcamix(diamond, 1, 2)
#outPCAMIX
#Biplot (dimensions 1 and 2) 
#plot(outPCAMIX)
# }
