speccl: A spectral clustering algorithm

Description

A spectral clustering algorithm. Cluster analysis is performed by embedding the data into the subspace of the eigenvectors of an affinity matrix

Usage

speccl(data,nc,distance="GDM1",sigma="automatic",sigma.interval="default",
mod.sample=0.75,R=10,iterations=3,na.action=na.omit,...)

Arguments

data

matrix or dataset

the number of clusters

distance

distance function used to calculate affinity matrix: "sEuclidean" - squared Euclidean distance, "euclidean" - Euclidean distance, "manhattan" - city block distance, "maximum" - Chebyshev distance, "canberra" - Lance and Williams Canberra distance, "BC" - Bray-Curtis distance measure for ratio data, "GDM1" - GDM distance for metric data, "GDM2" - GDM distance for ordinal data, "SM" - Sokal-Michener distance measure for nominal variables

sigma

scale parameter used to calculate affinity matrix: sigma="automatic" - an algorithm for searching optimal value of sigma parameter; sigma=200 - value of sigma parameter given by researcher, e.g. 200

sigma.interval

sigma.interval="default" - from zero to square root of sum of all distances in lower triangle of distance matrix for "sEuclidean" and from zero to sum of all distances in lower triangle of distance matrix for other distances; sigma.interval=1000 - from zero to value given by researcher, e.g. 1000

mod.sample

proportion of data to use when estimating sigma (default: 0.75)

the number of intervals examined in each step of searching optimal value of sigma parameter algorithm

(See ../doc/speccl_details.pdf)

iterations

the maximum number of iterations (rounds) allowed in algorithm of searching optimal value of sigma parameter

na.action

the action to perform on NA

...

arguments passed to kmeans procedure

Value

scdist

returns the lower triangle of the distance matrix

clusters

a vector of integers indicating the cluster to which each object is allocated

size

the number of objects in each cluster

withinss

the within-cluster sum of squared distances for each cluster

Ematrix

data matrix n x u (n - the number of objects, u - the number of eigenvectors)

Ymatrix

normalized data matrix n x u (n - the number of objects, u - the number of eigenvectors)

sigma

the value of scale parameter given by searching algorithm

Details

See file ../doc/speccl_details.pdf for further details

References

Karatzoglou, A. (2006), Kernel methods. Software, algorithms and applications, Dissertation, Wien, Technical University.

Ng, A., Jordan, M., Weiss, Y. (2002), On spectral clustering: analysis and an algorithm, In: T. Dietterich, S. Becker, Z. Ghahramani (Eds.), Advances in Neural Information Processing Systems 14. MIT Press, 849-856. Available at:

https://papers.nips.cc/paper/2092-on-spectral-clustering-analysis-and-an-algorithm.pdf.

Walesiak, M. (2011), Uog<U+00F3>lniona miara odleg<U+0142>o<U+015B>ci GDM w statystycznej analizie wielowymiarowej z wykorzystaniem programu R [The Generalized Distance Measure GDM in multivariate statistical analysis with R], Wydawnictwo Uniwersytetu Ekonomicznego, Wroclaw. Available at: http://keii.ue.wroc.pl/pracownicy/mw/2011_Walesiak_Uogolniona_miara_odleglosci_GDM_w_SAW_z_wykorzystaniem_programu_R_errata.pdf.

Walesiak, M. (2012), Klasyfikacja spektralna a skale pomiaru zmiennych [Spectral clustering and measurement scales of variables], Przeglad Statystyczny (Statistical Review), no. 1, 13-31. Available at: http://keii.ue.wroc.pl/pracownicy/mw/2012_Walesiak_Przeglad_Statystyczny_z_1.pdf.

Walesiak, M. (2016), Uog<U+00F3>lniona miara odleg<U+0142>o<U+015B>ci GDM w statystycznej analizie wielowymiarowej z wykorzystaniem programu R. Wydanie 2 poprawione i rozszerzone [The Generalized Distance Measure GDM in multivariate statistical analysis with R], Wydawnictwo Uniwersytetu Ekonomicznego, Wroclaw. Available at: http://keii.ue.wroc.pl/pracownicy/mw/2016_Walesiak_Uogolniona_miara_odleglosci_GDM.pdf.

Examples

Run this code

# NOT RUN {
# Commented due to long execution time
# Example 1
#library(clusterSim)
#library(mlbench)
#data<-mlbench.spirals(100,1,0.03)
#plot(data)
#x<-data$x
#res1<-speccl(x,nc=2,distance="GDM1",sigma="automatic",
#sigma.interval="default",mod.sample=0.75,R=10,iterations=3)
#clas1<-res1$cluster
#print(data$classes)
#print(clas1)
#cRand<-classAgreement(table(as.numeric(as.vector(data$classes)),
#res1$clusters))$crand
#print(res1$sigma)
#print(cRand)

# Example 2
#library(clusterSim)
#grnd2<-cluster.Gen(50,model=4,dataType="m",numNoisyVar=1)
#data<-as.matrix(grnd2$data)
#colornames<-c("red","blue","green")
#grnd2$clusters[grnd2$clusters==0]<-length(colornames)
#plot(grnd2$data,col=colornames[grnd2$clusters])
#us<-nrow(data)*nrow(data)/2
#res2<-speccl(data,nc=3,distance="sEuclidean",sigma="automatic",
#sigma.interval=us,mod.sample=0.75,R=10,iterations=3)
#cRand<-comparing.Partitions(grnd2$clusters,res2$clusters,type="crand")
#print(res2$sigma)
#print(cRand)

# Example 3
#library(clusterSim)
#grnd3<-cluster.Gen(40,model=4,dataType="o",numCategories=7)
#data<-as.matrix(grnd3$data)
#plotCategorial(grnd3$data,pairsofVar=NULL,cl=grnd3$clusters,
#clColors=c("red","blue","green"))
#res3<-speccl(data,nc=3,distance="GDM2",sigma="automatic",
#sigma.interval="default",mod.sample=0.75,R=10,iterations=3)
#cRand<-comparing.Partitions(grnd3$clusters,res3$clusters,type="crand")
#print(res3$sigma)
#print(cRand)

# Example 4
library(clusterSim)
data(data_nominal)
res4<-speccl(data_nominal,nc=4,distance="SM",sigma="automatic",
sigma.interval="default",mod.sample=0.75,R=10,iterations=3)
print(res4)

# }

Run the code above in your browser using DataLab