diffusionMap (version 0.0-2)

diffusionKmeans: Diffusion K-means

Description

Clusters a data set based on its diffusion coordinates.

Usage

diffusionKmeans(dmap, K, params = c(), Niter = 50, epsilon = 0.001)

Arguments

dmap
a "dmap" object, computed by diffuse()
K
number of clusters
params
optional parameters for each data point. Entry can be a vector of length n, or a matrix with n rows. If this argument is given, cluster centroid parameters are returned.
Niter
number of K-means runs performed; the run with the lowest total distortion is returned.
epsilon
stopping criterion for relative change in distortion for each K-means iteration
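
The params argument can be illustrated with a short sketch. It assumes the annulus data set shipped with the package, and the per-point parameter used here (each point's distance from the origin, stored in the illustrative variable radius) is a made-up choice for demonstration, not something the package prescribes.

```r
library(diffusionMap)

data(annulus)
D <- dist(annulus)                    # Euclidean distances, as in the package example
dmap <- diffuse(D, eps.val = 0.03)    # diffusion map, computed once

# Hypothetical per-point parameter: distance of each point from the origin.
radius <- sqrt(rowSums(as.matrix(annulus)^2))

# Passing params asks for centroid parameters in addition to the clustering.
fit <- diffusionKmeans(dmap, K = 2, params = radius, Niter = 25)

# One row of centroid parameters per cluster.
print(fit$centparams)
```

If params were a matrix with n rows instead, centparams would be expected to have one column per parameter.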

Value

The returned value is a list with components:

  • part: final labelling of data from K-means. n-dimensional vector with integers between 1 and K
  • cent: K geometric centroids found by K-means
  • D: minimum of total distortion (loss function of K-means) found across K-means runs
  • DK: n by K matrix of squared (Euclidean) distances from each point to every centroid for the optimal K-means run
  • centparams: optional parameters for each centroid. Only returned if params is specified in the function call. Is a matrix with K rows.
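
As a rough illustration of how the returned list might be inspected, the sketch below clusters the package's annulus data and looks at each component. The choice of K = 2 and eps.val = 0.03 simply mirrors the package example. The final consistency check is an assumption about how a converged run behaves, not a documented guarantee.

```r
library(diffusionMap)

data(annulus)
dmap <- diffuse(dist(annulus), eps.val = 0.03)
fit <- diffusionKmeans(dmap, K = 2, Niter = 25)

table(fit$part)   # cluster sizes; labels are integers between 1 and K
fit$cent          # centroids, expressed in diffusion coordinates
fit$D             # lowest total distortion among the K-means runs
dim(fit$DK)       # n rows (points) by K columns (centroids)

# Assumption: in a converged run, most labels should agree with the
# nearest centroid according to the squared distances in DK.
nearest <- apply(fit$DK, 1, which.min)
mean(nearest == fit$part)
```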

Details

The input is a "dmap" object computed by diffuse(), so diffuse() must be run first. The function is written this way so that the K-means parameters can be varied without recomputing the diffusion map coordinates in each run.
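
A minimal sketch of that workflow, assuming the annulus data set from the package: the diffusion map is computed once and then reused for several values of K. The range 2:5 and Niter = 10 are arbitrary illustrative choices.

```r
library(diffusionMap)

data(annulus)
dmap <- diffuse(dist(annulus), eps.val = 0.03)   # computed once

# Reuse the same "dmap" object for several cluster counts and
# record the minimum total distortion found for each K.
ks <- 2:5
distortion <- sapply(ks, function(k) {
  diffusionKmeans(dmap, K = k, Niter = 10)$D
})
names(distortion) <- ks
print(distortion)
```

Because K-means starts from random seeds, the exact distortion values will differ between runs unless a seed is fixed with set.seed().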

References

Lafon, S., & Lee, A., (2006), IEEE Trans. Pattern Anal. and Mach. Intel., 28, 1393

Richards, J. W., Freeman, P. E., Lee, A. B., Schafer, C. M., (2009), ApJ, 691, 32

See Also

diffuse, distortionMin

Examples

## example with annulus data set
library(diffusionMap)
data(annulus)
par(mfrow=c(2,1))
plot(annulus,main="Annulus Data",pch=20,cex=.7)
D = dist(annulus) # use Euclidean distance
dmap = diffuse(D,0.03) # compute diffusion map
k=2  # number of clusters
dkmeans = diffusionKmeans(dmap, k,Niter=25)
plot(annulus,main="Colored by diffusion K-means clustering",pch=20,
   cex=.7,col=dkmeans$part)


## example with Chainlink data set
library(scatterplot3d) # provides scatterplot3d()
data(Chainlink)
lab.col = c(rep("red",500),rep("blue",500)); n=1000
scatterplot3d(Chainlink$C1,Chainlink$C2,Chainlink$C3,color=lab.col,
   main="Chainlink Data") # plot Chainlink data
D = dist(Chainlink) # use Euclidean distance
dmap = diffuse(D,neigen=3,eps.val=.01) # compute diffusion map & plot
plot(dmap)
print(dmap)
dkmeans = diffusionKmeans(dmap, K=2, Niter=25)
col.dkmeans=ifelse(dkmeans$part==1,"red","blue")
scatterplot3d(Chainlink$C1,Chainlink$C2,Chainlink$C3,color=col.dkmeans,
   main="Chainlink Data, colored by diffusion K-means classification")
