cluster (version 2.0.7-1)

xclara: Bivariate Data Set with 3 Clusters

Description

An artificial data set consisting of 3000 points in 3 quite well-separated clusters.

Usage

data(xclara)

Format

A data frame with 3000 observations on 2 numeric variables (named V1 and V2) giving the \(x\) and \(y\) coordinates of the points, respectively.
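
The layout described above can be checked directly after loading the data. This is a minimal sketch that assumes the cluster package is installed; it uses only base R inspection functions and does not depend on anything beyond the documented format.

```r
library(cluster)   # package that ships the xclara data set
data(xclara)

dim(xclara)                  # number of rows (points) and columns (coordinates)
names(xclara)                # column names of the data frame
sapply(xclara, is.numeric)   # checks that both coordinate columns are numeric
head(xclara, 3)              # first few (x, y) pairs
```

If the object matches the format given here, `dim()` reports 3000 rows and 2 columns, and `names()` returns V1 and V2.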

References

Anja Struyf, Mia Hubert & Peter J. Rousseeuw (1996) Clustering in an Object-Oriented Environment. Journal of Statistical Software 1(4). http://www.jstatsoft.org/v01/i04

Examples

# NOT RUN {
library(cluster)  # provides the xclara data and pam()
data(xclara)
## Visualization: assuming the groups are the index ranges 1:1000, 1001:2000, 2001:3000
plot(xclara, cex = 3/4, col = rep(1:3, each=1000))
p.ID <- c(78, 1411, 2535) ## PAM's medoid indices  == pam(xclara, 3)$id.med
text(xclara[p.ID,], labels = 1:3, cex=2, col=1:3)
# }
# NOT RUN {
 ## TODO: a clara() call with the _identical_ clustering (but faster!)
 px <- pam(xclara, 3) ## takes ~2 seconds
 cxcl <- px$clustering ; iCl <- split(seq_along(cxcl), cxcl)
 boxplot(iCl, range = 0.7, horizontal=TRUE,
         main = "Indices of the 3 clusters of  pam(xclara, 3)")

 ## Look more closely now:
 bxCl <- boxplot(iCl, range = 0.7, plot=FALSE)
 ## We see 3 + 2 + 2 = 7  clear "outlier"s  or "wrong group" observations:
 with(bxCl, rbind(out, group))
 ## out   1038 1451 1610   30  327  562  770
 ## group    1    1    1    2    2    3    3
 ## Apart from these, what are the robust ranges of indices? -- Robust range:
 t(iR <- bxCl$stats[c(1,5),])
 ##    1  900
 ##  901 2050
 ## 2051 3000
 gc <- adjustcolor("gray20",1/2)
 abline(v = iR, col = gc, lty=3)
 axis(3, at = c(0, iR[2,]), padj = 1.2, col=gc, col.axis=gc)
# }
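
For larger data, `clara()` from the same package is the sampling-based counterpart of `pam()`. The sketch below compares the two on this data set. The argument values (`samples = 50`, `pamLike = TRUE`) are illustrative assumptions, not settings taken from this help page, and nothing here guarantees that the resulting clustering is identical to the one from `pam()`.

```r
library(cluster)
data(xclara)

px <- pam(xclara, 3)                  # reference clustering on the full data
cx <- clara(xclara, 3,
            samples = 50,             # assumed value: more sub-samples than the default
            pamLike = TRUE)           # use the same swap algorithm as pam() on each sample

## Cross-tabulate the two label vectors. Cluster numbering may differ between
## the methods, so look for one dominant count per row, not a strict diagonal.
table(pam = px$clustering, clara = cx$clustering)

## Medoid coordinates chosen by each method
px$medoids
cx$medoids
```

Because `clara()` runs the medoid search only on sub-samples, it typically needs far less time and memory than `pam()`, at the cost of possibly selecting slightly different medoids.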