trimcluster (version 0.1-1)

trimkmeans: Trimmed k-means clustering

Description

The trimmed k-means clustering method of Cuesta-Albertos, Gordaliza and Matran (1997). It optimizes the k-means criterion while trimming a given proportion of the points.
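To make the idea concrete, here is a minimal base R sketch of a single run of a trimmed k-means style iteration. It is an illustration under stated assumptions, not the package's implementation: the function name `trimmed_kmeans_sketch`, the rounding rule for the number of trimmed points, and the stopping rule are all assumptions made for this example.

```r
# Hypothetical helper, not part of trimcluster: one run of a simple
# trimmed k-means style iteration in base R.
trimmed_kmeans_sketch <- function(X, k, trim = 0.1, maxit = 100) {
  X <- as.matrix(X)
  n <- nrow(X)
  n_keep <- n - floor(trim * n)              # assumed rounding rule
  means <- X[sample(n, k), , drop = FALSE]   # random data points as initial means
  for (it in seq_len(maxit)) {
    # squared Euclidean distance of every point to every mean (n x k matrix)
    d2 <- sapply(seq_len(k), function(j) rowSums(sweep(X, 2, means[j, ])^2))
    nearest <- max.col(-d2, ties.method = "first")  # index of the closest mean
    disttom <- d2[cbind(seq_len(n), nearest)]       # distance to the closest mean
    keep <- order(disttom)[seq_len(n_keep)]         # points that are not trimmed
    cls <- rep(k + 1L, n)                           # trimmed points coded as k+1
    cls[keep] <- nearest[keep]
    # recompute each mean from its untrimmed members only
    new_means <- means
    for (j in seq_len(k)) {
      members <- cls == j
      if (any(members)) new_means[j, ] <- colMeans(X[members, , drop = FALSE])
    }
    if (all(new_means == means)) break              # assumed stopping rule
    means <- new_means
  }
  # classification and disttom come from the last assignment step
  list(classification = cls, means = means, disttom = disttom,
       criterion = sum(disttom[keep]) / n_keep)     # mean square over kept points
}
```

Because the sketch performs only one run from random initial means, it can end in a poor local optimum; repeating it and keeping the run with the smallest criterion is the role the runs argument plays in trimkmeans.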

Usage

trimkmeans(data,k,trim=0.1, scaling=FALSE, runs=100, points=NULL,
                       countmode=runs+1, printcrit=FALSE,
                       maxit=2*nrow(as.matrix(data)))

## S3 method for class 'tkm':
print(x, ...)
## S3 method for class 'tkm':
plot(x, data, ...)

Arguments

data
matrix or data.frame with raw data
k
integer. Number of clusters.
trim
numeric between 0 and 1. Proportion of points to be trimmed.
scaling
logical. If TRUE, the variables are centered at their means and scaled to unit variance before execution.
runs
integer. Number of algorithm runs from initial means (randomly chosen from the data points).
points
NULL or a matrix with k vectors used as means to initialize the algorithm. If initial mean vectors are specified, runs should be 1 (otherwise the same initial means are used for all runs).
countmode
optional positive integer. trimkmeans shows a message every countmode algorithm runs.
printcrit
logical. If TRUE, all criterion values (mean squares) of the algorithm runs are printed.
maxit
integer. Maximum number of iterations within an algorithm run. Each iteration determines all points which are closer to a different cluster center than the one to which they are currently assigned. The algorithm terminates if no more points have to be reassigned, or if maxit is reached.
x
object of class tkm.
...
further arguments to be transferred to plot or plotcluster.
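
As a rough illustration of two of the arguments above, the base R sketch below shows the kind of standardization described for scaling (centering at the means, scaling to unit variance) and one way to build a matrix of initial means for points. The data are simulated, the variable names are made up for this example, and the layout of one mean vector per row is an assumption.

```r
set.seed(1)
# Simulated two-column data on very different scales (illustration only)
X <- cbind(rnorm(50, mean = 10, sd = 5), rnorm(50, mean = -2, sd = 0.1))

# Center each variable at its mean and scale it to unit variance
Xs <- scale(X)
col_means <- colMeans(Xs)     # numerically zero
col_sds <- apply(Xs, 2, sd)   # numerically one

# Pick k data points as initial means, one mean vector per row (assumed layout)
k <- 3
init <- Xs[sample(nrow(Xs), k), , drop = FALSE]

# With trimcluster installed, a call using these initial means might look
# like this (not run here); runs = 1 because the means are fixed:
# tkm <- trimkmeans(Xs, k = k, points = init, runs = 1)
```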

Value

An object of class 'tkm', which is a list with components

classification
integer vector coding cluster membership, with trimmed observations coded as k+1.
means
numerical matrix giving the mean vectors of the k classes.
disttom
vector of squared Euclidean distances of all points to the closest mean.
ropt
maximum value of disttom so that the corresponding point is not trimmed.
k
see above.
trim
see above.
runs
see above.
scaling
see above.
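
Since trimmed observations are coded as k+1 in classification, the returned components can be used to separate trimmed from retained rows. The sketch below uses a hypothetical helper, `split_trimmed`, and a hand-made mock list standing in for a fitted object; neither is part of trimcluster, and the mock values are invented purely for illustration.

```r
# Hypothetical helper: works on any list that carries the documented
# 'classification' and 'k' components.
split_trimmed <- function(fit, data) {
  data <- as.matrix(data)
  trimmed <- fit$classification == fit$k + 1   # trimmed points are coded as k+1
  list(kept = data[!trimmed, , drop = FALSE],
       trimmed = data[trimmed, , drop = FALSE],
       sizes = table(fit$classification))
}

# Mock object standing in for a trimkmeans() result (made-up values)
fit <- list(classification = c(1L, 1L, 2L, 3L, 2L), k = 2)
data <- matrix(1:10, ncol = 2)

parts <- split_trimmed(fit, data)
n_kept <- nrow(parts$kept)        # rows assigned to one of the k clusters
n_trimmed <- nrow(parts$trimmed)  # rows coded as k+1
```

With a real fit, the same helper could be called as `split_trimmed(tkm1, X)`.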

Details

plot.tkm calls plotcluster if the dimensionality p of the data is 1. If p=2, it shows a scatterplot with non-trimmed regions; if p>2, it shows discriminant coordinates computed from the clusters (ignoring the trimmed points).

References

Cuesta-Albertos, J. A., Gordaliza, A., and Matran, C. (1997) Trimmed k-Means: An Attempt to Robustify Quantizers, Annals of Statistics, 25, 553-576.

See Also

plotcluster

Examples

set.seed(10001)
n1 <- 60
n2 <- 60
n3 <- 70
n0 <- 10
nn <- n1 + n2 + n3 + n0
pp <- 2
X <- matrix(rep(0, nn * pp), nrow = nn)
ii <- 0
for (i in 1:n1) {
  ii <- ii + 1
  X[ii, ] <- c(5, -5) + rnorm(2)
}
for (i in 1:n2) {
  ii <- ii + 1
  X[ii, ] <- c(5, 5) + rnorm(2) * 0.75
}
for (i in 1:n3) {
  ii <- ii + 1
  X[ii, ] <- c(-5, -5) + rnorm(2) * 0.75
}
for (i in 1:n0) {
  ii <- ii + 1
  X[ii, ] <- rnorm(2) * 8
}
tkm1 <- trimkmeans(X, k = 3, trim = 0.1, runs = 3)
# runs=3 is used to save computing time.
print(tkm1)
plot(tkm1, X)