trimkmeans: Trimmed k-means clustering

Description

The trimmed k-means clustering method of Cuesta-Albertos, Gordaliza and Matran (1997). It optimizes the k-means criterion while trimming a given proportion of the points.

Usage

trimkmeans(data, k, trim=0.1, scaling=FALSE, runs=100, points=NULL,
           countmode=runs+1, printcrit=FALSE,
           maxit=2*nrow(as.matrix(data)))

# S3 method for class 'tkm'
print(x, ...)

# S3 method for class 'tkm'
plot(x, data, ...)

Arguments

data

matrix or data.frame with raw data.

k

integer. Number of clusters.

trim

numeric between 0 and 1. Proportion of points to be trimmed.

scaling

logical. If TRUE, the variables are centered at their means and scaled to unit variance before execution.

runs

integer. Number of algorithm runs from initial means (randomly chosen from the data points).

points

NULL or a matrix with k vectors used as means to initialize the algorithm. If initial mean vectors are specified, runs should be 1 (otherwise the same initial means are used for all runs).

countmode

optional positive integer. trimkmeans shows a message after every countmode algorithm runs.

printcrit

logical. If TRUE, all criterion values (mean squares) of the algorithm runs are printed.

maxit

integer. Maximum number of iterations within an algorithm run. Each iteration determines all points which are closer to a different cluster center than the one to which they are currently assigned. The algorithm terminates if no more points have to be reassigned, or if maxit is reached.

x

object of class tkm.

...

further arguments to be transferred to plot or plotcluster.
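The iteration described under maxit can be illustrated with a small base R sketch. This is not the trimcluster source code: the function name trimmed_kmeans_step is hypothetical, and both the full-reassignment update and the rule of keeping ceiling(n * (1 - trim)) points are assumptions made for illustration. It assigns each point to its nearest center, trims the points farthest from their center, and recomputes the means from the remaining points.

```r
# One trimmed k-means step (illustrative sketch, not the package implementation).
trimmed_kmeans_step <- function(X, centers, trim = 0.1) {
  X <- as.matrix(X)
  n <- nrow(X)
  k <- nrow(centers)
  # Squared Euclidean distance from every point to every center (n x k matrix).
  d2 <- sapply(seq_len(k), function(j) rowSums(sweep(X, 2, centers[j, ])^2))
  d2 <- matrix(d2, nrow = n)
  nearest <- max.col(-d2, ties.method = "first")
  disttom <- d2[cbind(seq_len(n), nearest)]
  # Assumption: keep the ceiling(n * (1 - trim)) points closest to their center.
  n_keep <- ceiling(n * (1 - trim))
  keep <- order(disttom)[seq_len(n_keep)]
  # Trimmed points are coded as k + 1, as in the documented classification.
  classification <- rep(k + 1L, n)
  classification[keep] <- nearest[keep]
  # Recompute each center from its untrimmed members; empty clusters stay put.
  new_centers <- centers
  for (j in seq_len(k)) {
    members <- classification == j
    if (any(members)) new_centers[j, ] <- colMeans(X[members, , drop = FALSE])
  }
  list(classification = classification, means = new_centers,
       disttom = disttom, ropt = max(disttom[keep]))
}

# Toy data: two tight groups and one far outlier.
X <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10), c(10, 11), c(50, 50))
centers <- rbind(c(0, 0), c(10, 10))
step <- trimmed_kmeans_step(X, centers, trim = 0.2)
print(step$classification)
print(step$means)
```

Repeating such a step until no classification changes, or until a maximum number of iterations is reached, mirrors the stopping rule described for maxit.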

Value

An object of class 'tkm', which is a list with components

classification

integer vector coding cluster membership with trimmed observations coded as k+1.

means

numerical matrix giving the mean vectors of the k classes.

disttom

vector of squared Euclidean distances of all points to the closest mean.

ropt

maximum value of disttom so that the corresponding point is not trimmed.

k

see above.

trim

see above.

runs

see above.

scaling

see above.
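The components above can be combined to separate trimmed from retained points. The following sketch uses a hand-built list named fit as a hypothetical stand-in for a fitted object, with made-up values chosen only to follow the documented coding; a real object would come from a call to trimkmeans.

```r
# Hypothetical stand-in with the documented component names (values are made up).
fit <- list(
  classification = c(1L, 1L, 2L, 2L, 3L),  # k = 2, so 3 = k + 1 marks a trimmed point
  disttom = c(0.2, 0.5, 0.1, 0.4, 9.0),    # squared distances to the closest mean
  ropt = 0.5,                              # largest disttom among untrimmed points
  k = 2
)

# Trimmed observations are those coded as k + 1.
trimmed <- fit$classification == fit$k + 1

# Sizes of the proper clusters, ignoring trimmed points.
cluster_sizes <- table(fit$classification[!trimmed])
print(cluster_sizes)

# Every untrimmed point lies within ropt of its closest mean.
print(all(fit$disttom[!trimmed] <= fit$ropt))
```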

Details

plot.tkm calls plotcluster if the dimensionality p of the data is 1, shows a scatterplot with the non-trimmed regions if p=2, and shows discriminant coordinates computed from the clusters (ignoring the trimmed points) if p>2.

References

Cuesta-Albertos, J. A., Gordaliza, A., and Matran, C. (1997) Trimmed k-Means: An Attempt to Robustify Quantizers, Annals of Statistics, 25, 553-576.

Examples

library(trimcluster)  # provides trimkmeans

set.seed(10001)
n1 <- 60  # cluster around (5, -5)
n2 <- 60  # cluster around (5, 5)
n3 <- 70  # cluster around (-5, -5)
n0 <- 10  # widely scattered noise points
nn <- n1 + n2 + n3 + n0
pp <- 2
X <- matrix(0, nrow = nn, ncol = pp)
ii <- 0
for (i in seq_len(n1)) {
  ii <- ii + 1
  X[ii, ] <- c(5, -5) + rnorm(2)
}
for (i in seq_len(n2)) {
  ii <- ii + 1
  X[ii, ] <- c(5, 5) + rnorm(2) * 0.75
}
for (i in seq_len(n3)) {
  ii <- ii + 1
  X[ii, ] <- c(-5, -5) + rnorm(2) * 0.75
}
for (i in seq_len(n0)) {
  ii <- ii + 1
  X[ii, ] <- rnorm(2) * 8
}
# runs=3 is used to save computing time.
tkm1 <- trimkmeans(X, k = 3, trim = 0.1, runs = 3)
print(tkm1)
plot(tkm1, X)
