RSKC (version 2.4.2)

revisedsil: The revised silhouette

Description

This function returns a revised silhouette plot, cluster centers in weighted squared Euclidean distances and a matrix containing the weighted squared Euclidean distances between cases and each cluster center. Missing values are adjusted.

Usage

revisedsil(d,reRSKC=NULL,CASEofINT=NULL,col1="black", CASEofINT2 = NULL, col2="red", print.plot=TRUE, W=NULL,C=NULL,out=NULL)

Arguments

d
A numerical data matrix, N by p, where N is the number of cases and p is the number of features.
reRSKC
A list output from RSKC function.
CASEofINT
Necessary if print.plot=TRUE. A vector of the case indices that appear in the revised silhouette plot. The revised silhouette widths of these indices are colored in col1 if CASEofINT != NULL. The average silhouette of each cluster printed in the plot is computed EXCLUDING these cases.
col1
See CASEofINT.
CASEofINT2
A vector of the case indices that appear in the revised silhouette plot. The indices are colored in col2.
col2
See CASEofINT2
print.plot
If TRUE, the revised silhouette is plotted.
W
Necessary if reRSKC = NULL. A positive real vector of weights of length p.
C
Necessary if reRSKC = NULL. An integer vector of class labels of length N.
out
Necessary if reRSKC = NULL. Vector of the case indices that should be excluded in the calculation of cluster centers. In RSKC, cluster centers are calculated without the cases that have the furthest 100*alpha % Weighted squared Euclidean distances to their closest cluster centers. If one wants to obtain the cluster centers from RSKC output, set out = $oW.

Value

References

Yumi Kondo (2011), Robustificaiton of the sparse K-means clustering algorithm, MSc. Thesis, University of British Columbia http://hdl.handle.net/2429/37093

Examples

Run this code

# little simulation function 
sim <-
function(mu,f){
   D<-matrix(rnorm(60*f),60,f)
   D[1:20,1:50]<-D[1:20,1:50]+mu
   D[21:40,1:50]<-D[21:40,1:50]-mu  
   return(D)
   }


### output trans.mu ###

p<-200;ncl<-3
# simulate a 60 by p data matrix with 3 classes 
d<-sim(2,p)
# run RSKC
re<-RSKC(d,ncl,L1=2,alpha=0.05)
# cluster centers in weighted squared Euclidean distances by function sil
sil.mu<-revisedsil(d,W=re$weights,C=re$labels,out=re$oW,print.plot=FALSE)$trans.mu
# calculation 
trans.d<-sweep(d[,re$weights!=0],2,sqrt(re$weights[re$weights!=0]),FUN="*") 
class<-re$labels;class[re$oW]<-ncl+1
MEANs<-matrix(NA,ncl,ncol(trans.d))
for ( i in 1 : 3) MEANs[i,]<-colMeans(trans.d[class==i,,drop=FALSE])
sil.mu==MEANs
# coincides 

### output WdisC ###

p<-200;ncl<-3;N<-60
# generate 60 by p data matrix with 3 classes 
d<-sim(2,p)
# run RSKC
re<-RSKC(d,ncl,L1=2,alpha=0.05)
si<-revisedsil(d,W=re$weights,C=re$labels,out=re$oW,print.plot=FALSE)
si.mu<-si$trans.mu
si.wdisc<-si$WdisC
trans.d<-sweep(d[,re$weights!=0],2,sqrt(re$weights[re$weights!=0]),FUN="*") 
WdisC<-matrix(NA,N,ncl)
for ( i in 1 : ncl) WdisC[,i]<-rowSums(scale(trans.d,center=si.mu[i,],scale=FALSE)^2)
# WdisC and si.wdisc coincides

Run the code above in your browser using DataLab