
HDLSSkST (version 2.1.0)

gMADD: Modified K-Means Algorithm by Using a New Dissimilarity Measure, MADD

Description

Performs a modified K-means algorithm that uses the dissimilarity measure MADD (mean absolute difference of distances) and returns the estimated cluster (class) label of each observation.
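To make the dissimilarity concrete, here is a minimal base-R sketch of a MADD matrix. It assumes the definition given in Sarkar and Ghosh (2019) with block length 1: a base distance \(\phi(x,y) = h\{d^{-1}\sum_q \psi(|x_q - y_q|)\}\), and MADD between observations \(i\) and \(j\) as the average of \(|\phi(x_i,x_m) - \phi(x_j,x_m)|\) over all other observations \(m\). The function name `madd_sketch` is hypothetical and this is not the package's internal code.

```r
# Sketch only: MADD matrix under the assumed definition (block length 1).
# madd_sketch is a hypothetical helper, not part of HDLSSkST.
madd_sketch <- function(M, psi = function(t) t^2, h = sqrt) {
  n <- nrow(M)
  stopifnot(n >= 3)  # MADD averages over the n - 2 remaining observations

  # Base dissimilarity phi(x_i, x_j) = h(mean over coordinates of psi(|x_iq - x_jq|))
  phi <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      phi[i, j] <- h(mean(psi(abs(M[i, ] - M[j, ]))))
    }
  }

  # MADD: mean absolute difference of the distances to every other observation
  rho <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      others <- setdiff(seq_len(n), c(i, j))
      rho[i, j] <- rho[j, i] <- mean(abs(phi[i, others] - phi[j, others]))
    }
  }
  rho
}

set.seed(1)
M_demo <- matrix(rnorm(5 * 20), nrow = 5, ncol = 20)
rho_demo <- madd_sketch(M_demo)
```

The defaults correspond to the documented codes `s_psi = 1` and `s_h = 1`, which make the base distance a scaled Euclidean distance.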

Usage

gMADD(s_psi, s_h, n_clust, lb, M)

Arguments

s_psi

integer code selecting the function \(\psi(t)\) used for clustering: 1 for \(t^2\), 2 for \(1-\exp(-t)\), 3 for \(1-\exp(-t^2)\), 4 for \(\log(1+t)\), 5 for \(t\)

s_h

integer code selecting the function \(h(t)\) used for clustering: 1 for \(\sqrt t\), 2 for \(t\)
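The integer codes for `s_psi` and `s_h` can be read as indices into two small tables of functions. The sketch below writes those tables out in base R so the candidate functions can be compared on a few values; the names `psi_choices` and `h_choices` are hypothetical and are not objects exported by the package.

```r
# Illustrative lookup tables for the documented codes (not the package's internals).
psi_choices <- list(
  function(t) t^2,            # s_psi = 1
  function(t) 1 - exp(-t),    # s_psi = 2
  function(t) 1 - exp(-t^2),  # s_psi = 3
  function(t) log(1 + t),     # s_psi = 4
  function(t) t               # s_psi = 5
)
h_choices <- list(
  function(t) sqrt(t),        # s_h = 1
  function(t) t               # s_h = 2
)

# Evaluate every psi on a small grid; one column per code 1..5
t_vals <- c(0, 0.5, 1, 2)
psi_table <- sapply(psi_choices, function(f) f(t_vals))
```

Codes 2 and 3 are bounded above by 1, whereas codes 1, 4 and 5 grow without bound as \(t\) increases.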

n_clust

total number of clusters (classes) in the pooled sample

lb

block length: each observation is partitioned into smaller vectors of equal length \(lb\)
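As an illustration of what partitioning into blocks of length \(lb\) means, the base-R sketch below splits one observation into consecutive blocks. The helper `split_blocks` is hypothetical, and it assumes the dimension is a multiple of \(lb\); how the package treats other cases is not stated here.

```r
# Sketch only: split a single observation into consecutive blocks of length lb.
# split_blocks is a hypothetical helper, not part of HDLSSkST.
split_blocks <- function(x, lb) {
  stopifnot(length(x) %% lb == 0)  # assumption: d is a multiple of lb
  split(x, ceiling(seq_along(x) / lb))
}

x <- 1:6
blocks <- split_blocks(x, lb = 2)  # three blocks: (1, 2), (3, 4), (5, 6)
```

With `lb = 1`, as in the example on this page, every coordinate forms its own block.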

M

\(n\times d\) matrix of the pooled sample, one observation per row; the rows should be grouped by their respective classes

Value

a vector of length \(n\) containing the estimated cluster (class) labels of the observations
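Cluster labels are only meaningful up to a relabeling, so one way to check a returned vector against known classes is a cross-tabulation. The sketch below uses base R only; `est` is typed in by hand to match the output shown in the example on this page, and `truth` is a hypothetical vector naming the four simulated groups.

```r
# Sketch only: compare estimated labels with known class membership.
est   <- rep(0:3, each = 10)                        # labels as in the documented output
truth <- rep(c("I1", "I2", "I3", "I4"), each = 10)  # hypothetical true group names

tab <- table(truth, est)

# Recovery is perfect when every true class maps to exactly one label and vice versa
perfect <- all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
```

A table with a single nonzero entry in each row and column indicates that the clustering reproduces the true classes exactly.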

References

Biplab Paul, Shyamal K De and Anil K Ghosh (2021). Some clustering based exact distribution-free k-sample tests applicable to high dimension, low sample size data, Journal of Multivariate Analysis, doi:10.1016/j.jmva.2021.104897.

Soham Sarkar and Anil K Ghosh (2019). On perfect clustering of high dimension, low sample size data, IEEE Transactions on Pattern Analysis and Machine Intelligence, doi:10.1109/TPAMI.2019.2912599.

Examples

# NOT RUN {
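  # Modified K-means algorithm:
  # multivariate normal distribution
  # generate data with dimension d = 500
  library(HDLSSkST)
  set.seed(151)
  n1 <- n2 <- n3 <- n4 <- 10   # 10 observations per class
  d <- 500
  # four classes that differ only in their mean (0, 0.5, 1, 1.5)
  I1 <- matrix(rnorm(n1*d, mean = 0, sd = 1), n1, d)
  I2 <- matrix(rnorm(n2*d, mean = 0.5, sd = 1), n2, d)
  I3 <- matrix(rnorm(n3*d, mean = 1, sd = 1), n3, d)
  I4 <- matrix(rnorm(n4*d, mean = 1.5, sd = 1), n4, d)
  n_cl <- 4
  # pooled sample: rows grouped by class
  X <- rbind(I1, I2, I3, I4)
  gMADD(s_psi = 1, s_h = 1, n_clust = n_cl, lb = 1, M = X)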
  
   ## outputs:
   #[1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
# }
