UniversalCVI

The UniversalCVI package is a tool for evaluating clustering results with several cluster validity indices. It provides functions that compute the indices listed below over a user-specified range of numbers of clusters and compare them in grid plots. The package is compatible with six clustering methods: K-means, fuzzy C-means, EM clustering, and hierarchical clustering (single, average, and complete linkage). The package also has a function that computes the accuracy of clustering results when the true classes are known.

UniversalCVI requires two packages, e1071 and mclust, which perform the fuzzy C-means (FCM) and EM algorithms, respectively.

In addition to the evaluation tools, the UniversalCVI package includes 17 simulated datasets initially used for testing and comparing cluster validity indices from several perspectives in Wiroonsri (2024) and Wiroonsri and Preedasawakul (2023).

The cluster validity indices available in this package are listed as follows:

Hard clustering:

Dunn's index, Calinski–Harabasz index, Davies–Bouldin’s index, Point biserial correlation index, Chou-Su-Lai measure, Davies–Bouldin*’s index, Score function, Starczewski index, Pakhira–Bandyopadhyay–Maulik (for crisp clustering) index, Silhouette index, and Wiroonsri index.

Fuzzy clustering:

Xie–Beni index, KWON index, KWON2 index, TANG index, HF index, Wu–Li index, Pakhira–Bandyopadhyay–Maulik (for fuzzy clustering) index, KPBM index, Correlation Cluster Validity index, Generalized C index, and Wiroonsri and Preedasawakul index.

Installation

If mclust and e1071 are not already installed on your system, install them first:

install.packages(c('e1071','mclust'))

Install the UniversalCVI package:

install.packages('UniversalCVI')

Load the packages into the R workspace:

suppressPackageStartupMessages({
  library(UniversalCVI)
  library(e1071)
  library(mclust)
})
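Before loading, it can help to confirm that the three packages named above are actually installed. The following is a minimal sketch using only base R; the variable names `needed`, `available`, and `missing_pkgs` are illustrative and not part of UniversalCVI.

```r
# Packages this README relies on
needed <- c("e1071", "mclust", "UniversalCVI")

# requireNamespace() reports TRUE/FALSE without attaching the package
available <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still has to be installed
missing_pkgs <- needed[!available]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Any package reported as missing can then be installed with install.packages() as shown in the Installation section.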

Example

Check accuracy of clustering results when the true classes are known


### Use a dataset in this package
x = R1_data

# Check accuracy of clustering results obtained by kmeans, FCM, and EM clustering
AccClust(x, label.names = "label",algorithm = c("FCM","EM","Kmeans"), fzm = 2,
  scale = TRUE, nstart = 20,iter = 100)
  
x = D1_data

# Check accuracy of a clustering result obtained by the FCM algorithm
AccClust(x, label.names = "label",algorithm = "FCM", fzm = 2,
  scale = TRUE, nstart = 20,iter = 100)
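To illustrate what "accuracy with known classes" can mean, the sketch below matches cluster labels to true classes by trying every one-to-one assignment and keeping the best one. This is one common definition, written in base R under the assumption that the number of clusters equals the number of classes; it is not necessarily how AccClust computes its result, and the helper names `permutations` and `cluster_accuracy` are hypothetical.

```r
# All orderings of the vector v, returned as a list (fine for small k)
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in permutations(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Best-match accuracy: cluster labels are arbitrary, so search for the
# one-to-one mapping between classes and clusters with the most agreements
cluster_accuracy <- function(truth, pred) {
  tab <- table(truth, pred)
  stopifnot(nrow(tab) == ncol(tab))  # assumption: as many clusters as classes
  k <- nrow(tab)
  best <- 0
  for (perm in permutations(seq_len(k))) {
    # perm[i] is the cluster (column) matched to true class i (row)
    hits <- sum(tab[cbind(seq_len(k), perm)])
    best <- max(best, hits)
  }
  best / length(truth)
}

# Toy labels: the clustering is right except for one point of class 2
truth <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
pred  <- c(2, 2, 2, 3, 3, 1, 1, 1, 1)
acc <- cluster_accuracy(truth, pred)
print(acc)  # 8 of 9 points agree under the best matching
```

The exhaustive search grows factorially with the number of clusters, so it is only suitable for small k.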

Compute hard cluster validity indices

Use Hvalid to compute the selected indices for clustering results with k from 2 to 15:

library(UniversalCVI)

# The data is from Wiroonsri (2024).
x = R1_data[,1:2]


# Compute six cluster validity indices of an average-linkage hierarchical clustering result for k from 2 to 15
IDX.list = c("NCI", "DI", "DB", "STR", "CSL", "CH")

Hvalid.result = Hvalid(scale(x), kmax = 15, kmin = 2, indexlist = IDX.list,
  method = "hclust_average", p = 2, q = 2, corr = "pearson", nstart = 100, NCstart = TRUE)

# Plot the computed indices for k from 2 to 15
plot_idx(Hvalid.result)
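Conceptually, evaluating a hard clustering over a range of k means clustering the data for each k and scoring each partition with an index. The base R sketch below does this for the Calinski–Harabasz index with average-linkage hierarchical clustering. It uses the standard textbook definition of the index and toy data generated on the spot; the function name `ch_index` is hypothetical and this is not the package's implementation.

```r
set.seed(1)

# Toy data: three well-separated Gaussian blobs (not a package dataset)
centres <- rbind(c(0, 0), c(5, 5), c(0, 5))
x <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(30, centres[i, 1], 0.3), rnorm(30, centres[i, 2], 0.3))
}))
xs <- scale(x)

# Calinski-Harabasz index: (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k))
ch_index <- function(x, cl) {
  x <- as.matrix(x)
  n <- nrow(x)
  k <- length(unique(cl))
  overall <- colMeans(x)
  wss <- 0
  bss <- 0
  for (g in unique(cl)) {
    xg <- x[cl == g, , drop = FALSE]
    centre <- colMeans(xg)
    wss <- wss + sum(sweep(xg, 2, centre)^2)
    bss <- bss + nrow(xg) * sum((centre - overall)^2)
  }
  (bss / (k - 1)) / (wss / (n - k))
}

# Cluster once, cut the tree at each k, and score every partition
tree <- hclust(dist(xs), method = "average")
ks <- 2:6
ch <- sapply(ks, function(k) ch_index(xs, cutree(tree, k = k)))
names(ch) <- ks

# Larger CH values indicate a better partition
best_k <- ks[which.max(ch)]
print(round(ch, 1))
print(best_k)
```

Whether the largest or the smallest value is preferred depends on the index, which is one reason to compare several indices side by side.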

Compute fuzzy cluster validity indices

Use FzzyCVIs to compute the selected fuzzy cluster validity indices for clustering results with c from 2 to 15:

library(UniversalCVI)

x = R1_data[,1:2]

# Compute six cluster validity indices of an FCM clustering result for c from 2 to 15
IDX.list = c("WP", "PBM", "TANG", "XB", "GC2", "KWON2")
FCVIs = FzzyCVIs(scale(x), cmax = 15, cmin = 2, indexlist = IDX.list, corr = 'pearson',
         method = 'FCM', fzm = 2, iter = 100, nstart = 20, NCstart = TRUE)
# Plot the computed indices for c from 2 to 15
plot_idx(FCVIs)
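Fuzzy indices work from the membership matrix and the cluster centers rather than from hard labels. As an illustration, the sketch below computes the Xie–Beni index from its standard definition: membership-weighted compactness divided by n times the smallest squared distance between centers. It is written in base R with a tiny hand-made example; the function name `xie_beni` and its arguments are hypothetical, and the package's own implementation may differ in detail.

```r
# Xie-Beni index from data, cluster centers, and a membership matrix
# x:          n x p data matrix
# centers:    k x p matrix of cluster centers
# membership: n x k matrix of membership degrees (rows sum to 1)
# m:          fuzzifier
xie_beni <- function(x, centers, membership, m = 2) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  n <- nrow(x)
  k <- nrow(centers)
  # Squared distance from every point (rows) to every center (columns)
  d2 <- sapply(seq_len(k), function(i) rowSums(sweep(x, 2, centers[i, ])^2))
  compactness <- sum(membership^m * d2)
  separation <- min(dist(centers))^2
  compactness / (n * separation)
}

# Toy example: two pairs of points, four units apart
x <- rbind(c(0, 0), c(0, 1), c(4, 0), c(4, 1))
centers <- rbind(c(0, 0.5), c(4, 0.5))
membership <- rbind(c(0.9, 0.1), c(0.9, 0.1), c(0.1, 0.9), c(0.1, 0.9))

xb <- xie_beni(x, centers, membership, m = 2)
print(xb)  # smaller values indicate compact, well-separated clusters
```

With a fitted model from e1071::cmeans, the `centers` and `membership` components of the returned object could be passed to the same function.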

License

The UniversalCVI package as a whole is distributed under GPL (>= 3).

Version

1.2.0

Maintainer

Nathakhun Wiroonsri

Last Published

January 27th, 2025

Functions in UniversalCVI (1.2.0)

DB.IDX: Davies–Bouldin (DB) and DB* (DBs) indexes
Hvalid: Wiroonsri (2024) correlation-based cluster validity indices and other well-known cluster validity indices
HF.IDX: HF index
D8_data: D8 Artificial Dataset
DI.IDX: Dunn index
FzzyCVIs: Fuzzy cluster validity indexes used in Wiroonsri and Preedasawakul (2023)
GC.IDX: The generalized C index
D6_data: D6 Artificial Dataset
R3_data: R3 Artificial Dataset
KWON2.IDX: KWON2 index
PB.IDX: Point biserial correlation (PB)
R2_data: R2 Artificial Dataset
D7_data: D7 Artificial Dataset
PBM.IDX: Pakhira-Bandyopadhyay-Maulik (PBM) index
R1_data: R1 Artificial Dataset
R4_data: R4 Artificial Dataset
KPBM.IDX: Modified Kernel form of Pakhira-Bandyopadhyay-Maulik (KPBM) index
KWON.IDX: KWON index
R6_data: R6 Artificial Dataset
R5_data: R5 Artificial Dataset
SF.IDX: The score function
SH.IDX: Silhouette index
TANG.IDX: Tang index
STRPBM.IDX: Starczewski and Pakhira-Bandyopadhyay-Maulik for crisp clustering indexes
R7_data: R7 Artificial Dataset
WP.IDX: Wiroonsri and Preedasawakul (WP) index
WL.IDX: Wu and Li (WL) index
plot_idx: Plots for visualizing CVIs
XB.IDX: Xie and Beni (XB) index
Wvalid: Wiroonsri (2024) correlation-based cluster validity indices
D5_data: D5 Artificial Dataset
D1_data: D1 Artificial Dataset
CCV.IDX: Correlation Cluster Validity (CCV) index
AccClust: Accuracy detection for a clustering result with known classes
CH.IDX: Calinski–Harabasz (CH) index
CSL.IDX: Chou-Su-Lai (CSL) index
D10_data: D10 Artificial Dataset
D2_data: D2 Artificial Dataset
D3_data: D3 Artificial Dataset
D9_data: D9 Artificial Dataset
D4_data: D4 Artificial Dataset