BayesCVI (version 1.0.2)

plot_BCVI: Plots for visualizing BCVI

Description

Plots the Bayesian cluster validity index (BCVI), with and without standard-deviation error bars, together with the underlying index.

Usage

plot_BCVI(B.result, mult.err.bar = 2)

Value

plot_index

a plot of the underlying index for the number of groups from \(2\) to \(kmax\) according to B.result

plot_BCVI

a plot of BCVI for the number of groups from \(2\) to \(kmax\) according to B.result

error_bar_plot

a plot of BCVI with error bars for the number of groups from \(2\) to \(kmax\) according to B.result

Arguments

B.result

a result from one of the functions B_XB.IDX, B_Wvalid, B_WP.IDX, B_WL.IDX, B_TANG.IDX, B_STRPBM.IDX, B_SF.IDX, B_PBM.IDX, B_PB.IDX, B_KWON.IDX, B_KWON2.IDX, B_KPBM.IDX, B_HF.IDX, B_GC.IDX, B_DI.IDX, B_DB.IDX, B_CSL.IDX, B_CH.IDX, B_CCV.IDX and B_BayesCVIs.IDX

mult.err.bar

a multiplier of the standard deviations, used to set the length of the plotted error bars

Author

Nathakhun Wiroonsri and Onthada Preedasawakul

Details

BCVI is defined as follows.

Let
$$r_k({\bf x}) = \dfrac{\max_j CVI(j)- CVI(k)}{\sum_{i=2}^K (\max_j CVI(j) - CVI(i))}$$
for a cluster validity index (CVI) whose smallest value indicates the optimal number of clusters, and
$$r_k({\bf x}) = \dfrac{CVI(k)-\min_j CVI(j)}{\sum_{i=2}^K (CVI(i)-\min_j CVI(j))}$$
for a CVI whose largest value indicates the optimal number of clusters. Here \(K\) is the maximum number of clusters considered (kmax). Assume that
$$f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{nr_k({\bf x})}$$
represents the conditional probability density function of the dataset \({\bf x}\) of size \(n\) given \({\bf p} = (p_2,\ldots,p_K)\), where \(C({\bf p})\) is the normalizing constant. Assume further that \({\bf p}\) follows a Dirichlet prior distribution with parameters \(\boldsymbol{\alpha} = (\alpha_2,\ldots,\alpha_K)\). The posterior distribution of \({\bf p}\) is then again a Dirichlet distribution, with parameters \((\alpha_2+nr_2({\bf x}),\ldots,\alpha_K+nr_K({\bf x}))\).
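The normalization and the posterior update above can be sketched in base R. This is only an illustration of the formulas, not the package's internal code: the helper `rk()`, the CVI values, `n`, and the flat prior are all hypothetical, and the `opt.pt` argument name is borrowed from the `BayesCVIs` call in the Examples.

```r
# Hypothetical CVI values for k = 2, ..., 6 (made up for illustration).
cvi <- c(0.42, 0.61, 0.35, 0.30, 0.28)
names(cvi) <- 2:6

# Hypothetical helper: turn a CVI vector into the weights r_k.
# opt.pt = "max": the largest CVI marks the optimal number of clusters.
# opt.pt = "min": the smallest CVI marks the optimal number of clusters.
rk <- function(cvi, opt.pt = c("max", "min")) {
  opt.pt <- match.arg(opt.pt)
  d <- if (opt.pt == "max") cvi - min(cvi) else max(cvi) - cvi
  d / sum(d)  # undefined (0/0) if all CVI values are equal
}

r_max <- rk(cvi, "max")
r_min <- rk(cvi, "min")

# Posterior Dirichlet parameters alpha_k + n * r_k, using an assumed
# sample size n and an assumed flat prior alpha_k = 1.
n <- 100
alpha <- rep(1, length(cvi))
post <- alpha + n * r_max
```

Both weight vectors are non-negative and sum to one, and the least favoured number of clusters always receives weight zero, so its posterior parameter equals its prior parameter.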

The BCVI is then defined as
$$BCVI(k) = E[p_k|{\bf x}] = \frac{\alpha_k + nr_k({\bf x})}{\alpha_0+n}$$ where \(\alpha_0 = \sum_{k=2}^K \alpha_k.\)

The posterior variance of \(p_k\) can be computed as $$Var(p_k|{\bf x}) = \dfrac{(\alpha_k + nr_k({\bf x}))(\alpha_0 + n -\alpha_k-nr_k({\bf x}))}{(\alpha_0 + n)^2(\alpha_0 + n +1 )}.$$
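As a worked illustration, the BCVI and its variance can be evaluated directly from these two formulas in base R. The weights `r`, the prior `alpha`, and `n` below are hypothetical values, and the error-bar bounds assume that the bars span BCVI plus or minus `mult.err.bar` standard deviations, which is an interpretation of the argument description rather than a confirmed detail of `plot_BCVI`.

```r
# Hypothetical inputs for k = 2, ..., 6 (made up for illustration).
r <- c(0.25, 0.50, 0.15, 0.10, 0.00)  # weights r_k, summing to 1
alpha <- c(5, 5, 1, 1, 1)             # Dirichlet prior parameters
n <- 100                              # assumed sample size
mult.err.bar <- 2                     # multiplier of the standard deviation

alpha0 <- sum(alpha)
a_post <- alpha + n * r               # posterior Dirichlet parameters

# Posterior mean (the BCVI) and posterior variance of each p_k.
bcvi <- a_post / (alpha0 + n)
v <- a_post * (alpha0 + n - a_post) /
  ((alpha0 + n)^2 * (alpha0 + n + 1))

# Assumed error-bar bounds: mean +/- mult.err.bar * standard deviation.
s <- sqrt(v)
lower <- bcvi - mult.err.bar * s
upper <- bcvi + mult.err.bar * s

data.frame(k = 2:6, BCVI = bcvi, sd = s, lower = lower, upper = upper)
```

The BCVI values sum to one across \(k\), so they can be read as posterior weights for each candidate number of clusters, and a larger \(n\) shrinks the variance and hence the error bars.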

References

O. Preedasawakul and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053

See Also

B_STRPBM.IDX, B_TANG.IDX, B_XB.IDX, B_Wvalid, B_WP.IDX, B_DB.IDX

Examples

library(BayesCVI)
library(UniversalCVI)

## Soft clustering

# The data included in this package.
data = B7_data[,1:2]

# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)

B.XB = B_XB.IDX(x = scale(data), kmax = 10, method = "FCM", fzm = 2,
              nstart = 20, iter = 100, alpha = aalpha, mult.alpha = 1/2)

# plot the BCVI

pplot = plot_BCVI(B.XB)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot


## Hard clustering

# The data included in this package.
data = B2_data[,1:2]

K.STR = STRPBM.IDX(scale(data), kmax = 10, kmin = 2, method = "kmeans",
  indexlist = "STR", nstart = 100)

# STR index values
result = K.STR$STR$STR


aalpha = c(20,20,20,5,5,5,0.5,0.5,0.5)
B.STR = BayesCVIs(CVI = result,
          n = nrow(data),
          kmax = 10,
          opt.pt = "max",
          alpha = aalpha,
          mult.alpha = 1/2)

# plot the BCVI

pplot = plot_BCVI(B.STR)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot