UniversalCVI (version 1.2.0)

PB.IDX: Point biserial correlation (PB)

Description

Computes the PB index (G. W. Milligan, 1980) for the result of either k-means or hierarchical clustering, for each number of clusters from a user-specified kmin to kmax.

Usage

PB.IDX(x, kmax, kmin = 2, method = "kmeans", corr = "pearson", nstart = 100)

Value

PB

the PB index for k from kmin to kmax shown in a data frame where the first and the second columns are k and the PB index, respectively.

Arguments

x

a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point.

kmax

a maximum number of clusters to be considered.

kmin

a minimum number of clusters to be considered. The default is 2.

method

a character string indicating which clustering method to use ("kmeans", "hclust_complete", "hclust_average" or "hclust_single"). The default is "kmeans".

corr

a character string indicating which correlation coefficient is to be computed ("pearson", "kendall" or "spearman"). The default is "pearson".

nstart

the maximum number of initial random sets for k-means, used only when method = "kmeans". The default is 100.
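
The method strings suggest that cluster labels come from k-means or from a hierarchical tree cut at k clusters. The following is a minimal base R sketch of that idea, not the package's implementation: the helper name cluster_labels is hypothetical, and the mapping of "hclust_<linkage>" to the linkage argument of stats::hclust is an assumption.

```r
# Hypothetical helper, not part of UniversalCVI: produce cluster labels for a
# given k with base R, mirroring the documented `method` strings.
cluster_labels <- function(x, k, method = "kmeans", nstart = 100) {
  x <- as.matrix(x)
  if (method == "kmeans") {
    # nstart random initializations; kmeans() returns the best solution found
    return(stats::kmeans(x, centers = k, nstart = nstart)$cluster)
  }
  # Assumption: "hclust_<linkage>" maps to hclust(..., method = "<linkage>")
  linkage <- sub("^hclust_", "", method)
  tree <- stats::hclust(stats::dist(x), method = linkage)
  stats::cutree(tree, k = k)  # cut the tree into k clusters
}

# Toy data: two well-separated groups of 20 points each
set.seed(1)
toy <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2))

labels_km <- cluster_labels(toy, k = 2, method = "kmeans", nstart = 10)
labels_hc <- cluster_labels(toy, k = 2, method = "hclust_average")

# Cross-tabulate the two labelings
table(labels_km, labels_hc)
```

Only the k-means branch uses nstart, which matches the note that nstart applies when method = "kmeans".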

Author

Nathakhun Wiroonsri and Onthada Preedasawakul

Details

The value of k that gives the largest \(PB(k)\) indicates the optimal partition.
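
To make the selection rule concrete, the sketch below computes a point-biserial style index in base R. It assumes the usual reading of the index: the correlation between the pairwise distances and a 0/1 indicator of whether the two points of a pair fall in different clusters. The helper name pb_sketch is hypothetical, and the exact computation inside PB.IDX may differ.

```r
# Hypothetical helper, not part of UniversalCVI: correlation between pairwise
# distances and a 0/1 "different cluster" indicator.
pb_sketch <- function(x, labels, corr = "pearson") {
  d <- as.vector(stats::dist(as.matrix(x)))  # pairwise Euclidean distances
  # dist() on the labels is 0 for same-cluster pairs and > 0 otherwise,
  # and it lists the pairs in the same order as d
  different <- as.numeric(as.vector(stats::dist(labels)) > 0)
  stats::cor(d, different, method = corr)
}

# Toy data: two well-separated groups of 20 points each
set.seed(1)
toy <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2))

# Evaluate the index for k = 2, ..., 6 using k-means labels
ks <- 2:6
pb <- sapply(ks, function(k) {
  labels <- stats::kmeans(toy, centers = k, nstart = 20)$cluster
  pb_sketch(toy, labels)
})
res <- data.frame(k = ks, PB = pb)

# Row with the largest index value
res[which.max(res$PB), ]
```

A partition that separates distant points while keeping nearby points together yields a high correlation, which is why the largest value is taken as the preferred k.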

References

G. W. Milligan, "An examination of the effect of six types of error perturbation on fifteen clustering algorithms," Psychometrika, 45, 325-342 (1980).

See Also

Hvalid, Wvalid, DI.IDX, FzzyCVIs, R1_data

Examples

library(UniversalCVI)

# The data is from Wiroonsri (2024).
x = R1_data[,1:2]

# ---- Kmeans ----

# Compute PB index
K.PB = PB.IDX(scale(x), kmax = 15, kmin = 2, method = "kmeans",
  corr = "pearson", nstart = 100)
print(K.PB)

# The optimal number of clusters
K.PB[which.max(K.PB$PB),]

# ---- Hierarchical ----

# Average linkage

# Compute PB index
H.PB = PB.IDX(scale(x), kmax = 15, kmin = 2, method = "hclust_average",
  corr = "pearson")
print(H.PB)

# The optimal number of clusters
H.PB[which.max(H.PB$PB),]