pivmet (version 0.5.0)

piv_sel: Pivotal Selection via Co-Association Matrix

Description

Finds pivotal units from a data partition and a co-association matrix C, according to three different methods.

Usage

piv_sel(C, clusters)

Value

pivots

A matrix with k rows and three columns containing the indexes of the pivotal units for each method.

Arguments

C

An N×N co-association matrix, i.e. a matrix whose elements are the co-occurrences of pairs of units in the same cluster among H distinct partitions.

clusters

A vector of integers with values in 1:k, indicating a partition of the N units into k groups.

Author

Leonardo Egidi legidi@units.it

Details

Given a set of N observations $(y_1, y_2, \ldots, y_N)$ (each $y_i$ may be a d-dimensional vector, $d \ge 1$), consider clustering methods to obtain H distinct partitions into k groups. The matrix C is the co-association matrix, with entries $c_{ip} = n_{ip}/H$, where $n_{ip}$ is the number of times the pair $(y_i, y_p)$ is assigned to the same cluster among the H partitions.
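As a small worked illustration of this definition, the sketch below builds C for a toy set of H = 4 partitions of N = 4 units. The helper name `coassociation` and the toy matrix `parts` are hypothetical and are not part of pivmet. The sketch also sets the diagonal to 1, on the assumption that each unit is counted as sharing a cluster with itself in every partition.

```r
# Hypothetical helper (not part of pivmet): co-association matrix from an
# H x N matrix of cluster labels, one partition per row.
coassociation <- function(parts) {
  H <- nrow(parts)
  N <- ncol(parts)
  C <- matrix(0, N, N)
  for (h in seq_len(H)) {
    # outer(..., "==") is TRUE where two units share a label in partition h
    C <- C + outer(parts[h, ], parts[h, ], "==")
  }
  C / H
}

# Toy data: H = 4 partitions of N = 4 units into k = 2 groups
parts <- rbind(
  c(1, 1, 2, 2),
  c(2, 2, 1, 1),  # same grouping as the first row, with labels swapped
  c(1, 1, 1, 2),
  c(1, 2, 2, 2)
)
C_toy <- coassociation(parts)
C_toy
```

Units 1 and 2 share a cluster in three of the four partitions, so `C_toy[1, 2]` is 0.75, while units 1 and 4 never do, so `C_toy[1, 4]` is 0. Because only equality of labels within a partition matters, swapping the labels between the first two rows does not affect C.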

Let j be the group containing the units $\mathcal{J}_j$. The user may choose the unit $i \in \mathcal{J}_j$ that maximizes one of the quantities

$$\sum_{p \in \mathcal{J}_j} c_{ip}$$

or

$$\sum_{p \in \mathcal{J}_j} c_{ip} - \sum_{p \notin \mathcal{J}_j} c_{ip}.$$

These methods give the unit that maximizes the global within similarity ("maxsumint") and the unit that maximizes the difference between global within and between similarities ("maxsumdiff"), respectively. Alternatively, we may choose the unit $i \in \mathcal{J}_j$ that minimizes

$$\sum_{p \notin \mathcal{J}_j} c_{ip},$$

obtaining the most distant unit among the members that minimize the global dissimilarity between one group and all the others ("minsumnoint"). See the vignette for further details.
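To make the three criteria concrete, here is a minimal base R sketch that evaluates them directly from a co-association matrix and a label vector. It is not pivmet's implementation: the function `pivot_sketch`, its column order, and the toy inputs `C_small` and `clusters_small` are hypothetical, and the sketch assumes every group is non-empty and that C has no missing entries.

```r
# Hypothetical helper (not pivmet's implementation): evaluate the three
# criteria for a co-association matrix C and labels in 1:k.
pivot_sketch <- function(C, clusters) {
  k <- max(clusters)
  crit <- c("maxsumint", "maxsumdiff", "minsumnoint")
  out <- matrix(NA_integer_, k, 3, dimnames = list(NULL, crit))
  for (j in seq_len(k)) {
    members <- which(clusters == j)
    # Sum of similarities to units inside and outside group j
    within  <- rowSums(C[members, members, drop = FALSE])
    between <- rowSums(C[members, -members, drop = FALSE])
    out[j, "maxsumint"]   <- members[which.max(within)]
    out[j, "maxsumdiff"]  <- members[which.max(within - between)]
    out[j, "minsumnoint"] <- members[which.min(between)]
  }
  out
}

# Toy symmetric co-association matrix for 5 units in 2 groups
C_small <- matrix(c(
  1.0, 0.9, 0.6, 0.1, 0.0,
  0.9, 1.0, 0.7, 0.3, 0.1,
  0.6, 0.7, 1.0, 0.5, 0.4,
  0.1, 0.3, 0.5, 1.0, 0.8,
  0.0, 0.1, 0.4, 0.8, 1.0
), nrow = 5, byrow = TRUE)
clusters_small <- c(1, 1, 1, 2, 2)

pivot_sketch(C_small, clusters_small)
```

In group 1 the within-group sums for units 1, 2, 3 are 2.5, 2.6, 2.3, so "maxsumint" picks unit 2. The between-group sums are 0.1, 0.4, 0.9, so both "maxsumdiff" and "minsumnoint" pick unit 1. Ties, such as the two equal within-group sums in group 2, are resolved here by `which.max` taking the first maximum; the package may handle them differently.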

References

Egidi, L., Pappadà, R., Pauli, F. and Torelli, N. (2018). Relabelling in Bayesian Mixture Models by Pivotal Units. Statistics and Computing, 28(4), 957-969.

Examples

library(pivmet)

# Iris data
data(iris)

# Select the four numeric variables
x <- iris[, 1:4]
N <- nrow(x)
H <- 1000
a <- matrix(NA, H, N)

# Perform H k-means partitions (one row of labels per partition)
for (h in 1:H) {
  a[h, ] <- kmeans(x, centers = 3)$cluster
}

# Build the co-association matrix: share of partitions in which
# units i and j fall in the same cluster
C <- matrix(NA, N, N)
for (i in 1:(N - 1)) {
  for (j in (i + 1):N) {
    C[i, j] <- sum(a[, i] == a[, j]) / H
    C[j, i] <- C[i, j]
  }
}

# Partition whose groups will each receive a pivot
km <- kmeans(x, centers = 3)

# Apply three pivotal criteria to the co-association matrix
ris <- piv_sel(C, clusters = km$cluster)

graphics::plot(iris[, 1], iris[, 2],
               xlab = "Sepal.Length", ylab = "Sepal.Width",
               col = km$cluster)

# Add the pivots chosen by the maxsumdiff criterion
points(x[ris$pivots[, 3], 1:2], col = 1:3, cex = 2, pch = 8)
