kcevclus: k-CEVCLUS algorithm

Description

kcevclus computes a credal partition from a dissimilarity matrix and pairwise (must-link and cannot-link) constraints using the k-CEVCLUS algorithm.

Usage

kcevclus(
  x,
  k = n - 1,
  D,
  J,
  c,
  ML,
  CL,
  xi = 0.5,
  type = "simple",
  pairs = NULL,
  m0 = NULL,
  ntrials = 1,
  disp = TRUE,
  maxit = 1000,
  epsi = 1e-05,
  d0 = quantile(D, 0.9),
  tr = FALSE,
  change.order = FALSE,
  norm = 1
)

Value

The credal partition (an object of class "credpart"). In addition to the usual attributes, the output credal partition has the following attributes:

Kmat: The matrix of degrees of conflict. Same size as D.
D: The normalized dissimilarity matrix.
trace: Trace of the algorithm (Stress function vs iterations).
J: The matrix of indices.
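
The returned D is a normalized version of the input dissimilarities, and the d0 argument is documented as the distance whose normalized value is 0.95. As a minimal sketch of one mapping with that property, the base-R code below uses an exponential transform. The function name normalize_dissim and the exact functional form are assumptions made for illustration; they are not taken from the package source.

```r
# Hypothetical normalization (an assumption, not the evclust implementation):
# map distances into [0, 1) so that the distance d0 is sent to 0.95.
normalize_dissim <- function(D, d0) {
  g <- -log(0.05) / d0^2        # chosen so that 1 - exp(-g * d0^2) = 0.95
  1 - exp(-g * D^2)
}

set.seed(1)
x <- matrix(rnorm(40), ncol = 2)     # 20 objects, 2 attributes
D <- as.matrix(dist(x))              # full Euclidean distance matrix
d0 <- unname(quantile(D, 0.9))       # same default as in the Usage section
Dn <- normalize_dissim(D, d0)
range(Dn)                            # normalized values lie in [0, 1)
```

With such a mapping, distances larger than d0 are compressed into the narrow interval between 0.95 and 1, so very large dissimilarities have a limited influence.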

Arguments

x: nxp matrix of p attributes observed for n objects (optional).
k: Number of distances to compute for each object (default: n-1).
D: nxn or nxk dissimilarity matrix (used only if x is not supplied).
J: nxk matrix of indices. D[i,j] is the distance between objects i and J[i,j]. (Used only if D is supplied and ncol(D)<n; then k is set to ncol(D).)
c: Number of clusters.
ML: Matrix nbML x 2 of must-link constraints. Each row of ML contains the indices of objects that belong to the same class.
CL: Matrix nbCL x 2 of cannot-link constraints. Each row of CL contains the indices of objects that belong to different classes.
xi: Penalization coefficient.
type: Type of focal sets ("simple": empty set, singletons and Omega; "full": all \(2^c\) subsets of \(\Omega\); "pairs": \(\emptyset\), singletons, \(\Omega\), and all or selected pairs).
pairs: Set of pairs to be included in the focal sets; if NULL, all pairs are included. Used only if type="pairs".
m0: Initial credal partition. Should be a matrix with n rows and a number of columns equal to the number f of focal sets specified by 'type' and 'pairs'.
ntrials: Number of runs of the optimization algorithm (set to 1 if m0 is supplied and change.order=FALSE).
disp: If TRUE (default), intermediate results are displayed.
maxit: Maximum number of iterations.
epsi: Minimum amount of improvement.
d0: Parameter used for matrix normalization. The normalized distance corresponding to d0 is 0.95.
tr: If TRUE, a trace of the stress function is returned.
change.order: If TRUE, the order of objects is changed at each iteration of the Iterative Row-wise Quadratic Programming (IRQP) algorithm.
norm: Normalization of distances. 1: division by mean(D^2) (default); 2: division by n*p.
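
The ML and CL arguments are two-column matrices of object indices, one constraint per row. The base-R sketch below shows that layout by drawing random pairs and sorting them by known class labels. The helper make_constraints is hypothetical and written only for illustration; it is not the package's own constraint generator.

```r
# Hypothetical helper (not part of evclust): draw random pairs of objects and
# split them into must-link and cannot-link matrices using known labels y.
make_constraints <- function(y, npairs) {
  n <- length(y)
  # npairs x 2 matrix; the two indices in a row are always different
  pairs <- t(replicate(npairs, sample(n, 2)))
  same <- y[pairs[, 1]] == y[pairs[, 2]]
  list(ML = pairs[same, , drop = FALSE],    # same class: must-link
       CL = pairs[!same, , drop = FALSE])   # different classes: cannot-link
}

set.seed(42)
y <- rep(1:2, each = 10)                    # toy labels for 20 objects
link <- make_constraints(y, npairs = 8)
link$ML
link$CL
```

Using drop = FALSE keeps both results as matrices with two columns even when only one row, or none, falls into a category. This simple sampler may draw the same pair more than once.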

Author

Feng Li and Thierry Denoeux.

Details

k-CEVCLUS is a version of EVCLUS that lets the user specify pairwise constraints to guide the clustering process. Pairwise constraints are of two kinds: must-link constraints are pairs of objects known to belong to the same class, and cannot-link constraints are pairs of objects known to belong to different classes. Like kevclus, kcevclus uses the Iterative Row-wise Quadratic Programming (IRQP) algorithm (see ter Braak et al., 2009). It can also use only a random sample of the dissimilarities, which reduces the time and space complexity from quadratic to roughly linear (Denoeux et al., 2016).
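
To make the reduced representation concrete, the base-R sketch below builds an nxk dissimilarity matrix D and the matching nxk index matrix J, following the convention that D[i,j] is the distance between objects i and J[i,j]. It illustrates the data layout only. The uniform sampling scheme and the variable names are assumptions, and the way kcevclus draws its own sample is not shown.

```r
set.seed(1)
n <- 100                              # number of objects
k <- 10                               # distances kept per object
x <- matrix(rnorm(n * 2), ncol = 2)   # toy attribute matrix

# J[i, ] holds k distinct indices of objects other than i
J <- t(sapply(seq_len(n), function(i) sample(setdiff(seq_len(n), i), k)))

# D[i, j] is the Euclidean distance between object i and object J[i, j]
D <- matrix(0, n, k)
for (i in seq_len(n)) {
  diffs <- x[J[i, ], , drop = FALSE] - matrix(x[i, ], k, ncol(x), byrow = TRUE)
  D[i, ] <- sqrt(rowSums(diffs^2))
}

dim(D)                                # n x k instead of n x n
```

Storage grows as n*k rather than n^2, which is where the roughly linear cost comes from when k is held fixed. According to the argument descriptions, such a pair would be passed as D and J, in which case k is set to ncol(D).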

References

F. Li, S. Li and T. Denoeux. k-CEVCLUS: Constrained evidential clustering of large dissimilarity data. Knowledge-Based Systems 142:29-44, 2018.

T. Denoeux, S. Sriboonchitta and O. Kanjanatarakul. Evidential clustering of large dissimilarity data. Knowledge-Based Systems 106:179-195, 2016.

V. Antoine, B. Quost, M.-H. Masson and T. Denoeux. CEVCLUS: Evidential clustering with instance-level constraints for relational data. Soft Computing 18(7):1321-1335, 2014.

C. J. ter Braak, Y. Kourmpetis, H. A. Kiers, and M. C. Bink. Approximating a similarity matrix by a latent class model: A reappraisal of additive fuzzy clustering. Computational Statistics & Data Analysis 53(8):3183-3193, 2009.

Examples

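if (FALSE) {
# Generate 2000 objects from the two-class bananas data set
data <- bananas(2000)
D <- as.matrix(dist(data$x))
# Generate 2000 random pairwise constraints from the true labels
link <- create_MLCL(data$y, 2000)
# Unconstrained run, used to initialize the constrained runs
clus0 <- kevclus(D = D, k = 200, c = 2)
# Constrained runs with increasing xi, each started from the previous solution
clus1 <- kcevclus(D = D, k = 200, c = 2, ML = link$ML, CL = link$CL, xi = 0.1, m0 = clus0$mass)
clus2 <- kcevclus(D = D, k = 200, c = 2, ML = link$ML, CL = link$CL, xi = 0.5, m0 = clus1$mass)
plot(clus2, X = data$x, ytrue = data$y, Outliers = FALSE, Approx = 1)
}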
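
After a constrained run, it can be useful to check how many of the supplied constraints a hard labelling respects. The base-R sketch below does this for a small hand-made example. The helper constraint_agreement is hypothetical and not part of evclust, and the step of extracting hard labels from a credpart object is left out.

```r
# Hypothetical helper (not part of evclust): share of must-link and
# cannot-link constraints satisfied by a vector of hard cluster labels.
constraint_agreement <- function(labels, ML, CL) {
  ml_ok <- labels[ML[, 1]] == labels[ML[, 2]]   # linked objects share a cluster
  cl_ok <- labels[CL[, 1]] != labels[CL[, 2]]   # separated objects do not
  c(ML = mean(ml_ok), CL = mean(cl_ok))
}

labels <- c(1, 1, 2, 2, 1)                      # toy hard labels for 5 objects
ML <- rbind(c(1, 2), c(3, 4), c(1, 3))          # three must-link pairs
CL <- rbind(c(1, 4), c(2, 5))                   # two cannot-link pairs
constraint_agreement(labels, ML, CL)
```

In this toy case two of the three must-link pairs and one of the two cannot-link pairs are satisfied.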
