pclust: Prototype-Based Partitions of Relations

Description

Compute prototype-based partitions of a relation ensemble by minimizing $\sum w_b u_{bj}^m d(x_b, p_j)^e$, the sum of the case-weighted and membership-weighted $e$-th powers of the dissimilarities between the elements $x_b$ of the ensemble and the prototypes $p_j$, for suitable dissimilarities $d$ and exponents $e$.
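
As a concrete reading of the objective, the following R sketch evaluates it for a given membership matrix. It assumes that the dissimilarities have already been collected into a B x k matrix D with entries $d(x_b, p_j)$, and that U holds the memberships $u_{bj}$. pclust_objective is a hypothetical helper for illustration, not part of the package.

```r
# Hypothetical helper: evaluate sum_b sum_j w_b * u_bj^m * d(x_b, p_j)^e.
# D: B x k matrix of dissimilarities d(x_b, p_j)
# U: B x k matrix of memberships u_bj
pclust_objective <- function(D, U, w = 1, m = 1, e = 1) {
  w <- rep_len(w, nrow(D))   # recycle case weights over the B elements
  # w has length B, so it recycles down each column: entry (b, j) gets w[b].
  sum(w * U^m * D^e)
}

# Example: two elements, two prototypes, hard memberships.
D <- matrix(c(0, 2, 3, 1), nrow = 2)
U <- matrix(c(1, 0, 0, 1), nrow = 2)
pclust_objective(D, U)
pclust_objective(D, U, w = c(2, 3))
```

With hard memberships (each row of U a unit vector) and $m = 1$, this reduces to the weighted sum of each element's dissimilarity to its own prototype.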

Usage

relation_pclust(x, k, method, m = 1, weights = 1, control = list())

Arguments

x

an ensemble of relations, or something coercible to one (see relation_ensemble).

k

an integer giving the number of classes to be used in the partition.

method

the consensus method to be employed; see relation_consensus.

m

a number not less than 1 controlling the softness of the partition (as the fuzzification parameter of the fuzzy $c$-means algorithm). The default value of 1 corresponds to hard partitions obtained from a generalized $k$-means algorithm.

weights

a numeric vector of non-negative case weights. Recycled to the number of elements in the ensemble given by x if necessary.

control

a list of control parameters. See Details.

Value

An object of class "cl_partition".

Details

For $m = 1$, a generalization of the Lloyd-Forgy variant of the $k$-means algorithm is used. It alternates between reassigning each object to its closest prototype and recomputing the prototypes as the consensus relations (generalized central relations; Régnier, 1965) of the classes. This procedure was proposed by Gaul and Schader (1988) as Clusterwise Aggregation of Relations (CAR).
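
The alternation above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package implementation. It assumes that each relation is a logical n x n incidence matrix on a common ground set, that $d$ is the symmetric difference dissimilarity with $e = 1$, and that the class consensus is the unconstrained weighted majority relation. That relation minimizes the weighted symmetric difference over all endorelations. The consensus methods actually available through relation_consensus may restrict the result to relations with particular properties; this sketch does not. The names car_hard, closest, majority_consensus and order_rel are hypothetical.

```r
# Hypothetical sketch of a CAR-style hard partition (m = 1).
# Assumptions: relations are logical n x n incidence matrices; d is the
# symmetric difference with e = 1; consensus = unconstrained weighted majority.
symdiff <- function(R1, R2) as.numeric(sum(xor(R1, R2)))

majority_consensus <- function(rels, w) {
  # Weighted number of class members containing each pair (i, j).
  s <- Reduce(`+`, Map(function(R, wb) wb * R, rels, w))
  s > sum(w) / 2
}

# Assign every relation to its closest prototype; return classes and objective.
closest <- function(rels, protos, w) {
  B <- length(rels)
  D <- matrix(vapply(protos, function(p) vapply(rels, symdiff, numeric(1), p),
                     numeric(B)), nrow = B)
  cl <- max.col(-D, ties.method = "first")
  list(cl = cl, obj = sum(w * D[cbind(seq_len(B), cl)]))
}

car_hard <- function(rels, k, w = 1, maxiter = 100L,
                     init = sample.int(length(rels), k)) {
  w <- rep_len(w, length(rels))
  protos <- rels[init]                      # start from k ensemble elements
  cur <- closest(rels, protos, w)
  for (iter in seq_len(maxiter)) {
    # Recompute each non-empty class prototype as its consensus relation.
    new_protos <- protos
    for (j in seq_len(k)) {
      idx <- which(cur$cl == j)
      if (length(idx) > 0L) new_protos[[j]] <- majority_consensus(rels[idx], w[idx])
    }
    # Reclassify; stop once the objective no longer decreases.
    new <- closest(rels, new_protos, w)
    if (new$obj >= cur$obj) break
    protos <- new_protos
    cur <- new
  }
  list(cluster = cur$cl, prototypes = protos, value = cur$obj)
}

# Example: two groups of strict linear orders on {a, b, c}, built from ranks.
order_rel <- function(r) outer(r, r, "<")   # TRUE where i precedes j
rels <- list(order_rel(c(1, 2, 3)), order_rel(c(1, 3, 2)), order_rel(c(1, 2, 3)),
             order_rel(c(3, 2, 1)), order_rel(c(3, 1, 2)), order_rel(c(3, 2, 1)))
res <- car_hard(rels, k = 2, init = c(1, 4))
res$cluster
```

Because the majority relation is an optimal unconstrained consensus, neither step can increase the objective. The loop therefore stops at a fixed point, though not necessarily at the global minimum.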

For $m > 1$, a generalization of the fuzzy $c$-means recipe is used, which alternates between computing optimal memberships for fixed prototypes, and computing new prototypes as the consensus relations for the classes.
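
For fixed prototypes, the optimal memberships follow the standard fuzzy $c$-means update: $u_{bj}$ is proportional to $D_{bj}^{-1/(m-1)}$, where $D_{bj} = d(x_b, p_j)^e$. The case weights $w_b$ scale a whole row of the objective and cancel. The sketch below assumes that this standard update carries over unchanged. Putting all mass on exact matches when some $D_{bj} = 0$ is one valid convention among several. fuzzy_memberships is a hypothetical name.

```r
# Hypothetical sketch: fuzzy c-means membership update for fixed prototypes.
# D: B x k matrix of d(x_b, p_j)^e; m > 1 is the fuzzification parameter.
fuzzy_memberships <- function(D, m) {
  stopifnot(m > 1)
  U <- D^(-1 / (m - 1))                 # unnormalized memberships (Inf where D == 0)
  for (b in seq_len(nrow(D))) {
    zero <- D[b, ] == 0
    # Convention: split all mass equally among prototypes at distance zero.
    if (any(zero)) U[b, ] <- zero / sum(zero)
  }
  U / rowSums(U)                        # normalize each row to sum to 1
}

D <- matrix(c(1, 0, 4, 3), nrow = 2)    # rows: c(1, 4) and c(0, 3)
fuzzy_memberships(D, m = 2)
```

As $m$ decreases toward 1, the exponent $-1/(m-1)$ grows in magnitude and the memberships approach the hard assignment to the closest prototype.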

In both cases, these steps are repeated until convergence occurs or the maximal number of iterations is reached.

Consensus relations are computed using relation_consensus. Control parameters for the iteration are passed as named elements of the control list.

The dissimilarity $d$ and exponent $e$ are implied by the consensus method employed. They are inferred via a registration mechanism that is currently available only to built-in consensus methods. For now, all optimization-based consensus methods use the symmetric difference dissimilarity (see relation_dissimilarity) for $d$ and $e = 1$.
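
For crisp relations, the symmetric difference dissimilarity counts the pairs that belong to exactly one of the two relations. The sketch below assumes that both relations are given as logical incidence matrices over the same ground set. symdiff_dissimilarity and order_rel are hypothetical helpers, not the relation_dissimilarity implementation.

```r
# Hypothetical sketch: symmetric difference dissimilarity between two
# endorelations given as logical incidence matrices on the same ground set.
symdiff_dissimilarity <- function(R1, R2) {
  stopifnot(identical(dim(R1), dim(R2)))
  sum(xor(R1, R2))   # number of pairs (i, j) in exactly one relation
}

# Strict linear orders on {a, b, c} from ranks: TRUE where i precedes j.
order_rel <- function(r) outer(r, r, "<")
R1 <- order_rel(c(1, 2, 3))   # a < b < c
R2 <- order_rel(c(1, 3, 2))   # a < c < b
symdiff_dissimilarity(R1, R2)
```

Here the two orders disagree only on b versus c. The pairs (b, c) and (c, b) each lie in exactly one relation, which gives a dissimilarity of 2.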

The fixed-point approach employed is a heuristic that cannot be guaranteed to find the global minimum. Standard practice is to use the best solution found in sufficiently many replications of the base algorithm.
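
Replication can be sketched as a wrapper that runs the base algorithm several times, typically from different random starting prototypes, and keeps the fit with the smallest objective. The sketch assumes a hypothetical convention in which each run returns a list with a value element holding its objective. best_of_runs is an illustrative name, not a package function.

```r
# Hypothetical sketch: keep the best of several runs of a heuristic.
# run: a function of no arguments returning a list with element `value`.
best_of_runs <- function(run, nruns) {
  fits <- replicate(nruns, run(), simplify = FALSE)
  values <- vapply(fits, `[[`, numeric(1), "value")
  fits[[which.min(values)]]   # lowest objective wins (first one on ties)
}

# Example with a stand-in run that returns preset objective values in turn.
run <- local({
  i <- 0
  vals <- c(5, 2, 7)
  function() {
    i <<- i + 1
    list(value = vals[i], run = i)
  }
})
best <- best_of_runs(run, 3)
best$run
```

Randomness in the starting prototypes is what makes the replications differ. With a deterministic start, repeated runs would all return the same fixed point.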

References

S. Régnier (1965). Sur quelques aspects mathématiques des problèmes de classification automatique. ICC Bulletin, 4:175–191.

W. Gaul and M. Schader (1988). Clusterwise aggregation of relations. Applied Stochastic Models and Data Analysis, 4:273–282.