Learn R Programming

clue (version 0.3-39)

pclust: Prototype-Based Partitioning

Description

Obtain prototype-based partitions of objects by minimizing the criterion $\sum w_b u_{bj}^m d(x_b, p_j)^e$, the sum of the case-weighted and membership-weighted $e$-th powers of the dissimilarities between the objects $x_b$ and the prototypes $p_j$, for suitable dissimilarities $d$ and exponents $e$.

Usage

pclust(x, k, family, m = 1, weights = 1, control = list())
pclust_family(D, C, init = NULL, description = NULL, e = 1,
              .modify = NULL, .subset = NULL)
pclust_object(prototypes, membership, cluster, family, m = 1,
              value, ..., classes = NULL, attributes = NULL)

Arguments

x
the object to be partitioned.
k
an integer giving the number of classes to be used in the partition.
family
an object of class "pclust_family" as generated by pclust_family, containing the information about $d$ and $e$.
m
a number not less than 1 controlling the softness of the partition (as the fuzzification parameter of the fuzzy $c$-means algorithm). The default value of 1 corresponds to hard partitions obtained from a generalized $k$-means
weights
a numeric vector of non-negative case weights. Recycled to the number of elements given by x if necessary.
control
a list of control parameters. See Details.
D
a function for computing dissimilarities $d$ between objects and prototypes.
C
a consensus function with formals x, weights and control for computing a consensus prototype $p$ minimizing $\sum_b w_b d(x_b, p) ^ e$.
init
a function with formals x and k initializing an object with $k$ prototypes from the object x to be partitioned.
description
a character string describing the family.
e
a number giving the exponent $e$ of the criterion.
.modify
a function with formals x, i and value for modifying a single prototype, or NULL (default).
.subset
a function with formals x and i for subsetting prototypes, or NULL (default).
prototypes
an object representing the prototypes of the partition.
membership
an object of class "cl_membership" with the membership values $u_{bj}$.
cluster
the class ids of the nearest hard partition.
value
the value of the criterion to be minimized.
...
further elements to be included in the generated pclust object.
classes
a character vector giving further classes to be given to the generated pclust object in addition to "pclust", or NULL (default).
attributes
a list of attributes, or NULL (default).

Value

  • pclust returns the partition found as an object of class "pclust" (as obtained by calling pclust_object) which in addition to the default components contains call (the matched call) and a converged attribute indicating convergence status (i.e., whether the maximal number of iterations was reached).

    pclust_family returns an object of class "pclust_family", which is a list with components corresponding to the formals of pclust_family.

    pclust_object returns an object inheriting from class "pclust", which is a list with components corresponding to the formals (up to and including ...) of pclust_object, and additional classes and attributes specified by classes and attributes, respectively.

Details

For $m = 1$, a generalization of the Lloyd-Forgy variant of the $k$-means algorithm is used, which iterates between reclassifying objects to their closest prototypes (according to the dissimilarities given by D), and computing new prototypes as the consensus for the classes (using C).

For $m > 1$, a generalization of the fuzzy $c$-means recipe (e.g., Bezdek (1981)) is used, which alternates between computing optimal memberships for fixed prototypes, and computing new prototypes as the suitably weighted consensus clusterings for the classes.

This procedure is repeated until convergence occurs, or the maximal number of iterations is reached.

Currently, no local improvement heuristics are provided.

It is possible to perform several runs of the procedure via control arguments nruns or start (the default is to perform a single run), in which case the first partition with the smallest value of the criterion is returned.

The dissimilarity and consensus functions as well as the exponent $e$ are specified via family. In principle, arbitrary representations of objects to be partitioned and prototypes (which do not necessarily have to be of the same kind) can be employed. In addition to D and C, what is needed are means to obtain an initial collection of $k$ prototypes (init), to modify a single prototype (.modify), and subset the prototypes (.subset). By default, list and (currently, only dense) matrix (with the usual convention that the rows correspond to the objects) are supported. Otherwise, the family has to provide the functions needed.

Available control parameters are as follows. [object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

The fixed point approach employed is a heuristic which cannot be guaranteed to find the global minimum, in particular if C is not an exact minimizer. Standard practice would recommend to use the best solution found in sufficiently many replications of the base algorithm.

References

J. C. Bezdek (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum.

See Also

kmeans, cmeans.