bag: Bagging for Clustering

Description

Construct partitions of objects by running a base clustering algorithm on bootstrap samples from a given data set, and suitably aggregating these primary partitions.

Usage

cl_bag(x, B, k = NULL, algorithm = "kmeans", parameters = NULL, 
       method = "DFBC1", control = NULL)

Arguments

Value

An R object representing a partition of the objects given in x.

Details

Bagging for clustering is a general conceptual framework rather than a specific algorithm. If the primary partitions generated in the bootstrap stage form a cluster ensemble (so that class memberships of the objects in x can be obtained), consensus methods for cluster ensembles (e.g., as implemented in cl_consensus and cl_medoid) can be employed for the aggregation stage. In particular, bagging algorithms (possibly new ones) can easily be realized by running cl_consensus directly on the results of cl_boot.
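The two-stage idea can be sketched as follows. This is a minimal illustration rather than a prescribed recipe: the choices B = 20, k = 3 and the default consensus method are assumptions made for the example, and resample = TRUE is used so that memberships are predicted for all objects in x and the bootstrap results form a cluster ensemble.

```r
library("clue")

set.seed(1234)
data("Cassini")

## Bootstrap stage: run k-means on 20 bootstrap samples of the data.
## With resample = TRUE, class memberships of all objects in x are
## predicted from each fit, giving a cluster ensemble.
ens <- cl_boot(Cassini$x, B = 20, k = 3, algorithm = "kmeans",
               resample = TRUE)

## Aggregation stage: consensus partition of the ensemble
## (default consensus method; other methods can be chosen via 'method').
consensus <- cl_consensus(ens)

## Class sizes of the aggregated partition.
table(cl_class_ids(consensus))
```

Swapping in a different consensus method, or cl_medoid in place of cl_consensus, changes only the aggregation stage.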

In BagClust1 (the default, method = "DFBC1"), aggregation proceeds in two steps. First, a reference partition is generated by running the base clustering algorithm on the whole given data set. Second, the ensemble memberships are optimally matched to the reference partition (by minimizing Euclidean dissimilarity, see cl_dissimilarity) and then averaged.
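The match-then-average step can be illustrated with a hand-rolled sketch. It is not the code used by cl_bag: the helper memberships() is hypothetical, hard nearest-center memberships stand in for the package's prediction step, and B = 20 is an arbitrary choice. For hard memberships, minimizing the Euclidean dissimilarity over relabelings is equivalent to maximizing the number of objects on which the two partitions agree, which solve_LSAP can do.

```r
library("clue")

set.seed(1234)
data("Cassini")
x <- Cassini$x
n <- nrow(x)
k <- 3
B <- 20

## Hypothetical helper: n x k hard membership matrix obtained by
## assigning every row of x to its nearest center.
memberships <- function(centers) {
    d <- sapply(seq_len(nrow(centers)),
                function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    diag(nrow(centers))[apply(d, 1, which.min), , drop = FALSE]
}

## Reference partition from the whole data set.
ref <- memberships(kmeans(x, k)$centers)

avg <- matrix(0, n, k)
for (b in seq_len(B)) {
    ## Base algorithm on a bootstrap sample, memberships for all objects.
    idx <- sample(n, replace = TRUE)
    M <- memberships(kmeans(x[idx, ], k)$centers)
    ## agree[i, j]: objects in reference class i and bootstrap class j.
    agree <- crossprod(ref, M)
    ## Relabel bootstrap classes to maximize agreement with the reference.
    perm <- as.integer(solve_LSAP(agree, maximum = TRUE))
    avg <- avg + M[, perm, drop = FALSE]
}
avg <- avg / B

## Averaged (soft) memberships; the largest entry gives the class id.
ids <- apply(avg, 1, which.max)
table(ids)
```

Each row of avg sums to one, so it can be read as a soft membership vector for the corresponding object.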

If the base clustering algorithm yields prototypes, aggregation can be based on clustering these. This is the idea underlying the Bagged Clustering algorithm introduced in Leisch (1999) and implemented by function bclust in package e1071.
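A short sketch of the prototype-based variant, assuming package e1071 is installed. The simulated two-group data and the argument values (centers = 2, base.centers = 5, iter.base = 10) are illustrative choices only.

```r
library("e1071")

set.seed(1234)
## Two well-separated Gaussian groups in the plane (simulated data).
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2))

## k-means with 5 centers on each of 10 bootstrap samples; the pooled
## prototypes are then clustered hierarchically and cut into 2 groups.
bc <- bclust(x, centers = 2, base.centers = 5, iter.base = 10,
             verbose = FALSE)

## Cluster assignment of the original observations.
table(bc$cluster)
```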

References

S. Dudoit and J. Fridlyand (2003), Bagging to improve the accuracy of a clustering procedure. Bioinformatics, 19/9, 1090--1099.

F. Leisch (1999), Bagged Clustering. Working Paper 51, SFB Adaptive Information Systems and Modeling in Economics and Management Science. http://www.ci.tuwien.ac.at/~leisch/papers/wp51.ps.

Examples

library("clue")
set.seed(1234)
## Run BagClust1 on the Cassini data.
data("Cassini")
party <- cl_bag(Cassini$x, 50, 3)
plot(Cassini$x, col = cl_class_ids(party), xlab = "", ylab = "")
## Actually, using fuzzy c-means as a base learner works much better:
if(require("e1071", quietly = TRUE)) {
    party <- cl_bag(Cassini$x, 20, 3, algorithm = "cmeans")
    plot(Cassini$x, col = cl_class_ids(party), xlab = "", ylab = "")
}