flexclust (version 1.4-0)

bclust: Bagged Clustering

Description

Cluster the data in x using the bagged clustering algorithm. A partitioning cluster algorithm such as cclust is run repeatedly on bootstrap samples from the original data. The resulting cluster centers are then combined using the hierarchical cluster algorithm hclust.

Usage

bclust(x, k = 2, base.iter = 10, base.k = 20, minsize = 0,
       dist.method = "euclidian", hclust.method = "average",
       FUN = "cclust", verbose = TRUE, final.cclust = FALSE,
       resample = TRUE, weights = NULL, maxcluster = base.k, ...)
# S4 method for bclust,missing
plot(x, y, maxcluster = x@maxcluster, main = "", ...)
# S4 method for bclust,missing
clusters(object, newdata, k, ...)
# S4 method for bclust
parameters(object, k)

Arguments

x

Matrix of inputs (or object of class "bclust" for plot).

k

Number of clusters.

base.iter

Number of runs of the base cluster algorithm.

base.k

Number of centers used in each repetition of the base method.

minsize

Minimum number of points in a base cluster.

dist.method

Distance method used for the hierarchical clustering, see dist for available distances.

hclust.method

Linkage method used for the hierarchical clustering, see hclust for available methods.

FUN

Partitioning cluster method used as base algorithm.

verbose

Output status messages.

final.cclust

If TRUE, a final cclust step is performed using the output of the bagged clustering as initialization.

resample

Logical; if TRUE, the base method is run on bootstrap samples of x, otherwise directly on x.

weights

Vector of length nrow(x), weights for the resampling. By default all observations have equal weight.

maxcluster

Maximum number of clusters for which memberships are computed.

object

Object of class "bclust".

main

Main title of the plot.

...

Optional arguments to be passed to the base method in bclust, ignored in plot.

y

Missing.

newdata

An optional data matrix with the same number of columns as the cluster centers. If omitted, the fitted values are used.

Value

bclust returns objects of class "bclust" including the slots

hclust

Return value of the hierarchical clustering of the collection of base centers (Object of class "hclust").

cluster

Vector with indices of the clusters the inputs are assigned to.

centers

Matrix of centers of the final clusters. Only useful if the hierarchical clustering method produces convex clusters.

allcenters

Matrix of all base.iter * base.k centers found in the base runs.

Details

First, base.iter bootstrap samples of the original data in x are created by drawing with replacement. The base cluster method is run on each of these samples with base.k centers. FUN must be the name of a partitioning cluster function returning an object with the same slots as the return value of cclust.

This results in a collection of base.iter * base.k centers, which are subsequently clustered using the hierarchical method hclust. Base centers with fewer than minsize points in their respective partitions are removed before the hierarchical clustering. The resulting dendrogram is then cut to produce k clusters.
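
The steps described above can be sketched in base R. This is an illustration only, not the flexclust implementation: kmeans() from the stats package stands in for cclust, and the final assignment rule (each observation inherits the cluster of its nearest base center) is an assumption made for the sketch. The variable names mirror the arguments of bclust but are ordinary local variables here.

```r
# Base-R sketch of bagged clustering; kmeans() stands in for cclust().
set.seed(1)
x <- as.matrix(iris[, 1:4])

base.iter <- 10    # number of bootstrap runs
base.k    <- 5     # centers per base run
minsize   <- 1     # drop base centers with fewer points than this
k         <- 3     # final number of clusters
weights   <- NULL  # NULL means equal resampling weights

allcenters <- NULL
for (i in seq_len(base.iter)) {
  # Bootstrap sample: draw nrow(x) row indices with replacement.
  idx <- sample(nrow(x), replace = TRUE, prob = weights)
  fit <- kmeans(x[idx, , drop = FALSE], centers = base.k)
  # Keep only base centers whose partition has at least minsize points.
  keep <- fit$size >= minsize
  allcenters <- rbind(allcenters, fit$centers[keep, , drop = FALSE])
}

# Hierarchically cluster the pooled base centers and cut the tree at k.
hc <- hclust(dist(allcenters, method = "euclidean"), method = "average")
center.cluster <- cutree(hc, k = k)

# Assumption: each observation inherits the cluster of its nearest base center.
n <- nrow(x)
m <- nrow(allcenters)
d <- as.matrix(dist(rbind(x, allcenters)))[seq_len(n), n + seq_len(m), drop = FALSE]
nearest <- apply(d, 1, which.min)
cluster <- unname(center.cluster[nearest])

table(cluster)
```

Because the dendrogram hc is computed once, cutree(hc, k = ...) can be called again with a different k without repeating the base runs.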

References

Friedrich Leisch. Bagged clustering. Working Paper 51, SFB "Adaptive Information Systems and Modeling in Economics and Management Science", August 1999. http://epub.wu.ac.at/1272/1/document.pdf

Sara Dolnicar and Friedrich Leisch. Winter tourist segments in Austria: Identifying stable vacation styles using bagged clustering techniques. Journal of Travel Research, 41(3):281-292, 2003.

See Also

hclust, cclust

Examples

# NOT RUN {
data(iris)
bc1 <- bclust(iris[,1:4], 3, base.k=5)
plot(bc1)

table(clusters(bc1, k=3))
parameters(bc1, k=3)
# }
