as.clustrange: Build a clustrange object to compare different clustering solutions.

Description

Build a clustrange object to compare different clustering solutions.

Usage

as.clustrange(object, diss, weights=NULL, R=1, samplesize=NULL, ...)
# S3 method for twins
as.clustrange(object, diss, weights=NULL, R=1, samplesize=NULL,
    ncluster=20, ...)
# S3 method for hclust
as.clustrange(object, diss, weights=NULL, R=1, samplesize=NULL,
    ncluster=20, ...)
# S3 method for dtclust
as.clustrange(object, diss, weights=NULL, R=1, samplesize=NULL,
    ncluster=20, labels=TRUE, ...)
# S3 method for clustrange
plot(x, stat="noCH", legendpos="bottomright",
    norm="none", withlegend=TRUE, lwd=1, col=NULL, ylab="Indicators",
    xlab="N clusters", conf.int=0.9, ci.method="none", ci.alpha=.3, line="t0", ...)

Arguments

object

The object to convert: a data.frame or matrix of clustering solutions, or an hclust, twins or dtclust object.

diss

A dissimilarity matrix or a dist object (see dist).

weights

Optional numerical vector containing weights.

R

Optional number of bootstrap replications used to build confidence intervals.

samplesize

Size of each bootstrap sample. Defaults to the sum of weights.

ncluster

Integer. Maximum number of clusters. The range includes all clustering solutions from two to ncluster clusters.

labels

Logical. If TRUE, the rules used to assign an object to a sequence are used to label the clusters (instead of a number).

x

A clustrange object to be plotted.

stat

Character. The statistics to plot, "noCH" to plot all statistics except "CH" and "CHsq", or "all" to plot all statistics. See wcClusterQuality for a list of possible values. It is also possible to use "RHC" to plot the quality measure 1-HC. Unlike HC, RHC should be maximized, like all other quality measures.

legendpos

Character. Legend position, see legend.

norm

Character. Normalization method for the statistics. One of "none" (no normalization), "range" (computed as (value - min)/(max - min)), "zscore" (adjusted by mean and standard deviation) or "zscoremed" (adjusted by median and median of the differences to the median).

withlegend

Logical. If FALSE, the legend is not plotted.

lwd

Numeric. Line width, see par.

col

A vector of line colors, see par. If NULL, a default set of colors is used.

xlab

x axis label.

ylab

y axis label.

conf.int

Confidence level used to build the confidence interval (default: 0.9).

ci.method

Method used to build the confidence interval (only if bootstrap has been used, see R above). One of "none" (do not plot confidence interval), "norm" (based on normal approximation) or "perc" (based on percentile).

ci.alpha

Alpha (transparency) value of the color used to plot the interval.

line

Which value should be plotted by the line? One of "t0" (value for the actual sample), "mean" (average over all bootstraps) or "median" (median over all bootstraps).

...

Additional parameters passed to/from methods.
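
The normalizations offered by the norm argument can be sketched in base R as follows. This is only an illustration of the formulas described above, not the package's internal code: the function names and the toy values are hypothetical, and taking the absolute value of the differences in "zscoremed" is an assumption about how the description should be read.

```r
# Hypothetical helpers illustrating the "range", "zscore" and "zscoremed" options.
normRange <- function(x) (x - min(x)) / (max(x) - min(x))
normZscore <- function(x) (x - mean(x)) / sd(x)
normZscoreMed <- function(x) {
  centered <- x - median(x)
  # Assumption: the scale is the median of the absolute differences to the median.
  centered / median(abs(centered))
}

# Made-up values of one quality statistic for 2 to 6 clusters.
asw <- c(0.21, 0.35, 0.30, 0.42, 0.38)

round(normRange(asw), 2)
round(normZscore(asw), 2)
round(normZscoreMed(asw), 2)
```

Normalizing puts statistics with different scales on a common axis, which is presumably why the option is useful when several statistics are plotted together.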

Value

An object of class clustrange with the following elements:

clustering: A data.frame of all clustering solutions.
stats: A matrix containing the clustering statistics of each cluster solution.

Details

as.clustrange converts objects to clustrange objects. A clustrange object contains a list of clustering solutions with their associated statistics and can be used to find the optimal clustering solution.

If object is a data.frame or a matrix, each column should be a clustering solution to be evaluated.
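
A minimal sketch of this data.frame input, assuming made-up toy data and hypothetical column names. The call to as.clustrange is guarded so that the base R part runs even when WeightedCluster is not installed.

```r
# Toy data: 30 random points in two dimensions (made up for illustration).
set.seed(1)
toy <- matrix(rnorm(60), ncol = 2)
toyDiss <- dist(toy)

# Two hierarchical clusterings of the same data.
avgTree <- hclust(toyDiss, method = "average")
compTree <- hclust(toyDiss, method = "complete")

# One column per clustering solution to evaluate.
solutions <- data.frame(
  avg3 = cutree(avgTree, k = 3),
  avg4 = cutree(avgTree, k = 4),
  comp3 = cutree(compTree, k = 3)
)

# Convert to a clustrange object if the package is available.
if (requireNamespace("WeightedCluster", quietly = TRUE)) {
  solRange <- WeightedCluster::as.clustrange(solutions, diss = toyDiss)
  print(solRange$stats)
}
```

This form is convenient for comparing solutions that come from different algorithms, since the columns do not need to share a common origin.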

If object is an hclust or twins object (i.e. hierarchical clustering output, see hclust, diana or agnes), the function computes all clustering solutions ranging from two to ncluster clusters and computes the associated statistics.
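
The range of solutions extracted from a hierarchical clustering can be pictured with base R's cutree. This is a sketch of the idea only, using made-up data, and is not meant to reproduce the package's internal code.

```r
# Toy data and hierarchical clustering (made up for illustration).
set.seed(1)
toy <- matrix(rnorm(60), ncol = 2)
tree <- hclust(dist(toy), method = "average")

# Cut the tree at every number of clusters from two to ncluster.
ncluster <- 5
memberships <- cutree(tree, k = 2:ncluster)

# One column per solution, named after the number of clusters.
colnames(memberships)
head(memberships)
```

Each column of memberships corresponds to one solution in the range; as.clustrange additionally computes the quality statistics for each of them.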

Examples

# NOT RUN {
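## Load the package (it also attaches TraMineR, which provides mvad, seqdef and seqdist)
library(WeightedCluster)
data(mvad)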
## Aggregating state sequence
aggMvad <- wcAggregateCases(mvad[, 17:86], weights=mvad$weight)

## Creating state sequence object
mvad.seq <- seqdef(mvad[aggMvad$aggIndex, 17:86], weights=aggMvad$aggWeights)

## Compute dissimilarities using the Hamming distance
diss <- seqdist(mvad.seq, method="HAM")

## Ward clustering
wardCluster <- hclust(as.dist(diss), method="ward.D", members=aggMvad$aggWeights)

## Computing clustrange from Ward clustering
wardRange <- as.clustrange(wardCluster, diss=diss, 
		weights=aggMvad$aggWeights, ncluster=15)

## Plot all statistics (standardized)
plot(wardRange, stat="all", norm="zscoremed", lwd=3)

## Plot HC, RHC and ASWw
plot(wardRange, stat=c("HC", "RHC", "ASWw"), norm="zscore", lwd=3)

# }
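
The two interval types named by ci.method ("perc" and "norm") can be illustrated in base R. This is a generic sketch of percentile and normal-approximation intervals under stated assumptions, not the package's implementation: the bootstrap values are simulated, the variable names are hypothetical, and centering the normal interval on t0 is an assumption.

```r
set.seed(42)

# Made-up value of a statistic on the actual sample.
t0 <- 0.40
# Simulated stand-in for the values obtained over 200 bootstrap replications.
bootStats <- rnorm(200, mean = 0.40, sd = 0.03)

conf.int <- 0.9
tailProb <- (1 - conf.int) / 2

# Percentile interval: empirical quantiles of the bootstrap values.
percCI <- quantile(bootStats, probs = c(tailProb, 1 - tailProb), names = FALSE)

# Normal approximation: t0 plus or minus a normal quantile times the
# bootstrap standard deviation (assumed centering on t0).
normCI <- t0 + qnorm(c(tailProb, 1 - tailProb)) * sd(bootStats)

percCI
normCI
```

The percentile interval follows the shape of the bootstrap distribution, while the normal approximation is symmetric by construction.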