Clustering by merging Gaussian mixture components; computes all methods introduced in Hennig (2010) from an initial mclust clustering. See the details below.

```
mergenormals(xdata, mclustsummary=NULL,
             clustering, probs, muarray, Sigmaarray, z,
             method=NULL, cutoff=NULL, by=0.005,
             numberstop=NULL, renumber=TRUE, M=50, ...)

# S3 method for mergenorm
summary(object, ...)

# S3 method for summary.mergenorm
print(x, ...)
```

xdata

data (something that can be coerced into a matrix).

mclustsummary

output object from `summary.mclustBIC` for `xdata`. Either `mclustsummary` or all of `clustering`, `probs`, `muarray`, `Sigmaarray` and `z` need to be specified (the latter are obtained from `mclustsummary` if they are not provided). I am not aware of restrictions on the use of `mclustBIC` to produce an initial clustering; covariance matrix models can be restricted and a noise component can be included if desired, although I have probably not tested all possibilities.

clustering

vector of integers. Initial assignment of data to mixture components.

probs

vector of component proportions (for all components; should sum up to one).

muarray

matrix of component means (rows).

Sigmaarray

array of component covariance matrices (third dimension refers to component number).

z

matrix of observation- (row-)wise posterior probabilities of belonging to the components (columns).

method

one of `"bhat"`, `"ridge.uni"`, `"ridge.ratio"`, `"demp"`, `"dipuni"`, `"diptantrum"`, `"predictive"`. See details.

cutoff

numeric between 0 and 1. Tuning constant, see details and Hennig (2010). If not specified, the default values given in (9) in Hennig (2010) are used.

by

real between 0 and 1. Interval width for density computation along the ridgeline, used for methods `"ridge.uni"` and `"ridge.ratio"`. Methods `"dipuni"` and `"diptantrum"` require ridgeline computations and use it as well.

numberstop

integer. If specified, `cutoff` is ignored and components are merged until the number of clusters specified here is reached.

renumber

logical. If `TRUE`, merged clusters are renumbered from 1 to their number. If not, numbers of the original clustering are used (numbers of components that were merged into others then will not appear).

M

integer. Number of times the dataset is divided into two halves. Used if `method="predictive"`.

...

additional optional parameters to pass on to `ridgeline.diagnosis` or `mixpredictive` (in `mergenormals`).

object

object of class `mergenorm`, output of `mergenormals`.

x

object of class `summary.mergenorm`, output of `summary.mergenorm`.
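The role of `by` can be illustrated with a minimal base-R sketch for a univariate two-component mixture. It evaluates the mixture density on a grid of ridgeline positions with spacing `by`, then derives the number of modes and a density ratio from that profile. The helper names (`ridge_profile`, `local_maxima`, `ridge_ratio`) and all parameter values are hypothetical and not part of the package; `ridgeline.diagnosis` may differ in detail.

```r
# Density of a two-component univariate Gaussian mixture along the
# ridgeline of Ray and Lindsay (2005), on a grid with spacing `by`.
ridge_profile <- function(mu, sigma, prop, by = 0.005) {
  alpha <- seq(0, 1, by = by)
  w1 <- (1 - alpha) / sigma[1]^2
  w2 <- alpha / sigma[2]^2
  x <- (w1 * mu[1] + w2 * mu[2]) / (w1 + w2)   # ridgeline point for each alpha
  dens <- prop[1] * dnorm(x, mu[1], sigma[1]) +
          prop[2] * dnorm(x, mu[2], sigma[2])
  data.frame(alpha = alpha, x = x, density = dens)
}

# Grid indices of local maxima of the profile (boundaries included).
local_maxima <- function(d) {
  n <- length(d)
  left <- c(-Inf, d[-n])
  right <- c(d[-1], -Inf)
  which(d > left & d >= right)
}

# Ratio of the lowest density between the outer maxima to the smaller
# maximum; equals 1 when the profile has a single maximum.
ridge_ratio <- function(d) {
  peaks <- local_maxima(d)
  if (length(peaks) < 2) return(1)
  min(d[min(peaks):max(peaks)]) / min(d[peaks])
}

sep   <- ridge_profile(mu = c(0, 6), sigma = c(1, 1), prop = c(0.5, 0.5))
close <- ridge_profile(mu = c(0, 1), sigma = c(1, 1), prop = c(0.5, 0.5))

length(local_maxima(sep$density))    # well separated components
ridge_ratio(sep$density)
length(local_maxima(close$density))  # strongly overlapping components
ridge_ratio(close$density)
```

A smaller `by` gives a finer grid and a more reliable detection of shallow modes at the cost of more density evaluations.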

`mergenormals` returns an object of class `mergenorm`, which is a list with components

integer vector. Final clustering.

vector of numbers of remaining clusters. These are given in terms of the original clusters even if `renumber=TRUE`, in which case they may be needed to understand the numbering of some further components, see below.

vector of numbers of components that were "merged away".

vector of values of the merging criterion (see details) at which components were merged.

vector of numbers of clusters to which the original components were merged.

a list, if `mclustsummary` was provided. Entry no. i refers to number i in `clusternumbers`. The list entry i contains the parameters of the original mixture components that make up cluster i, as extracted by `extract.mixturepars`.

vector of prediction strength values for numbers of clusters from 1 to the number of components in the original mixture, if `method="predictive"`. See `mixpredictive`.

square matrix with entries giving the original values of the merging criterion (see details) for every pair of original mixture components.

square matrix as `orig.decisionmatrix`, but with final entries; numbering of rows and columns corresponds to `clusternumbers`; all entries corresponding to other rows and columns can be ignored.

final cluster values of `probs` (see arguments) for merged components, generated by (potentially repeated) execution of `mergeparameters` out of the original ones. Numbered according to `clusternumbers`.

final cluster means, analogous to `probs`.

final cluster covariance matrices, analogous to `probs`.

final matrix of posterior probabilities of observations belonging to the clusters, analogous to `probs`.

logical. If `TRUE`, there was a noise component fitted in the initial mclust clustering (see help for `initialization` in `mclustBIC`). In this case, a cluster number 0 indicates noise. Noise is ignored by the merging methods and kept as it was originally.

as above.

as above.
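How merged values of `probs`, the means, the covariance matrices and `z` can arise from a single merge step is sketched below in base R. This assumes that a merged cluster is summarised by posterior-weighted moments of the data; the helper `merge_two` and the toy data are hypothetical, and the package's `mergeparameters` may differ in detail.

```r
# Merge components j1 and j2: add their proportions and posteriors, then
# recompute mean and covariance matrix with the merged posterior as weights.
merge_two <- function(xdata, j1, j2, probs, z) {
  xdata <- as.matrix(xdata)
  z[, j1] <- z[, j1] + z[, j2]        # posterior of the merged cluster
  z[, j2] <- 0                        # j2 is "merged away"
  probs[j1] <- probs[j1] + probs[j2]
  probs[j2] <- 0
  w <- z[, j1] / sum(z[, j1])         # normalised observation weights
  mom <- cov.wt(xdata, wt = w, method = "ML")
  list(probs = probs, z = z, mu = mom$center, Sigma = mom$cov)
}

# Toy input: 60 two-dimensional points and an arbitrary posterior matrix.
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 1), ncol = 2),
           matrix(rnorm(40, mean = 6), ncol = 2))
z <- matrix(runif(60 * 3), ncol = 3)
z <- z / rowSums(z)                   # rows sum to one
probs <- colMeans(z)

m <- merge_two(x, j1 = 1, j2 = 2, probs = probs, z = z)
m$probs                               # proportion of component 2 is now 0
m$mu
m$Sigma
```

The merged cluster keeps the number of component `j1`, which mirrors the convention that results are numbered according to `clusternumbers`.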

Mixture components are merged in a hierarchical fashion. The merging criterion is computed for all pairs of current clusters and the two clusters with the highest criterion value (lowest, respectively, for `method="predictive"`) are merged. Then criterion values are recomputed for the merged cluster. Merging is continued until the criterion value to merge is below (or above, for `method="predictive"`) the cutoff value. Details are given in Hennig (2010). The following criteria are offered, specified by the `method` argument.

- "ridge.uni"
  components are only merged if their mixture is unimodal according to Ray and Lindsay's (2005) ridgeline theory, see `ridgeline.diagnosis`. This ignores argument `cutoff`.

- "ridge.ratio"
  ratio between density minimum between components and minimum of density maxima according to Ray and Lindsay's (2005) ridgeline theory, see `ridgeline.diagnosis`.

- "bhat"
  Bhattacharyya upper bound on misclassification probability between two components, see `bhattacharyya.matrix`.

- "demp"
  direct estimation of misclassification probability between components, see Hennig (2010).

- "dipuni"
  this uses `method="ridge.ratio"` to decide which clusters to merge but stops merging according to the p-value of the dip test computed as in Hartigan and Hartigan (1985), see `dip.test`.

- "diptantrum"
  as `"dipuni"`, but p-value of dip test computed as in Tantrum, Murua and Stuetzle (2003), see `dipp.tantrum`.

- "predictive"
  this uses `method="demp"` to decide which clusters to merge but stops merging according to the value of prediction strength (Tibshirani and Walther, 2005) as computed in `mixpredictive`.
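The hierarchical merging described above can be sketched in base R with a Bhattacharyya-type criterion. Several parts are assumptions made for illustration: the criterion is taken as exp(-d) with d the Bhattacharyya distance, the merged cluster is updated by moment matching of the component parameters (a stand-in for `mergeparameters`), and the cutoff of 0.1 and all parameter values are arbitrary. The helper names `bhat_bound` and `greedy_merge` are hypothetical.

```r
# exp(-Bhattacharyya distance) between two Gaussian components.
bhat_bound <- function(mu1, mu2, S1, S2) {
  S1 <- as.matrix(S1); S2 <- as.matrix(S2)
  S <- (S1 + S2) / 2
  d <- mu1 - mu2
  dist <- drop(t(d) %*% solve(S, d)) / 8 +
    0.5 * log(det(S) / sqrt(det(S1) * det(S2)))
  exp(-dist)
}

# Greedy merging: repeatedly merge the pair with the highest criterion
# value until that value falls below the cutoff.
greedy_merge <- function(probs, mu, Sigma, cutoff) {
  member <- seq_along(probs)   # cluster number of each original component
  active <- seq_along(probs)   # numbers of the remaining clusters
  while (length(active) >= 2) {
    pairs <- t(combn(active, 2))
    crit <- apply(pairs, 1, function(p)
      bhat_bound(mu[p[1], ], mu[p[2], ], Sigma[, , p[1]], Sigma[, , p[2]]))
    best <- which.max(crit)
    if (crit[best] < cutoff) break
    i <- pairs[best, 1]
    j <- pairs[best, 2]
    # Moment-matched parameters of the merged cluster, stored under number i.
    p <- probs[i] + probs[j]
    m <- (probs[i] * mu[i, ] + probs[j] * mu[j, ]) / p
    Sigma[, , i] <- (probs[i] * (Sigma[, , i] + tcrossprod(mu[i, ] - m)) +
                     probs[j] * (Sigma[, , j] + tcrossprod(mu[j, ] - m))) / p
    mu[i, ] <- m
    probs[i] <- p
    member[member == j] <- i
    active <- setdiff(active, j)
  }
  list(clusternumbers = active,             # in original numbering
       mergedto = member,
       renumbered = match(member, active))  # numbered 1, 2, ...
}

# Three components in two dimensions; the first two overlap strongly.
mu <- rbind(c(0, 0), c(1, 0), c(8, 8))
Sigma <- array(diag(2), dim = c(2, 2, 3))
res <- greedy_merge(probs = c(0.3, 0.3, 0.4), mu = mu, Sigma = Sigma,
                    cutoff = 0.1)
res$clusternumbers
res$mergedto
res$renumbered
```

The last two list entries show the difference between keeping the original cluster numbers and renumbering from 1, as controlled by `renumber` in `mergenormals`.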

Hartigan, J. A. and Hartigan, P. M. (1985) The Dip Test of
Unimodality, *Annals of Statistics*, 13, 70-84.

Hennig, C. (2010) Methods for merging Gaussian mixture components,
*Advances in Data Analysis and Classification*, 4, 3-34.

Ray, S. and Lindsay, B. G. (2005) The Topography of Multivariate
Normal Mixtures, *Annals of Statistics*, 33, 2042-2065.

Tantrum, J., Murua, A. and Stuetzle, W. (2003) Assessment and
Pruning of Hierarchical Model Based Clustering, *Proceedings of the
ninth ACM SIGKDD international conference on Knowledge discovery and
data mining*, Washington, D.C., 197-205.

Tibshirani, R. and Walther, G. (2005) Cluster Validation by
Prediction Strength, *Journal of Computational and Graphical
Statistics*, 14, 511-528.

# NOT RUN {
require(mclust)
require(MASS)
options(digits=3)
data(crabs)
dc <- crabs[,4:8]
cm <- mclustBIC(crabs[,4:8],G=9,modelNames="EEE")
scm <- summary(cm,crabs[,4:8])
cmnbhat <- mergenormals(crabs[,4:8],scm,method="bhat")
summary(cmnbhat)
cmndemp <- mergenormals(crabs[,4:8],scm,method="demp")
summary(cmndemp)
# Other methods take a bit longer, but try them!
# The values of by and M below are still chosen for reasonably fast execution.
# cmnrr <- mergenormals(crabs[,4:8],scm,method="ridge.ratio",by=0.05)
# cmd <- mergenormals(crabs[,4:8],scm,method="diptantrum",by=0.05)
# cmp <- mergenormals(crabs[,4:8],scm,method="predictive",M=3)
# }