ls_fit_ultrametric: Least Squares Fit of Ultrametrics to Dissimilarities

Description

Find the ultrametric minimizing least squares distance (Euclidean dissimilarity) to a given dissimilarity object.

Usage

ls_fit_ultrametric(x, method = c("SUMT", "IP", "IR"), weights = 1,
                   control = list())

Arguments

a dissimilarity object inheriting from class "dist".

method

a character string indicating the fitting method to be employed. Must be one of "SUMT" (default), "IP", or "IR", or a unique abbreviation thereof.

weights

a numeric vector or matrix with non-negative weights for obtaining a weighted least squares fit. If a matrix, its numbers of rows and columns must be the same as the number of objects in x, and the lower diagonal part is used.

control

a list of control parameters. See Details.

Value

An object of class "cl_ultrametric" containing the optimal ultrametric distances.

encoding

UTF-8

Details

With $L(u) = \sum w_{ij} (x_{ij} - u_{ij})^2$, the problem to be solved is minimizing $L$ over all $u$ satisfying the ultrametric constraints (i.e., for all $i, j, k$, $u_{ij} \le \max(u_{ik}, u_{jk})$). This problem is known to be NP hard (Krivanek and Moravek, 1986).

We provide three heuristics for solving this problem.

Method "SUMT" implements the SUMT (Sequential Unconstrained Minimization Technique, Fiacco and McCormick, 1968) approach of de Soete (1986) which in turn simplifies the suggestions in Carroll and Pruzansky (1980). One iteratively minimizes $L(u) + \rho_k P(u)$, where $P(u)$ is a non-negative function penalizing violations of the ultrametric constraints such that $P(u)$ is zero iff $u$ is an ultrametric. The $\rho$ values are increased according to the rule $\rho_{k+1} = q \rho_k$ for some constant $q > 1$, until convergence is obtained in the sense that the Euclidean distance between successive solutions $u_k$ and $u_{k+1}$ is small enough. We then use a final rounding step to ensure that the returned object exactly satisfies the ultrametric constraints. The starting value $u_0$ is obtained by random shaking of the given dissimilarity object (if not given). If there are missing values in x, i.e., the given dissimilarities are incomplete, we follow a suggestion of de Soete (1984), imputing the missing values by the weighted mean of the non-missing ones, and setting the corresponding weights to zero.

The unconstrained minimizations are carried out using either optim or nlm, using the analytic gradients given in Carroll and Pruzansky (1980). The following control parameters can be provided via the control argument.

[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

The default optimization using conjugate gradients should work reasonably well for medium to large size problems. For small ones, using nlm is usually faster. Note that the number of ultrametric constraints is of the order $n^3$, where $n$ is the number of objects in the dissimilarity object, suggesting to use the SUMT approach in favor of constrOptim.

If starting values for the SUMT are provided via start, the number of starting values gives the number of runs to be performed, and control option nruns is ignored. Otherwise, nruns starting values are obtained by random shaking of the dissimilarity to be fitted. In the case of multiple SUMT runs, the (first) best solution found is returned.

Method "IP" implements the Iterative Projection approach of Hubert and Arabie (1995). This iteratively projects the current dissimilarities to the closed convex set given by the ultrametric constraints (3-point conditions) for a single index triple $(i, j, k)$, in fact replacing the two largest values among $d_{ij}, d_{ik}, d_{jk}$ by their mean. The following control parameters can be provided via the control argument.

[object Object],[object Object],[object Object],[object Object],[object Object]

If permutations are provided via order, the number of these gives the number of runs to be performed, and control option nruns is ignored. Otherwise, nruns randomly generated orders are tried. In the case of multiple runs, the (first) best solution found is returned.

Non-identical weights and incomplete dissimilarities are currently not supported.

Method "IR" implements the Iterative Reduction approach suggested by Roux (1988), see also Barthélémy and Guénoche (1991). This is similar to the Iterative Projection method, but modifies the dissimilarities between objects proportionally to the aggregated change incurred from the ultrametric projections. Available control parameters are identical to those of method "IP".

Non-identical weights and incomplete dissimilarities are currently not supported.

It should be noted that all methods are heuristics which can not be guaranteed to find the global minimum. Standard practice would recommend to use the best solution found in sufficiently many replications of the base algorithm.

References

J.-P. Barthélémy and A. Guénoche (1991). Trees and proximity representations. Chichester: John Wiley & Sons. ISBN 0-471-92263-3. J. D. Carroll and S. Pruzansky (1980). Discrete and hybrid scaling models. In E. D. Lantermann and H. Feger (eds.), Similarity and Choice. Bern (Switzerland): Huber.

A. V. Fiacco and G. P. McCormick (1968). Nonlinear programming: Sequential unconstrained minimization techniques. New York: John Wiley & Sons.

L. Hubert and P. Arabie (1995). Iterative projection strategies for the least squares fitting of tree structures to proximity data. British Journal of Mathematical and Statistical Psychology, 48, 281--317. M. Krivanek and J. Moravek (1986). NP-hard problems in hierarchical tree clustering. Acta Informatica, 23, 311--323. M. Roux (1988). Techniques of approximation for building two tree structures. In C. Hayashi and E. Diday and M. Jambu and N. Ohsumi (Eds.), Recent Developments in Clustering and Data Analysis, pages 151--170. New York: Academic Press.

G. de Soete (1984). Ultrametric tree representations of incomplete dissimilarity data. Journal of Classification, 1, 235--242.

G. de Soete (1986). A least squares algorithm for fitting an ultrametric tree to a dissimilarity matrix. Pattern Recognition Letters, 2, 133--137.

Examples

Run this code

## Least squares fit of an ultrametric to the Miller-Nicely consonant
## phoneme confusion data.
data("Phonemes")
## Note that the Phonemes data set has the consonant misclassification
## probabilities, i.e., the similarities between the phonemes.
d <- 1 - as.dist(Phonemes)
u <- ls_fit_ultrametric(d, control = list(verbose = TRUE))
## Cophenetic correlation:
cor(d, u)
## Plot:
plot(u)
## ("Basically" the same as Figure 1 in de Soete (1986).)

Run the code above in your browser using DataLab