fit_ultrametric_target: Fit Dissimilarities to a Hierarchy

Description

Find the ultrametric from a target equivalence class of hierarchies which minimizes weighted Euclidean or Manhattan dissimilarity to a given dissimilarity object.

Usage

ls_fit_ultrametric_target(x, y, weights = 1)
l1_fit_ultrametric_target(x, y, weights = 1)

Arguments

a dissimilarity object inheriting from class "dist".

a target hierarchy.

weights

a numeric vector or matrix with non-negative weights for obtaining a weighted fit. If a matrix, its numbers of rows and columns must be the same as the number of objects in x. Otherwise, it is recycled to the number of elements in x.

Value

An object of class "cl_ultrametric" containing the optimal ultrametric distances.

Details

The target equivalence class consists of all dendrograms for which the corresponding \(n\)-trees are the same as the one corresponding to y. I.e., all splits are the same as for y, and optimization is over the height of the splits.

The criterion function to be optimized over all ultrametrics from the equivalence class is \(\sum w_{ij} |x_{ij} - u_{ij}|^p\), where \(p = 2\) in the Euclidean and \(p = 1\) in the Manhattan case, respectively.

The optimum can be computed as follows. Suppose split \(s\) joins object classes \(A\) and \(B\). As the ultrametric dissimilarities of all objects in \(A\) to all objects in \(B\) must be the same value, say, \(u_{A,B} = u_s\), the contribution from the split to the criterion function is of the form \(f_s(u_s) = \sum_{i \in A, j \in B} w_{ij} |x_{ij} - u_s|^p\). We need to minimize \(\sum_s f_s(u_s)\) under the constraint that the \(u_s\) form a non-decreasing sequence, which is accomplished by using the Pool Adjacent Violator Algorithm (PAVA) using the weighted mean (\(p = 2\)) or weighted median (\(p = 1\)) for solving the blockwise optimization problems.

Examples

Run this code

# NOT RUN {
data("Phonemes")
## Note that the Phonemes data set has the consonant misclassification
## probabilities, i.e., the similarities between the phonemes.
d <- as.dist(1 - Phonemes)
## Find the maximal dominated and miminal dominating ultrametrics by
## hclust() with single and complete linkage:
y1 <- hclust(d, "single")
y2 <- hclust(d, "complete")
## Note that these are quite different:
cl_dissimilarity(y1, y2, "gamma")
## Now find the L2 optimal members of the respective dendrogram
## equivalence classes.
u1 <- ls_fit_ultrametric_target(d, y1)
u2 <- ls_fit_ultrametric_target(d, y2)
## Compute the L2 optimal ultrametric approximation to d.
u <- ls_fit_ultrametric(d)
## And compare ...
cl_dissimilarity(cl_ensemble(Opt = u, Single = u1, Complete = u2), d)
## The solution obtained via complete linkage is quite close:
cl_agreement(u2, u, "cophenetic")
# }