criterion: Criterion for a Loss/Merit Function for Data Given a Permutation

Description

Compute the value for different loss functions $L$ and merit function $M$ for data given a permutation.

Usage

criterion(x, order = NULL, method = NULL, force_loss = FALSE, ...)

Arguments

an object of class dist or a matrix (currently no functions are implemented for array).

order

an object of class ser_permutation suitable for x. If NULL, the identity permutation is used.

method

a character vector with the names of the criteria to be employed, or NULL (default) in which case all available criteria are used.

...

additional parameters passed on to the criterion method.

force_loss

logical; should merit function be converted into loss functions by multiplying with -1?

Value

A named vector of real values.

Details

For a symmetric dissimilarity matrix $D$ with elements $d(i,j)$ where $i, j = 1 \ldots n$, the aim is generally to place low distance values close to the diagonal. The following criteria to judge the quality of a certain permutation of the objects in a dissimilarity matrix are currently implemented (for a more detailed description and an experimental comparison see Hahsler (2017)):

"Gradient_raw", "Gradient_weighted"

Gradient measures (Hubert et al 2001). A symmetric dissimilarity matrix where the values in all rows and columns only increase when moving away from the main diagonal is called a perfect anti-Robinson matrix (Robinson 1951). A suitable merit measure which quantifies the divergence of a matrix from the anti-Robinson form is

$$ M(D) = \sum_{i=1}^n \sum_{i<k<j} f(d_{ij}, d_{ik}) + \sum_{i<k<j} f(d_{ij}, d_{kj})$$ where $f(.,.)$ is a function which defines how a violation or satisfaction of a gradient condition for an object triple ($O_i, O_k, O_j$) is counted.

Hubert et al (2001) suggest two functions. The first function is given by: $$f(z,y) = sign(y-z) = +1 if z < y; 0 if z = y; and -1 if z > y.$$

It results in raw number of triples satisfying the gradient constraints minus triples which violate the constraints.

The second function is defined as: $$f(z,y) = |y-z| sign(y-z) = y-z$$ It weights the each satisfaction or violation by the difference by its magnitude given by the absolute difference between the values.

"AR_events", "AR_deviations"

Anti-Robinson events (Chen 2002). An even simpler loss function can be created in the same way as the gradient measures above by concentrating on violations only. $$ L(D) = \sum_{i=1}^n \sum_{i<k<j} f(d_{ik}, d_{ij}) + \sum_{i<k<j} f(d_{kj}, d_{ij}) $$

To only count the violations we use $$ f(z, y) = I(z, y) = 1 if z < y and 0 otherwise. $$

$I(\cdot)$ is an indicator function returning 1 only for violations. Chen (2002) presented a formulation for an equivalent loss function and called the violations anti-Robinson events and also introduced a weighted versions of the loss function resulting in $$ f(z, y) = |y-z|I(z, y) $$ using the absolute deviations as weights.

"RGAR"

Relative generalized Anti-Robinson events (Tien et al 2008). Counts Anti-Robinson events in a variable band (window specified by w defaults to the maximum of $n-1$) around the main diagonal and normalizes by the maximum of possible events.

$$ L(D) = 1/m \sum_{i=1}^n \sum_{(i-w)\le j<k<i} I(d_{ij} < d_{ik}) + \sum_{i<j<k\le(i+w))} I(d_{ij} > d_{ik}) $$

where $m=(2/3-n)w + nw^2 - 2/3 w^3$, the maximal number of possible anti-Robinson events in the window. The window size $w$ represents the number of neighboring objects (number of entries from the diagonal of the distance matrix) are considered. The window size is $2 \le w < n$, where smaller values result in focusing on the local structure while larger values look at the global structure. Alternatively, pct can be used instead of w to specify the window as a percentage of $n$. relative=FALSE can be to get the GAR, i.e., the absolute number of AR events in the window.

"BAR"

Banded Anti-Robinson Form (Earle and Hurley 2015).

Simplified measure for closeness to the anti-Robinson form in a band of size $b$ with $1 <= b < n$ around the diagonal.

$$ L(D) = \sum_{|i-j|<=b} (b+1-|i-j|) d_{ij} $$

For $b=1$ the measure reduces to the Hamiltonian path length. For $b=n-1$ the measure is equivalent to ARc defined (Earle and Hurley, 2015). Note that ARc is equivalent to the Linear Seriation criterion (scaled by 1/2).

$b$ defaults to a band of 20% of $n$.

"Path_length"

Hamiltonian path length (Caraux and Pinloche 2005).

The order of the objects in a dissimilarity matrix corresponds to a path through a graph where each node represents an object and is visited exactly once, i.e., a Hamilton path. The length of the path is defined as the sum of the edge weights, i.e., dissimilarities.

$$L(D) = \sum_{i=1}^{n-1} d_{i,i+1}$$

The length of the Hamiltonian path is equal to the value of the minimal span loss function (as used by Chen 2002). Both notions are related to the traveling salesperson problem (TSP).

If order is not unique or there are non-finite distance values NA is returned.

"Lazy_path_length"

Lazy path length (Earl and Hurley 2015).

A weighted version of the Hamiltonian path criterion. This loss function postpones larger distances to later in the order (i.e., a lazy traveling sales person).

$$L(D) = \sum_{i=1}^{n-1} (n-i) d_{i,i+1}$$

Earl and Hurley (2015) proposed this criterion for reordering in visualizations to concentrate on closer objects first.

"Inertia"

Inertia criterion (Caraux and Pinloche 2005).

Measures the moment of the inertia of dissimilarity values around the diagonal as

$$M(D) = \sum_{i=1}^n \sum_{j=1}^n d(i,j)|i-j|^2$$

$|i-j|$ is used as a measure for the distance to the diagonal and $d(i,j)$ gives the weight. This criterion gives higher weight to values farther away from the diagonal. It increases with quality.

"Least_squares"

Least squares criterion (Caraux and Pinloche 2005).

The sum of squares of deviations between the dissimilarities and rank differences (in the matrix) between two elements: $$L(D) = \sum_{i=1}^n \sum_{j=1}^n (d(i,j) - |i-j|)^2,$$ where $d(i,j)$ is an element of the dissimilarity matrix $D$ and $|i-j|$ is the rank difference between the objects.

Note that if Euclidean distance is used to calculate $D$ from a data matrix $X$, the order of the elements in $X$ by projecting them on the first principal component of $X$ minimizes this criterion. The least squares criterion is related to unidimensional scaling.

"LS"

Linear Seriation Criterion (Hubert and Schultz 1976).

Weights the distances with the absolute rank differences.

$$L(D) \sum_{i,j=1}^n d(i,j) (-|i-j|)$$

"2SUM"

2-Sum Criterion (Barnard, Pothen, and Simon 1993).

The 2-Sum loss criterion multiplies the similarity between objects with the squared rank differences.

$$L(D) \sum_{i,j=1}^n 1/(1+d(i,j)) (i-j)^2,$$

where $s(i,j) = 1/(1+d(i,j))$ represents the similarity between objects $i$ and $j$.

"ME", "Moore_stress", "Neumann_stress", "Cor_R"

These criteria are defined on general matrices (see below for definitions). The dissimilarity matrix is first converted into a similarity matrix using $S = 1/(1+D)$. If a different transformation is required, then perform the transformation first and supply a matrix instead of a dist object.

For a general matrix $X = x_{ij}$, $i = 1 \ldots n$ and $j = 1 \ldots m$, currently the following loss/merit functions are implemented:

"ME"

Measure of Effectiveness (McCormick 1972).

The measure of effectiveness (ME) for matrix $X$, is defined as

$$M(X) = 1/2 \sum_{i=1}^{n} \sum_{j=1}^{m} x_{i,j}(x_{i,j-1}+x_{i,j+1}+x_{i-1,j}+x_{i+1,j})$$

with, by convention

$$x_{0,j}=x_{m+1,j}=x_{i,0}=x_{i,n+1}=0.$$

ME is a merit measure, i.e. a higher ME indicates a better arrangement. Maximizing ME is the objective of the bond energy algorithm (BEA).

"Cor_R"

Weighted correlation coefficient R developed as the Measure of Effectiveness for the Moment Ordering Algorithm (Deutsch and Martin 1971).

R is a merit measure normalized so that its value always lies in $[-1,1]$. For the special case of a square matrix $R=1$ corresponds to only the main diagonal being filled, $R=0$ to a random distribution of value throughout the array, and $R=-1$ to the opposite diagonal only being filled.

"Moore_stress", "Neumann_stress"

Stress (Niermann 2005).

Stress measures the conciseness of the presentation of a matrix/table and can be seen as a purity function which compares the values in a matrix/table with its neighbors. The stress measure used here is computed as the sum of squared distances of each matrix entry from its adjacent entries.

$$ L(X) = \sum_{i=1}^n \sum_{j=1}^m \sigma_{ij} $$

The following types of neighborhoods are available:

Moore:: comprises the eight adjacent entries. $$ \sigma_{ij} = \sum_{k=\max(1,i-1)}^{\min(n,i+1)} \sum_{l=\max(1,j-1)}^{\min(m,j+1)} (x_{ij} - x_{kl})^2 $$
Neumann:: comprises the four adjacent entries. $$ \sigma_{ij} = \sum_{k=\max(1,i-1)}^{\min(n,i+1)} (x_{ij} - x_{kj})^2 + \sum_{l=\max(1,j-1)}^{\min(m,j+1)} (x_{ij} - x_{il})^2 $$

The major difference between the Moore and the Neumann neighborhood is that for the later the contribution of row and column permutations to stress are independent and thus can be optimized independently.

References

Barnard, S.T., A. Pothen, and H. D. Simon (1993): A Spectral Algorithm for Envelope Reduction of Sparse Matrices. In Proceedings of the 1993 ACM/IEEE Conference on Supercomputing, 493--502. Supercomputing '93. New York, NY, USA: ACM.

Caraux, G. and S. Pinloche (2005): Permutmatrix: A Graphical Environment to Arrange Gene Expression Profiles in Optimal Linear Order, Bioinformatics, 21(7), 1280--1281.

Chen, C.-H. (2002): Generalized association plots: Information visualization via iteratively generated correlation matrices, Statistica Sinica, 12(1), 7--29.

Deutsch, S.B. and J.J. Martin (1971): An ordering algorithm for analysis of data arrays. Operational Research, 19(6), 1350--1362.

Earle, D. and C.B. Hurley (2015): Advances in Dendrogram Seriation for Application to Visualization. Journal of Computational and Graphical Statistics, 24(1), 1--25.

Hahsler, M. (2017): An experimental comparison of seriation methods for one-mode two-way data. European Journal of Operational Research, 257, 133--143.

Hubert, L. and J. Schultz (1976): Quadratic Assignment as a General Data Analysis Strategy. British Journal of Mathematical and Statistical Psychology, 29(2). Blackwell Publishing Ltd. 190--241.

Hubert, L., P. Arabie, and J. Meulman (2001): Combinatorial Data Analysis: Optimization by Dynamic Programming. Society for Industrial Mathematics.

Niermann, S. (2005): Optimizing the Ordering of Tables With Evolutionary Computation, The American Statistician, 59(1), 41--46.

McCormick, W.T., P.J. Schweitzer and T.W. White (1972): Problem decomposition and data reorganization by a clustering technique, Operations Research, 20(5), 993-1009.

Robinson, W.S. (1951): A method for chronologically ordering archaeological deposits, American Antiquity, 16, 293--301.

Tien, Y-J., Yun-Shien Lee, Han-Ming Wu and Chun-Houh Chen (2008): Methods for simultaneously identifying coherent local clusters with smooth global patterns in gene expression profiles, BMC Bioinformatics, 9(155), 1--16.

Examples

Run this code

# NOT RUN {
## create random data and calculate distances
m <- matrix(runif(20),ncol=2)
d <- dist(m)

## get an order for rows (optimal for the least squares criterion)
o <- seriate(d, method = "MDS")
o

## compare the values for all available criteria
rbind(
    unordered = criterion(d),
    ordered = criterion(d, o)
)

## compare RGAR by window size (from local to global)
w <- 2:(nrow(m)-1)
RGAR <- sapply(w, FUN = function (w)
  criterion(d, o, method="RGAR", w = w))
plot(w, RGAR, type = "b", ylim = c(0,1),
  xlab = "Windows size (w)", main = "RGAR by window size")
# }

Run the code above in your browser using DataLab