criterion: Criterion for a Loss/Merit Function for Data Given a Permutation

Description

Compute the value for different loss functions $L$ and merit functions $M$ for data given a permutation.

Usage

criterion(x, order = NULL, method = NULL, force_loss = FALSE, ...)

Arguments

x

an object of class dist or a matrix (currently no functions are implemented for array).

order

an object of class ser_permutation suitable for x. If NULL, the identity permutation is used.

method

a character vector with the names of the criteria to be employed, or NULL (default) in which case all available criteria are used.

...

additional parameters passed on to the criterion method.

force_loss

logical; should merit functions be converted into loss functions by multiplying them by -1?

Value

A named vector of real values.

Details

For a symmetric dissimilarity matrix $D$ with elements $d(i,j)$ where $i, j = 1 \ldots n$, the aim is generally to place low distance values close to the diagonal. The following criteria to judge the quality of a certain permutation of the objects in a dissimilarity matrix are currently implemented (for a more detailed description and an experimental comparison see Hahsler (2017)):

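"Gradient_raw", "Gradient_weighted": Gradient measures (Hubert et al 2001). A symmetric dissimilarity matrix in which the values in all rows and columns only increase when moving away from the main diagonal is called a perfect anti-Robinson matrix (Robinson 1951). A suitable merit measure which quantifies the divergence of a matrix from the anti-Robinson form is $$ M(D) = \sum_{i<k<j} \left[ f(d_{ik}, d_{ij}) + f(d_{kj}, d_{ij}) \right] $$ where $f(z,y)$ defines how the satisfaction or violation of the gradient condition for an object triple $(O_i, O_k, O_j)$ is counted. For "Gradient_raw", $f(z,y) = \operatorname{sign}(y-z)$, which yields the number of comparisons satisfying the gradient condition minus the number violating it; for "Gradient_weighted", $f(z,y) = y-z$, which weights each satisfaction or violation by its magnitude.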
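"AR_events", "AR_deviations": Anti-Robinson events (Chen 2002). An even simpler loss function can be created in the same way as the gradient measures above by concentrating on violations only: $$ L(D) = \sum_{i<k<j} \left[ f(d_{ik}, d_{ij}) + f(d_{kj}, d_{ij}) \right] $$ For "AR_events", $f(z,y) = 1$ if $z > y$ (a violation of the anti-Robinson form) and $0$ otherwise; for "AR_deviations", each violation is instead weighted by its magnitude $|y-z|$.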
"RGAR": Relative generalized anti-Robinson events (Tien et al 2008). Counts anti-Robinson events within a variable band around the main diagonal (the window size is specified by the parameter w, which defaults to its maximum of $n-1$) and normalizes by the maximum number of possible events.
"BAR": Banded Anti-Robinson Form (Earle and Hurley 2015).
"Path_length": Hamiltonian path length (Caraux and Pinloche 2005).
"Lazy_path_length": Lazy path length (Earle and Hurley 2015).
"Inertia": Inertia criterion (Caraux and Pinloche 2005).
"Least_squares": Least squares criterion (Caraux and Pinloche 2005).
"LS": Linear Seriation Criterion (Hubert and Schultz 1976).
"2SUM": 2-Sum Criterion (Barnard, Pothen, and Simon 1993).
"ME", "Moore_stress", "Neumann_stress", "Cor_R": These criteria are defined on general matrices (see below for definitions). The dissimilarity matrix is first converted into a similarity matrix using $S = 1/(1+D)$. If a different transformation is required, then perform the transformation first and supply a matrix instead of a dist object.
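The gradient and anti-Robinson sums described above can be sketched directly in base R. This is a minimal illustration written from the formulas, not the seriation package's implementation. It assumes $f(z,y) = \operatorname{sign}(y-z)$ for Gradient_raw and $f(z,y) = y-z$ for Gradient_weighted, and for the anti-Robinson criteria it counts a comparison as a violation when $z > y$ (AR_events), weighting it by $|y-z|$ for AR_deviations. The helper name `ar_triple_sums()` and its `perm` argument are hypothetical.

```r
## Hypothetical helper (not part of seriation): sums f over all triples
## i < k < j of a permuted dissimilarity matrix.
ar_triple_sums <- function(d, perm = seq_len(nrow(as.matrix(d)))) {
  D <- as.matrix(d)[perm, perm]        # reorder rows and columns
  n <- nrow(D)
  res <- c(Gradient_raw = 0, Gradient_weighted = 0,
           AR_events = 0, AR_deviations = 0)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    if (j - i < 2) next                 # need at least one k with i < k < j
    y <- D[i, j]
    for (k in (i + 1):(j - 1)) {
      ## the two comparisons f(d_ik, d_ij) and f(d_kj, d_ij)
      for (z in c(D[i, k], D[k, j])) {
        res["Gradient_raw"]      <- res["Gradient_raw"] + sign(y - z)
        res["Gradient_weighted"] <- res["Gradient_weighted"] + (y - z)
        if (z > y) {                    # anti-Robinson violation
          res["AR_events"]     <- res["AR_events"] + 1
          res["AR_deviations"] <- res["AR_deviations"] + (z - y)
        }
      }
    }
  }
  res
}

x <- c(1, 2, 4, 7)                      # points on a line, already in order
ar_triple_sums(dist(x))                 # AR_events 0, Gradient_raw 8
ar_triple_sums(dist(x), c(1, 3, 2, 4))  # AR_events 3, AR_deviations 5
```

Points on a line in sorted order form a perfect anti-Robinson matrix, so no events are counted; swapping objects 2 and 3 introduces violations. The triple loop is $O(n^3)$, so the sketch is only meant for small examples.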

For a general matrix $X = x_{ij}$, $i = 1 \ldots n$ and $j = 1 \ldots m$, currently the following loss/merit functions are implemented:

"ME": Measure of Effectiveness (McCormick et al 1972).

The measure of effectiveness (ME) for matrix $X$ is defined as

$$M(X) = 1/2 \sum_{i=1}^{n} \sum_{j=1}^{m} x_{i,j}(x_{i,j-1}+x_{i,j+1}+x_{i-1,j}+x_{i+1,j})$$

with, by convention

$$x_{0,j}=x_{n+1,j}=x_{i,0}=x_{i,m+1}=0.$$

ME is a merit measure, i.e. a higher ME indicates a better arrangement. Maximizing ME is the objective of the bond energy algorithm (BEA).
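The ME formula and its zero-boundary convention can be made concrete with a short base-R sketch. `measure_of_effectiveness()` is a hypothetical helper written for illustration, not a seriation function; it pads $X$ with a border of zeros so that the four neighbor terms in the formula can be taken as shifted submatrices.

```r
## Hypothetical helper (not part of seriation): ME of a numeric matrix X.
measure_of_effectiveness <- function(X) {
  n <- nrow(X); m <- ncol(X)
  P <- matrix(0, n + 2, m + 2)          # zero border: x_{0,j} = x_{n+1,j} = ... = 0
  rows <- 2:(n + 1); cols <- 2:(m + 1)  # positions of X inside P
  P[rows, cols] <- X
  ## left + right + upper + lower neighbor of every entry of X
  nb <- P[rows, cols - 1, drop = FALSE] + P[rows, cols + 1, drop = FALSE] +
        P[rows - 1, cols, drop = FALSE] + P[rows + 1, cols, drop = FALSE]
  0.5 * sum(X * nb)
}

X1 <- kronecker(diag(2), matrix(1, 2, 2))  # two 2 x 2 blocks of ones on the diagonal
p  <- c(1, 3, 2, 4)                        # permutation applied to rows and columns
measure_of_effectiveness(X1)               # 8
measure_of_effectiveness(X1[p, p])         # 0 (checkerboard pattern)
```

Both matrices contain the same values, so the drop from 8 to 0 is caused entirely by the permutation: ME rewards arrangements that place large values next to each other.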

"Cor_R"

Weighted correlation coefficient R developed as the Measure of Effectiveness for the Moment Ordering Algorithm (Deutsch and Martin 1971).

R is a merit measure normalized so that its value always lies in $[-1,1]$. For the special case of a square matrix, $R=1$ corresponds to only the main diagonal being filled, $R=0$ to a random distribution of values throughout the array, and $R=-1$ to only the opposite diagonal being filled.

"Moore_stress", "Neumann_stress"

Stress (Niermann 2005).

Stress measures the conciseness of the presentation of a matrix/table and can be seen as a purity function which compares the values in a matrix/table with its neighbors. The stress measure used here is computed as the sum of squared distances of each matrix entry from its adjacent entries.

$$ L(X) = \sum_{i=1}^n \sum_{j=1}^m \sigma_{ij} $$

The following types of neighborhoods are available:

Moore: comprises the eight adjacent entries. $$ \sigma_{ij} = \sum_{k=\max(1,i-1)}^{\min(n,i+1)} \sum_{l=\max(1,j-1)}^{\min(m,j+1)} (x_{ij} - x_{kl})^2 $$
Neumann: comprises the four adjacent entries. $$ \sigma_{ij} = \sum_{k=\max(1,i-1)}^{\min(n,i+1)} (x_{ij} - x_{kj})^2 + \sum_{l=\max(1,j-1)}^{\min(m,j+1)} (x_{ij} - x_{il})^2 $$

The major difference between the Moore and the Neumann neighborhood is that for the latter the contributions of row and column permutations to stress are independent and can thus be optimized independently.
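The two stress definitions can be sketched in base R as follows. `stress()` is a hypothetical helper name, not a seriation function; it evaluates $\sigma_{ij}$ exactly as written above, so the $(k,l) = (i,j)$ term is included but contributes zero.

```r
## Hypothetical helper (not part of seriation): Moore or Neumann stress of X.
stress <- function(X, type = c("moore", "neumann")) {
  type <- match.arg(type)
  n <- nrow(X); m <- ncol(X)
  total <- 0
  for (i in seq_len(n)) for (j in seq_len(m)) {
    rows <- max(1, i - 1):min(n, i + 1)
    cols <- max(1, j - 1):min(m, j + 1)
    if (type == "moore") {
      ## every entry of the (up to) 3 x 3 block around (i, j)
      total <- total + sum((X[i, j] - X[rows, cols])^2)
    } else {
      ## vertical neighbors in column j plus horizontal neighbors in row i
      total <- total + sum((X[i, j] - X[rows, j])^2) +
                       sum((X[i, j] - X[i, cols])^2)
    }
  }
  total
}

X <- matrix(1:4, nrow = 2)
stress(X, "moore")    # 40
stress(X, "neumann")  # 20
```

In a $2 \times 2$ matrix every entry is a Moore neighbor of every other entry, while the Neumann neighborhood omits the diagonal neighbors, which is why the Neumann value is smaller here.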

References

Barnard, S.T., A. Pothen, and H. D. Simon (1993): A Spectral Algorithm for Envelope Reduction of Sparse Matrices. In Proceedings of the 1993 ACM/IEEE Conference on Supercomputing, 493--502. Supercomputing '93. New York, NY, USA: ACM.

Caraux, G. and S. Pinloche (2005): Permutmatrix: A Graphical Environment to Arrange Gene Expression Profiles in Optimal Linear Order, Bioinformatics, 21(7), 1280--1281.

Chen, C.-H. (2002): Generalized association plots: Information visualization via iteratively generated correlation matrices, Statistica Sinica, 12(1), 7--29.

Deutsch, S.B. and J.J. Martin (1971): An ordering algorithm for analysis of data arrays. Operations Research, 19(6), 1350--1362.

Earle, D. and C.B. Hurley (2015): Advances in Dendrogram Seriation for Application to Visualization. Journal of Computational and Graphical Statistics, 24(1), 1--25.

Hahsler, M. (2017): An experimental comparison of seriation methods for one-mode two-way data. European Journal of Operational Research, 257, 133--143.

Hubert, L. and J. Schultz (1976): Quadratic Assignment as a General Data Analysis Strategy. British Journal of Mathematical and Statistical Psychology, 29(2), 190--241.

Hubert, L., P. Arabie, and J. Meulman (2001): Combinatorial Data Analysis: Optimization by Dynamic Programming. Society for Industrial and Applied Mathematics.

McCormick, W.T., P.J. Schweitzer and T.W. White (1972): Problem decomposition and data reorganization by a clustering technique, Operations Research, 20(5), 993--1009.

Niermann, S. (2005): Optimizing the Ordering of Tables With Evolutionary Computation, The American Statistician, 59(1), 41--46.

Robinson, W.S. (1951): A method for chronologically ordering archaeological deposits, American Antiquity, 16, 293--301.

Tien, Y-J., Y-S. Lee, H-M. Wu and C-H. Chen (2008): Methods for simultaneously identifying coherent local clusters with smooth global patterns in gene expression profiles, BMC Bioinformatics, 9(155), 1--16.

Examples


## load the package, create random data and calculate distances
library(seriation)
set.seed(1234)  # make the random example reproducible
m <- matrix(runif(20), ncol = 2)
d <- dist(m)

## get an order for rows (optimal for the least squares criterion)
o <- seriate(d, method = "MDS")
o

## compare the values for all available criteria
rbind(
    unordered = criterion(d),
    ordered = criterion(d, o)
)

## compare RGAR by window size (from local to global)
w <- 2:(nrow(m)-1)
RGAR <- sapply(w, FUN = function (w)
  criterion(d, o, method="RGAR", w = w))
plot(w, RGAR, type = "b", ylim = c(0,1),
  xlab = "Window size (w)", main = "RGAR by window size")
