seriation - Infrastructure for Ordering Objects Using Seriation - R package

This package provides the infrastructure for ordering objects with an implementation of several seriation/sequencing/ordination techniques to reorder matrices, dissimilarity matrices, and dendrograms (see below for a full list). Also provides (optimally) reordered heatmaps, color images and clustering visualizations like dissimilarity plots, and visual assessment of cluster tendency plots (VAT and iVAT).

Installation

Stable CRAN version: install from within R with

install.packages("seriation")

Current development version: Download package from AppVeyor or install from GitHub (needs devtools).

library("devtools")
install_github("mhahsler/seriation")

Usage

Load library, read data and calculate distances. Then use default seriation.

library(seriation)
data("iris")
x <- as.matrix(iris[-5])
x <- x[sample(1:nrow(x)),]

d <- dist(x)
order <- seriate(d)
order
object of class ‘ser_permutation’, ‘list’
contains permutation vectors for 1-mode data

  vector length seriation method
1           150             ARSA

Compare quality.

rbind(
 random = criterion(d),
 reordered = criterion(d, order)
)
          AR_events AR_deviations       RGAR Gradient_raw Gradient_weighted Path_length
random       550620    948833.712 0.49938328          741         -1759.954   392.77766
reordered     54846      9426.094 0.04974243       992214       1772123.418    83.95758
            Inertia Least_squares       ME Moore_stress Neumann_stress     2SUM      LS
random    214602194      78852819 291618.0    927570.00     461133.357 29954845 5669489
reordered 356945979      76487641 402332.1     13593.32       5274.093 17810802 4486900

Available Seriation Method

The following methods are available for dissimilarity data:

  • ARSA - Simulated annealing (linear seriation)
  • Branch-and-bound to minimize the unweighted/weighted column gradient
  • DendSer - Dendrogram seriation heuristic to optimize various criteria
  • GA - Genetic algorithm with warm start to optimize various criteria
  • GW - Hierarchical clustering reordered by Gruvaeus and Wainer heuristic
  • HC - Hierarchical clustering (single link, avg. link, complete link)
  • Identity permutation
  • MDS - Multidimensional scaling (metric, non-metric, angle)
  • OLO - Hierarchical clustering with optimal leaf ordering
  • OPTICS - Ordering points to identify the clustering structure.
  • QAP - Quadratic assignment problem heuristic (2-SUM, linear seriation, inertia, banded anti-Robinson form)
  • R2E - Rank-two ellipse seriation
  • Random permutation
  • Spectral seriation (unnormalized, normalized)
  • SPIN - Sorting points into neighborhoods (neighborhood algorithm, side-to-site algorithm)
  • TSP - Traveling sales person solver to minimize the Hamiltonian path length
  • TSNE - Order of the 1D t-distributed stochastic neighbor embedding (t-SNE)
  • UMAP - Order of the 1D embedding produced by uniform manifold approximation and projection
  • VAT - Order of the visual assessment of clustering tendency ordering

A detailed comparison of the methods is available in the paper An experimental comparison of seriation methods for one-mode two-way data. (read preprint).

The following methods are available for matrices:

  • BEA - Bond Energy Algorithm to maximize the measure of effectiveness (ME)
  • Identity permutation
  • PCA - First principal component or angle on the projection on the first two principal components
  • Random permutation
  • TSP - Traveling sales person solver to maximize ME

References

Copy Link

Version

Down Chevron

Install

install.packages('seriation')

Monthly Downloads

23,095

Version

1.3.1

License

GPL-3

Issues

Pull Requests

Stars

Forks

Maintainer

Last Published

October 16th, 2021

Functions in seriation (1.3.1)