Fast MDS uses recursive programming in combination with a divide
and conquer strategy in order to obtain an MDS configuration for a given
large data set x.
fast_mds(x, l, s_points, r, n_cores)Returns a list containing the following elements:
A matrix that consists of \(n\) individuals (rows)
and r variables (columns) corresponding to the principal coordinates. Since
we are performing a dimensionality reduction, r\(<<k\)
The first r largest eigenvalues:
\(\bar{\lambda}_i, i \in \{1, \dots, r\} \), where
\(\bar{\lambda}_i = 1/p \sum_{j=1}^{p}\lambda_i^j/n_j\),
being \(\lambda_i^j\) the \(i-th\) eigenvalue from partition \(j\)
and \(n_j\) the size of the partition \(j\).
A matrix with \(n\) individuals (rows) and \(k\) variables (columns).
The size for which classical MDS can be computed efficiently
(using cmdscale function). It means that if \(\bar{l}\) is the limit
size for which classical MDS is applicable, then l\(\leq \bar{l}\).
Number of points used to align the MDS solutions obtained by
the division of x into \(p\) submatrices. Recommended value: 5·r.
Number of principal coordinates to be extracted.
Number of cores wanted to use to run the algorithm.
Fast MDS randomly divides the whole sample data set, x, of size \(n\)
into \(p=\)l/s_points data subsets, where l \(\leq \bar{l}\) being \(\bar{l}\)
the limit size for which classical MDS is applicable. Each one of the \(p\) data subsets
has size \(\tilde{n} = n/p\). If \(\tilde{n} \leq \code{l}\) then classical MDS is applied
to each data subset. Otherwise, fast MDS is recursively applied.
In either case, a final MDS configuration is obtained for each data subset.
In order to align all the partial solutions, a small subset of size s_points
is randomly selected from each data subset. They are joined
to form an alignment set, over which classical MDS is performed giving rise to an
alignment configuration. Every data subset shares s_points points with the alignment
set. Therefore every MDS configuration can be aligned with the alignment configuration
using a Procrustes transformation.
Delicado P. and C. Pachón-García (2021). Multidimensional Scaling for Big Data. https://arxiv.org/abs/2007.11919.
Yang, T., J. Liu, L. McMillan and W.Wang (2006). A fast approximation to multidimensional scaling. In Proceedings of the ECCV Workshop on Computation Intensive Methods for Computer Vision (CIMCV).
Borg, I. and P. Groenen (2005). Modern Multidimensional Scaling: Theory and Applications. Springer.
set.seed(42)
x <- matrix(data = rnorm(4 * 10000), nrow = 10000) %*% diag(c(9, 4, 1, 1))
mds <- fast_mds(x = x, l = 200, s_points = 5 * 2, r = 2, n_cores = 1)
head(mds$points)
mds$eigen
Run the code above in your browser using DataLab