pdkMeans
pdkMeans performs (fuzzy) k-means clustering for collections of HPD matrices, such as covariance or spectral density matrices, based on a number of different metrics on the space of HPD matrices.
pdkMeans(X, K, metric = "Riemannian", m = 1, eps = 1e-05,
max_iter = 100, centroids)
X: a (\(d,d,S\))-dimensional array of (\(d,d\))-dimensional HPD matrices for \(S\) different subjects. Also accepts a (\(d,d,n,S\))-dimensional array, which is understood to be an array of \(n\)-dimensional sequences of (\(d,d\))-dimensional HPD matrices for \(S\) different subjects.
K: the number of clusters, a positive integer larger than 1.
metric: the metric that the space of HPD matrices is equipped with. The default choice is "Riemannian", but this can also be one of: "logEuclidean", "Cholesky", "rootEuclidean" or "Euclidean". Additional details are given below.
m: a fuzziness parameter greater than or equal to \(1\). If m = 1, the cluster assignments are no longer fuzzy, i.e., the procedure performs hard clustering. Defaults to m = 1.
eps: an optional tolerance parameter determining the stopping criterion. The k-means algorithm terminates if the intrinsic distance between cluster centers is smaller than eps; defaults to eps = 1e-05.
max_iter: an optional parameter tuning the maximum number of iterations in the k-means algorithm; defaults to max_iter = 100.
centroids: an optional (\(d,d,K\))- or (\(d,d,n,K\))-dimensional array, depending on the input array X, specifying the initial cluster centroids. If not specified, K initial cluster centroids are randomly sampled without replacement from the input array X; see also the sketch below.
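As an illustrative sketch (not part of the original documentation), the call below supplies explicit initial centroids through the centroids argument. The helper rHPD and the data X are hypothetical and constructed here only for illustration, assuming the pdSpecEst package providing pdkMeans is loaded.

library(pdSpecEst)

## Hypothetical helper (illustration only): draw a random (d,d)-dimensional HPD matrix
rHPD <- function(d) {
  x <- matrix(complex(real = rnorm(d^2), imaginary = rnorm(d^2)), nrow = d)
  t(Conj(x)) %*% x + diag(d)  ## Hermitian and positive definite
}

## (d,d,S)-dimensional input array with d = 3 and S = 20 subjects
X <- replicate(20, rHPD(3))

## Initial cluster centroids: a (d,d,K)-dimensional array, here two of the input matrices
init <- X[, , c(1, 11)]
fit <- pdkMeans(X, K = 2, metric = "Riemannian", centroids = init)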
Returns a list with two components:
(i) the cluster assignments cl.assignments: an (\(S,K\))-dimensional matrix, where the value at position (\(s,k\)) corresponds to the (probabilistic or binary) cluster membership assignment of subject \(s\) with respect to cluster \(k\);
(ii) the final cluster centroids: either a (\(d,d,K\))- or (\(d,d,n,K\))-dimensional array, depending on the input array X, corresponding respectively to the \(K\) (\(d,d\))- or (\(d,d,n\))-dimensional final cluster centroids.
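As a brief sketch (not from the original documentation, reusing the fit object from the illustration above), the two list components can be inspected as follows; the membership matrix is accessed by name, as in the examples below, and the centroid array by position.

memberships <- fit$cl.assignments  ## (S,K)-dimensional matrix of cluster memberships
centroids <- fit[[2]]              ## (d,d,K)-dimensional array of final cluster centroids
dim(memberships)                   ## 20 x 2
dim(centroids)                     ## 3 x 3 x 2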
The input array X corresponds to a collection of (\(d,d\))-dimensional HPD matrices for \(S\) different subjects. If the fuzziness parameter satisfies m > 1, the \(S\) subjects are assigned to the \(K\) different clusters in a probabilistic fashion according to a fuzzy k-means algorithm, as detailed in classical texts such as (BE81). If m = 1, the \(S\) subjects are assigned to the \(K\) clusters in a non-probabilistic fashion according to a standard (hard) k-means algorithm. If not specified by the user, the \(K\) cluster centers are initialized by random sampling without replacement from the input array of HPD matrices X.
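For instance, as a minimal sketch (reusing the illustrative array X constructed above), hard and fuzzy cluster assignments can be contrasted directly:

hard  <- pdkMeans(X, K = 2, m = 1)$cl.assignments  ## binary 0/1 memberships
fuzzy <- pdkMeans(X, K = 2, m = 2)$cl.assignments  ## probabilistic memberships in [0, 1]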
The distance measure in the (fuzzy) k-means algorithm is induced by the metric on the space of HPD matrices specified by the user. By default, the space of HPD matrices is equipped with (i) the affine-invariant Riemannian metric (metric = 'Riemannian'), as detailed in e.g., (B09, Chapter 6) or (PFA05). Instead, this can also be one of: (ii) the log-Euclidean metric (metric = 'logEuclidean'), the Euclidean inner product between matrix logarithms; (iii) the Cholesky metric (metric = 'Cholesky'), the Euclidean inner product between Cholesky decompositions; (iv) the Euclidean metric (metric = 'Euclidean'); or (v) the root-Euclidean metric (metric = 'rootEuclidean'). The default choice of metric (affine-invariant Riemannian) satisfies several useful properties not shared by the other metrics; see e.g., (C18) for more details. Note that this comes at the cost of increased computation time in comparison to the other metrics.
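As a further sketch (reusing the illustrative array X from above), the metric is selected through the metric argument; switching to, e.g., the log-Euclidean metric typically reduces computation time relative to the default Riemannian metric:

fit_riem <- pdkMeans(X, K = 2, metric = "Riemannian")
fit_logE <- pdkMeans(X, K = 2, metric = "logEuclidean")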
## Generate 20 random HPD matrices in 2 groups
m <- function(rescale){
x <- matrix(complex(real = rescale * rnorm(9), imaginary = rescale * rnorm(9)), nrow = 3)
t(Conj(x)) %*% x
}
X <- array(c(replicate(10, m(0.25)), replicate(10, m(1))), dim = c(3, 3, 20))
## Compute fuzzy k-means cluster assignments
cl <- pdkMeans(X, K = 2, m = 2)$cl.assignments
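As a follow-up sketch (not part of the original example), the fuzzy memberships in cl can be reduced to hard cluster labels and cross-tabulated against the two simulated groups:

labels <- apply(cl, 1, which.max)   ## most likely cluster per subject
table(labels, rep(1:2, each = 10))  ## compare against the true group structure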