pdkMeans
pdkMeans performs (fuzzy) k-means clustering for collections of HPD matrices, such as covariance or spectral density matrices, based on a number of different metrics on the space of HPD matrices.
pdkMeans(X, K, metric = "Riemannian", m = 1, eps = 1e-05,
max_iter = 100, centroids)
X: a (\(d,d,S\))-dimensional array of (\(d,d\))-dimensional HPD matrices for \(S\) different subjects. Also accepts a (\(d,d,n,S\))-dimensional array, which is understood to be an array of \(n\)-dimensional sequences of (\(d,d\))-dimensional HPD matrices for \(S\) different subjects.
K: the number of clusters, a positive integer larger than 1.
metric: the metric that the space of HPD matrices is equipped with. The default choice is "Riemannian", but this can also be one of: "logEuclidean", "Cholesky", "rootEuclidean" or "Euclidean". Additional details are given below.
m: a fuzziness parameter greater than or equal to \(1\). If m = 1, the cluster assignments are no longer fuzzy, i.e., the procedure performs hard clustering. Defaults to m = 1.
eps: an optional tolerance parameter determining the stopping criterion. The k-means algorithm terminates if the intrinsic distance between cluster centers is smaller than eps; defaults to eps = 1e-05.
max_iter: an optional parameter tuning the maximum number of iterations in the k-means algorithm; defaults to max_iter = 100.
centroids: an optional (\(d,d,K\))- or (\(d,d,n,K\))-dimensional array, depending on the input array X, specifying the initial cluster centroids. If not specified, K initial cluster centroids are randomly sampled without replacement from the input array X; see also the sketch below.
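As an illustrative sketch (not part of the original documentation), the call below supplies explicit initial centroids through the centroids argument. The helper rHPD and the data X are hypothetical and constructed here only for illustration, assuming the pdSpecEst package providing pdkMeans is loaded.

library(pdSpecEst)

## Hypothetical helper (illustration only): draw a random (d,d)-dimensional HPD matrix
rHPD <- function(d) {
  x <- matrix(complex(real = rnorm(d^2), imaginary = rnorm(d^2)), nrow = d)
  t(Conj(x)) %*% x + diag(d)  ## Hermitian and positive definite
}

## (d,d,S)-dimensional input array with d = 3 and S = 20 subjects
X <- replicate(20, rHPD(3))

## Initial cluster centroids: a (d,d,K)-dimensional array, here two of the input matrices
init <- X[, , c(1, 11)]
fit <- pdkMeans(X, K = 2, metric = "Riemannian", centroids = init)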
Returns a list with two components:
(i) the cluster assignments cl.assignments: an (\(S,K\))-dimensional matrix, where the value at position (\(s,k\)) corresponds to the (probabilistic or binary) cluster membership assignment of subject \(s\) with respect to cluster \(k\);
(ii) the final cluster centroids: either a (\(d,d,K\))- or (\(d,d,n,K\))-dimensional array, depending on the input array X, corresponding respectively to the \(K\) (\(d,d\))- or (\(d,d,n\))-dimensional final cluster centroids.
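As a brief sketch (not from the original documentation, reusing the fit object from the illustration above), the two list components can be inspected as follows; the membership matrix is accessed by name, as in the examples below, and the centroid array by position.

memberships <- fit$cl.assignments  ## (S,K)-dimensional matrix of cluster memberships
centroids <- fit[[2]]              ## (d,d,K)-dimensional array of final cluster centroids
dim(memberships)                   ## 20 x 2
dim(centroids)                     ## 3 x 3 x 2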
The input array X corresponds to a collection of (\(d,d\))-dimensional HPD matrices for \(S\) different subjects. If the fuzziness parameter satisfies m > 1, the \(S\) subjects are assigned to the \(K\) different clusters in a probabilistic fashion according to a fuzzy k-means algorithm, as detailed in classical texts such as (BE81). If m = 1, the \(S\) subjects are assigned to the \(K\) clusters in a non-probabilistic fashion according to a standard (hard) k-means algorithm. If not specified by the user, the \(K\) cluster centers are initialized by random sampling without replacement from the input array of HPD matrices X.
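For instance, as a minimal sketch (reusing the illustrative array X constructed above), hard and fuzzy cluster assignments can be contrasted directly:

hard  <- pdkMeans(X, K = 2, m = 1)$cl.assignments  ## binary 0/1 memberships
fuzzy <- pdkMeans(X, K = 2, m = 2)$cl.assignments  ## probabilistic memberships in [0, 1]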
The distance measure in the (fuzzy) k-means algorithm is induced by the metric on the space of HPD matrices specified by the user. By default, the space of HPD matrices is equipped with (i) the affine-invariant Riemannian metric (metric = 'Riemannian'), as detailed in e.g., (B09, Chapter 6) or (PFA05). Instead, this can also be one of: (ii) the log-Euclidean metric (metric = 'logEuclidean'), the Euclidean inner product between matrix logarithms; (iii) the Cholesky metric (metric = 'Cholesky'), the Euclidean inner product between Cholesky decompositions; (iv) the Euclidean metric (metric = 'Euclidean'); or (v) the root-Euclidean metric (metric = 'rootEuclidean'). The default choice of metric (affine-invariant Riemannian) satisfies several useful properties not shared by the other metrics; see e.g., (C18) for more details. Note that this comes at the cost of increased computation time in comparison to the other metrics.
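As a further sketch (reusing the illustrative array X from above), the metric is selected through the metric argument; switching to, e.g., the log-Euclidean metric typically reduces computation time relative to the default Riemannian metric:

fit_riem <- pdkMeans(X, K = 2, metric = "Riemannian")
fit_logE <- pdkMeans(X, K = 2, metric = "logEuclidean")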
## Generate 20 random HPD matrices in 2 groups
m <- function(rescale){
x <- matrix(complex(real = rescale * rnorm(9), imaginary = rescale * rnorm(9)), nrow = 3)
t(Conj(x)) %*% x
}
X <- array(c(replicate(10, m(0.25)), replicate(10, m(1))), dim = c(3, 3, 20))
## Compute fuzzy k-means cluster assignments
cl <- pdkMeans(X, K = 2, m = 2)$cl.assignments
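As a follow-up sketch (not part of the original example), the fuzzy memberships in cl can be reduced to hard cluster labels and cross-tabulated against the two simulated groups:

labels <- apply(cl, 1, which.max)   ## most likely cluster per subject
table(labels, rep(1:2, each = 10))  ## compare against the true group structure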