pdSpecClust1D performs clustering of HPD spectral matrices corrupted by noise (e.g. HPD periodograms)
by combining wavelet thresholding and fuzzy clustering in the intrinsic wavelet coefficient domain according to
the following steps:
1. Transform a collection of noisy HPD spectral matrices to the intrinsic wavelet domain and denoise the
HPD matrix curves by (tree-structured) thresholding of wavelet coefficients with pdSpecEst1D.
2. Apply an intrinsic fuzzy c-means algorithm to the coarsest midpoints at scale j = 0 across subjects.
3. Taking into account the fuzzy cluster assignments in the previous step, apply a weighted fuzzy c-means
algorithm to the nonzero thresholded wavelet coefficients across subjects from scale j = 1 up to j = jmax.
More details can be found in Chapter 3 of C18pdSpecEst and the accompanying vignettes.
pdSpecClust1D(P, K, jmax, metric = "Riemannian", m = 2, d.jmax = 0.1,
eps = c(1e-04, 1e-04), tau = 0.5, max_iter = 50,
return.centers = FALSE, ...)
P: a (\(d,d,n,S\))-dimensional array of HPD matrices, corresponding to a collection of sequences of \((d,d)\)-dimensional HPD matrices of length \(n\), with \(n = 2^J\) for some \(J > 0\), for \(S\) different subjects.
K: the number of clusters, a positive integer larger than 1.
jmax: an upper bound on the maximum wavelet scale to be considered in the clustering procedure. If jmax
is not specified, it is set equal to the maximum (i.e., finest) wavelet scale minus 2.
metric: the metric that the space of HPD matrices is equipped with. The default choice is "Riemannian",
but this can also be one of: "logEuclidean", "Cholesky", "rootEuclidean" or "Euclidean".
Additional details are given below.
m: the fuzziness parameter for both fuzzy c-means algorithms. m should be larger than or equal to \(1\).
If \(m = 1\) the cluster assignments are no longer fuzzy, i.e., the procedure performs hard clustering.
d.jmax: a proportion that is used to determine the maximum wavelet scale to be considered in the clustering
procedure. A larger value of d.jmax leads to fewer wavelet coefficients being taken into account, and therefore
lower computational effort in the procedure. By default, d.jmax = 0.1.
eps: an optional vector with two components determining the stopping criterion. The first step in the cluster procedure
terminates if the (integrated) intrinsic distance between cluster centers is smaller than eps[1].
The second step in the cluster procedure terminates if the (integrated) Euclidean distance between cluster centers is smaller
than eps[2]. By default, eps = c(1e-04, 1e-04).
tau: an optional argument tuning the weight given to the cluster assignments obtained in the first step of
the clustering algorithm. By default, tau = 0.5.
max_iter: an optional argument tuning the maximum number of iterations in both the first and second step of the
clustering algorithm. Defaults to max_iter = 50.
return.centers: should the cluster centers, transformed back to the space of HPD matrices, also be returned?
Defaults to return.centers = FALSE.
...: additional arguments passed on to pdSpecEst1D.
Depending on the input the function returns a list with five or six components:
an (\(S,K\))-dimensional matrix, where the value at position (\(s,k\)) in the matrix corresponds to the probabilistic cluster membership assignment of subject \(s\) with respect to cluster \(k\).
a list of K wavelet coefficient pyramids, where each pyramid of wavelet
coefficients is associated to a cluster center.
a list of K arrays of coarse-scale midpoints at scale j = 0, where each
array is associated to a cluster center.
only available if return.centers = TRUE, returning a list of K \((d,d,n)\)-dimensional arrays,
where each array corresponds to a length \(n\) curve of \((d,d)\)-dimensional HPD matrices associated to a cluster center.
the maximum wavelet scale taken into account in the clustering procedure determined by
the input arguments jmax and d.jmax.
The input array P corresponds to a collection of initial noisy HPD spectral estimates of the \((d,d)\)-dimensional
spectral matrix at n different frequencies, with \(n = 2^J\) for some \(J > 0\), for \(S\) different subjects.
These can be e.g., multitaper HPD periodograms given as output by the function pdPgram.
First, for each subject \(s = 1,\ldots,S\), thresholded wavelet coefficients in the intrinsic wavelet domain are
calculated by pdSpecEst1D, see the function documentation for additional details on the wavelet thresholding
procedure.
The maximum wavelet scale taken into account in the clustering procedure is determined by the arguments
jmax and d.jmax. The maximum scale is set to the minimum of jmax and the wavelet
scale \(j\) for which the proportion of nonzero thresholded wavelet coefficients (averaged
across subjects) is smaller than d.jmax.
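The scale-selection rule described above can be illustrated in base R. This is a minimal sketch and not the package's implementation: the helper `select_jmax` and the vector `prop_nonzero` (the subject-averaged proportion of nonzero thresholded coefficients at scales j = 1, 2, ...) are hypothetical, and the sketch assumes the rule selects the first scale at which the proportion drops below `d.jmax`.

```r
## Hypothetical helper: pick the maximum wavelet scale used for clustering.
## prop_nonzero[j] = proportion of nonzero thresholded coefficients at scale j,
## averaged across subjects (assumed to be precomputed).
select_jmax <- function(prop_nonzero, jmax, d.jmax = 0.1) {
  below <- which(prop_nonzero < d.jmax)
  ## Assumption: if no scale falls below d.jmax, use the finest available scale
  j_cut <- if (length(below) > 0) below[1] else length(prop_nonzero)
  min(jmax, j_cut)
}

prop_nonzero <- c(0.90, 0.55, 0.20, 0.08, 0.02)
select_jmax(prop_nonzero, jmax = 6)                # 4: the proportion first drops below 0.1 at scale 4
select_jmax(prop_nonzero, jmax = 2)                # 2: the upper bound jmax is binding
select_jmax(prop_nonzero, jmax = 6, d.jmax = 0.5)  # 3: a larger d.jmax gives a coarser cut-off
```

The last call shows the effect described for d.jmax: a larger proportion stops at a coarser scale, so fewer coefficients enter the clustering step.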
The \(S\) subjects are assigned to \(K\) different clusters in a probabilistic fashion according to a
two-step procedure:
In the first step, an intrinsic fuzzy c-means algorithm, with fuzziness parameter \(m\), is applied to the
\(S\) coarsest midpoints at scale j = 0 in the subject-specific midpoint pyramids. Note that the distance
function in the intrinsic c-means algorithm relies on the chosen metric on the space of HPD matrices.
In the second step, a weighted fuzzy c-means algorithm based on the Euclidean
distance function, also with fuzziness parameter \(m\), is applied to the nonzero thresholded wavelet
coefficients of the \(S\) different subjects. The tuning parameter tau controls the weight given
to the cluster assignments obtained in the first step of the clustering algorithm.
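To make the role of the fuzziness parameter \(m\) concrete, the following base R sketch shows the textbook fuzzy c-means membership update, which turns subject-to-center distances into probabilistic cluster memberships. It is an illustration only: the function `fcm_membership` and the toy distance matrix are hypothetical, the formula requires \(m > 1\) and strictly positive distances, and the tau-weighting used in the second step of pdSpecClust1D is not reproduced here.

```r
## Textbook fuzzy c-means membership update (illustrative; not the package code).
## dist: (S x K) matrix of strictly positive distances from S subjects to K centers
## m:    fuzziness parameter, must be > 1 in this formula
fcm_membership <- function(dist, m = 2) {
  stopifnot(m > 1, all(dist > 0))
  w <- dist^(-2 / (m - 1))   # closer centers receive larger weights
  w / rowSums(w)             # normalize so each row sums to one
}

## Toy distances for 3 subjects and 2 cluster centers
d <- matrix(c(1, 3,
              3, 1,
              2, 2), nrow = 3, byrow = TRUE)

fcm_membership(d, m = 2)     # rows: (0.9, 0.1), (0.1, 0.9), (0.5, 0.5)
fcm_membership(d, m = 1.1)   # memberships approach 0/1 as m decreases towards 1
```

As \(m\) approaches 1, the memberships of subjects with a unique nearest center concentrate on that center, which matches the statement that \(m = 1\) corresponds to hard clustering.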
The function computes the forward and inverse intrinsic AI wavelet transform in the space of HPD matrices equipped with
one of the following metrics: (i) the affine-invariant Riemannian metric (default) as detailed in e.g., B09pdSpecEst[Chapter 6]
or PFA05pdSpecEst; (ii) the log-Euclidean metric, the Euclidean inner product between matrix logarithms;
(iii) the Cholesky metric, the Euclidean inner product between Cholesky decompositions; (iv) the Euclidean metric; or
(v) the root-Euclidean metric. The default choice of metric (affine-invariant Riemannian) satisfies several useful properties
not shared by the other metrics, see CvS17pdSpecEst or C18pdSpecEst for more details. Note that this comes
at the cost of increased computation time in comparison to one of the other metrics.
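The difference between some of these metrics can be sketched with base R only. The helpers below (`ct`, `logm_hpd`, `dist_E`, `dist_logE`, `dist_riem`) are hypothetical names written for this illustration and are not functions of the package; they use the standard formulas for the Euclidean, log-Euclidean and affine-invariant Riemannian distances between two HPD matrices. The final lines check numerically that the Riemannian distance is unchanged under a congruence transformation, one of the properties referred to above.

```r
## Illustrative base R helpers (hypothetical; not part of the package)
ct <- function(M) Conj(t(M))                       # conjugate transpose

## Matrix logarithm of an HPD matrix via its eigendecomposition
logm_hpd <- function(A) {
  e <- eigen(A, symmetric = TRUE)
  e$vectors %*% diag(log(e$values), nrow = nrow(A)) %*% ct(e$vectors)
}

## Euclidean distance: Frobenius norm of the difference
dist_E <- function(A, B) sqrt(sum(Mod(A - B)^2))

## Log-Euclidean distance: Frobenius norm of the difference of matrix logarithms
dist_logE <- function(A, B) sqrt(sum(Mod(logm_hpd(A) - logm_hpd(B))^2))

## Affine-invariant Riemannian distance: || log(A^{-1/2} B A^{-1/2}) ||_F
dist_riem <- function(A, B) {
  e <- eigen(A, symmetric = TRUE)
  A_isqrt <- e$vectors %*% diag(1 / sqrt(e$values), nrow = nrow(A)) %*% ct(e$vectors)
  M <- A_isqrt %*% B %*% A_isqrt
  M <- (M + ct(M)) / 2                             # remove numerical asymmetry
  sqrt(sum(log(eigen(M, symmetric = TRUE)$values)^2))
}

A <- matrix(c(2, 0.5, 0.5, 1), 2)
B <- matrix(c(1, -0.3, -0.3, 3), 2)
G <- matrix(c(1, 2, 0, 1), 2)                      # an arbitrary invertible matrix

c(Euclidean = dist_E(A, B), logEuclidean = dist_logE(A, B), Riemannian = dist_riem(A, B))

## Affine invariance: the Riemannian distance is unchanged under A -> G A G*
all.equal(dist_riem(A, B), dist_riem(G %*% A %*% ct(G), G %*% B %*% ct(G)))
```

The two eigendecompositions per Riemannian distance evaluation also hint at why this metric is more expensive than the Euclidean alternative.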
If return.centers = TRUE, the function also returns the K HPD spectral matrix curves corresponding to
the cluster centers based on the given metric by applying the intrinsic inverse AI wavelet transform
(InvWavTransf1D) to the cluster centers in the wavelet domain.
# NOT RUN {
## ARMA(1,1) process: Example 11.4.1 in (Brockwell and Davis, 1991)
Phi1 <- array(c(0.5, 0, 0, 0.6, rep(0, 4)), dim = c(2, 2, 2))
Phi2 <- array(c(0.7, 0, 0, 0.4, rep(0, 4)), dim = c(2, 2, 2))
Theta <- array(c(0.5, -0.7, 0.6, 0.8, rep(0, 4)), dim = c(2, 2, 2))
Sigma <- matrix(c(1, 0.71, 0.71, 2), nrow = 2)
## Generate periodogram data for 10 subjects in 2 groups
pgram <- function(Phi) pdPgram(rARMA(2^9, 2, Phi, Theta, Sigma)$X)$P
P <- array(c(replicate(5, pgram(Phi1)), replicate(5, pgram(Phi2))),
           dim = c(2, 2, 2^8, 10))
cl <- pdSpecClust1D(P, K = 2, metric = "logEuclidean")
# }