identifyLCAMixture: Solve label switching and identify a mixture of latent class analysis (LCA) models.

Description

Clusters the draws in the point process representation (PPR) using \(k\)-means clustering.

Usage

identifyLCAMixture(Func, Mu, Phi, Eta, S, centers)

Value

A named list containing:

"S": reordered assignments.
"Mu": reordered Mu array.
"Phi": reordered Phi array.
"Eta": reordered weights.
"non_perm_rate": proportion of draws where the clustering did not result in a permutation and hence no relabeling could be performed; this is the proportion of draws discarded.
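
Because only draws that yield a permutation are kept, the returned Eta and S can be summarised directly. The sketch below assumes two common choices that this page does not prescribe: the posterior mean for the cluster sizes and the most frequent label per observation for the final partition. Eta_id and S_id are hypothetical stand-ins for the returned "Eta" and "S" elements, not real output.

```r
# Hypothetical stand-ins for the returned "Eta" and "S" elements:
# 4 retained draws, K = 2 clusters, N = 3 observations.
Eta_id <- matrix(c(0.60, 0.70, 0.65, 0.55,
                   0.40, 0.30, 0.35, 0.45), nrow = 4, ncol = 2)
S_id <- matrix(c(1, 1, 1, 2,
                 2, 2, 1, 2,
                 1, 2, 1, 1), nrow = 4, ncol = 3)
K <- ncol(Eta_id)

# Posterior mean of the cluster sizes over the retained draws.
eta_hat <- colMeans(Eta_id)

# Final partition: the most frequent label of each observation across draws
# (which.max() breaks ties in favour of the smallest label).
modal_label <- function(x) {
  counts <- tabulate(x, nbins = K)
  which.max(counts)
}
partition <- apply(S_id, 2, modal_label)
```

Here eta_hat is c(0.625, 0.375) and partition is c(1, 2, 1).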

Arguments

Func: A numeric array of dimension \(M \times d \times K\); data for clustering in the PPR.
Mu: A numeric array of dimension \(M \times r \times K\); draws of cluster means.
Phi: A numeric array of dimension \(M \times K \times r\); draws of precisions.
Eta: A numeric array of dimension \(M \times K\); draws of cluster sizes.
S: A numeric matrix of dimension \(M \times N\); draws of cluster assignments.
centers: An integer or a numeric matrix of dimension \(K \times d\); used to initialize stats::kmeans().
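
To make the dimensions above concrete, the following sketch fills every argument with random placeholder values, not real MCMC output. The sizes M, N, K and r are arbitrary. Two further choices are illustrative assumptions rather than recommendations from this page: using the Mu draws themselves as the functional (so d = r), and taking the starting centers from the functional at the last draw (centers_mat).

```r
set.seed(123)
M <- 100  # number of MCMC draws
N <- 50   # number of observations
K <- 3    # number of components
r <- 4    # number of variables

# Placeholder draws with the documented dimensions.
Mu  <- array(runif(M * r * K), dim = c(M, r, K))              # M x r x K
Phi <- array(rgamma(M * K * r, shape = 2), dim = c(M, K, r))  # M x K x r
Eta <- matrix(rgamma(M * K, shape = 1), nrow = M, ncol = K)
Eta <- Eta / rowSums(Eta)                                     # M x K, rows sum to 1
S   <- matrix(sample.int(K, M * N, replace = TRUE), nrow = M, ncol = N)  # M x N

# Assumed functional: the Mu draws themselves, so d = r and Func is M x d x K.
Func <- Mu

# `centers` may be an integer (the number of clusters) ...
centers_int <- K
# ... or a K x d matrix of starting centers, here the functional at the last draw.
centers_mat <- t(Func[M, , ])
```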

Details

The following steps are implemented:

A functional of the draws of the component-specific parameters (Func) is passed to the function. The functionals of all components and iterations are stacked on top of each other to obtain an \((M K) \times d\) matrix in which each row is the functional of one component in one MCMC iteration.
The functionals are clustered into \(K_+\) clusters using \(k\)-means clustering, which yields a group label for each functional.
The labels of the functionals are then used to construct a classification for each MCMC iteration. Classifications that are a permutation of \((1,\ldots,K_+)\) are used to reorder the Mu, Phi and Eta draws and the assignment matrix S, which results in an identified mixture model.
Note that only iterations resulting in a permutation are used for parameter estimation and for deriving the final partition. MCMC iterations whose classification of the functionals is not a permutation of \((1,\ldots,K_+)\) are discarded, because no unique assignment of functionals to components can be made. A high non-permutation rate, i.e. a large proportion of such iterations, indicates a poor clustering solution in which the functionals are not clearly separated.
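
The steps above can be sketched in base R as follows. identify_ppr_sketch() is a hypothetical helper written for illustration, not the implementation behind identifyLCAMixture(). It assumes that the number of k-means clusters equals K (the third dimension of Func) and that S holds labels in 1..K. The usage part simulates draws with artificial label switching instead of using real MCMC output.

```r
# Hypothetical helper, NOT the package's implementation: a minimal base-R
# sketch of the PPR relabelling steps. Assumes the number of k-means clusters
# equals K = dim(Func)[3] and that S holds labels in 1..K.
identify_ppr_sketch <- function(Func, Mu, Phi, Eta, S, centers) {
  M <- dim(Func)[1]
  K <- dim(Func)[3]

  # Step 1: stack the functionals into an (M * K) x d matrix;
  # row (k - 1) * M + m holds the functional of component k at draw m.
  X <- matrix(aperm(Func, c(1, 3, 2)), nrow = M * K)

  # Step 2: k-means clustering of the stacked functionals.
  cl <- stats::kmeans(X, centers = centers)$cluster

  # Step 3: classes[m, k] is the group label of component k at draw m.
  classes <- matrix(cl, nrow = M, ncol = K)
  is_perm <- apply(classes, 1, function(x) all(sort(x) == seq_len(K)))
  keep <- which(is_perm)

  # Reorder the retained draws: component k of draw m gets label classes[m, k].
  Mu_new  <- Mu[keep, , , drop = FALSE]
  Phi_new <- Phi[keep, , , drop = FALSE]
  Eta_new <- Eta[keep, , drop = FALSE]
  S_new   <- S[keep, , drop = FALSE]
  for (i in seq_along(keep)) {
    m <- keep[i]
    perm <- classes[m, ]
    Mu_new[i, , perm]  <- Mu[m, , ]
    Phi_new[i, perm, ] <- Phi[m, , ]
    Eta_new[i, perm]   <- Eta[m, ]
    S_new[i, ]         <- perm[S[m, ]]
  }

  list(S = S_new, Mu = Mu_new, Phi = Phi_new, Eta = Eta_new,
       non_perm_rate = mean(!is_perm))
}

# Usage on simulated draws (d = r = 1, K = 2) where every second draw
# has its labels switched.
set.seed(1)
M <- 20; N <- 6; K <- 2
swap <- rep(c(FALSE, TRUE), length.out = M)
s_true <- rep(1:2, length.out = N)

mu_draws  <- cbind(rnorm(M, 0, 0.1), rnorm(M, 10, 0.1))
phi_draws <- cbind(rep(1, M), rep(2, M))
eta_draws <- matrix(c(0.3, 0.7), nrow = M, ncol = K, byrow = TRUE)
S <- matrix(s_true, nrow = M, ncol = N, byrow = TRUE)

mu_draws[swap, ]  <- mu_draws[swap, 2:1]
phi_draws[swap, ] <- phi_draws[swap, 2:1]
eta_draws[swap, ] <- eta_draws[swap, 2:1]
S[swap, ] <- 3 - S[swap, ]

Mu   <- array(mu_draws, dim = c(M, 1, K))   # M x r x K
Phi  <- array(phi_draws, dim = c(M, K, 1))  # M x K x r
Func <- Mu                                  # assumed functional, d = r = 1
res <- identify_ppr_sketch(Func, Mu, Phi, eta_draws, S,
                           centers = matrix(c(0, 10), nrow = K))
res$non_perm_rate  # 0: every draw yields a permutation
```

Stacking row (k - 1) * M + m lets matrix(cl, nrow = M) recover one row of labels per draw without extra index bookkeeping. The same label vector perm is then applied to every parameter array and, through perm[S[m, ]], to the assignments.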