The predict.mbcfit function uses the parameters of a previously fitted mbcfit model to assign new data points to the estimated clusters. Before predicting, it checks that the mbcfit object contains valid estimates and that the dimensionality of the new data matches that of the model.
The mbcfit object must contain a component named params, a list with the following required elements for a mixture model with K components:
proportions
A numeric vector of length K, with elements summing to 1, representing cluster proportions.
mean
A numeric matrix of dimensions c(P, K), representing cluster centers.
cov
A numeric array of dimensions c(P, P, K), representing cluster covariance matrices.
Here P is the dimensionality of the data used to fit the model. The new data must have the same dimensionality, that is, ncol(data) must equal P; otherwise the function terminates with an error message.
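As an illustration of the structure described above, the following sketch builds a params list by hand for K = 2 and P = 2 and runs structural checks similar in spirit to the ones the function performs. The helper check_params and all numeric values are hypothetical and not part of the package; the checks actually performed by predict.mbcfit may differ.

```r
# Hypothetical parameters for a mixture with K = 2 components in P = 2 dimensions
K <- 2
P <- 2
params <- list(
  proportions = c(0.4, 0.6),                               # length K, sums to 1
  mean        = matrix(c(0, 0, 3, 3), nrow = P),           # P x K, one column per cluster
  cov         = array(rep(diag(P), K), dim = c(P, P, K))   # P x P x K, identity covariances
)

# Hypothetical helper (not part of the package): basic structural checks
check_params <- function(params, data) {
  K <- length(params$proportions)
  P <- nrow(params$mean)
  stopifnot(
    is.numeric(params$proportions),
    abs(sum(params$proportions) - 1) < 1e-8,
    identical(dim(params$mean), c(P, K)),
    identical(dim(params$cov), c(P, P, K))
  )
  # New data must have as many columns as the model has dimensions
  if (ncol(data) != P) stop("data must have ", P, " columns")
  invisible(TRUE)
}

newdata <- matrix(c(0.1, -0.2, 2.9, 3.1), ncol = P, byrow = TRUE)
check_params(params, newdata)
```

Calling check_params with a matrix that has three columns would stop with an error, mirroring the dimensionality requirement stated above.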
The predicted clustering is the maximum a posteriori (MAP) assignment: each point is assigned to the component with the largest posterior weight under a Gaussian mixture model with parameters params.
Denoting by \(z(x)\) the predicted cluster label for point \(x\), by \(\phi\) the (multivariate) Gaussian density, and by \(\pi_k\), \(\mu_k\), \(\Sigma_k\) the proportion, mean and covariance matrix of component \(k\):
$$z(x) = \underset{k \in \{1,\ldots,K\}}{\arg\max} \; \frac{\pi_k\phi(x, \mu_k, \Sigma_k)}{\sum_{j=1}^K \pi_j\phi(x, \mu_j, \Sigma_j)}$$
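The rule above can be sketched in base R as follows. This is only a minimal illustration under stated assumptions, not the package's implementation: dmvnorm_base and map_labels are hypothetical helpers, the parameter values are made up, and the density is evaluated directly rather than on the log scale, which may underflow for points far from every cluster.

```r
# Multivariate Gaussian density written out in base R (hypothetical helper)
dmvnorm_base <- function(x, mu, sigma) {
  sigma <- as.matrix(sigma)
  P <- length(mu)
  centered <- sweep(x, 2, mu)                              # subtract mu from each row
  maha <- rowSums((centered %*% solve(sigma)) * centered)  # squared Mahalanobis distances
  exp(-0.5 * maha) / sqrt((2 * pi)^P * det(sigma))
}

# Hypothetical helper: MAP labels from a params list with the structure above
map_labels <- function(x, params) {
  K <- length(params$proportions)
  # n x K matrix of numerators pi_k * phi(x, mu_k, Sigma_k)
  num <- sapply(seq_len(K), function(k) {
    params$proportions[k] *
      dmvnorm_base(x, params$mean[, k], params$cov[, , k])
  })
  num <- matrix(num, nrow = nrow(x))   # keep matrix shape when x has a single row
  post <- num / rowSums(num)           # posterior weights; each row sums to 1
  max.col(post, ties.method = "first") # index of the largest weight in each row
}

# Made-up parameters: K = 2 clusters in P = 2 dimensions
params <- list(
  proportions = c(0.4, 0.6),
  mean        = matrix(c(0, 0, 3, 3), nrow = 2),
  cov         = array(rep(diag(2), 2), dim = c(2, 2, 2))
)
x <- rbind(c(0.1, -0.2), c(2.9, 3.1))
map_labels(x, params)
```

Because the denominator does not depend on \(k\), taking the arg max of the numerators alone would give the same labels; the normalization matters only when the posterior weights themselves are needed.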