Applies the Gaussian Mixture Model (GMM) to a dataset using multiple initialization strategies. It runs the Expectation-Maximization (EM) algorithm for each initialization method and returns results for all methods.
runGMM(x, k, max_iter = 100,
       run_number = 10, smax_iter = 3,
       s_iter = 10, c_iter = 10, tol = 1e-6, burn_in = 3, verbose = FALSE)
A named list where each element corresponds to an initialization method and contains the results of the EM algorithm:
"Random": Results using random initialization.
"hierarchical.average": Results using hierarchical clustering (average linkage).
"hierarchical.ward": Results using hierarchical clustering (Ward’s method).
"kmeans": Results using K-means clustering initialization.
"emEM": Results using multi-start Expectation-Maximization (EM).
"emAEM": Results using the alternative EM initialization method.
"sem": Results using the Stochastic Expectation-Maximization (SEM).
"cem": Results using the Classification Expectation-Maximization (CEM).
"mclust": Results using model-based clustering from the mclust package.
Each element in the returned list contains:
BIC: The Bayesian Information Criterion (BIC) value for model selection.
param: A list with the estimated GMM parameters:
pi_k: Updated mixing proportions.
mu: Updated cluster means.
sigma: Updated covariance matrices.
cluster_assignments: Cluster labels assigned to each observation.
Z: Posterior probability matrix of cluster memberships.
If an initialization method fails, the corresponding list element will contain an error message.
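The return structure described above can be post-processed as in the following minimal sketch. It does not call runGMM: the `results` list is a hand-built mock that only mimics the documented element names, `make_fit` is a hypothetical helper, and storing a failure as a character string is an assumption about how the error message is represented. The shapes used for `param` are placeholders as well.

```r
# Hypothetical helper: builds one element shaped like the documented output.
# All values are placeholders, not real estimates.
make_fit <- function(bic) {
  list(
    BIC = bic,
    param = list(
      pi_k  = c(0.5, 0.5),               # mixing proportions
      mu    = matrix(0, nrow = 2, ncol = 2),
      sigma = list(diag(2), diag(2))     # assumed container for covariances
    ),
    cluster_assignments = rep(1:2, each = 3),
    Z = matrix(0.5, nrow = 6, ncol = 2)  # posterior membership probabilities
  )
}

# Mock result list; the "sem" entry imitates a failed initialization.
results <- list(
  Random = make_fit(-610.2),
  kmeans = make_fit(-598.7),
  sem    = "Error: initialization failed"  # assumed failure representation
)

# Treat an element as a successful fit only if it is a list carrying a BIC.
ok <- vapply(results, function(r) is.list(r) && !is.null(r$BIC), logical(1))

# Collect BIC values for the successful methods and list the failed ones.
bic    <- vapply(results[ok], function(r) r$BIC, numeric(1))
failed <- names(results)[!ok]

print(sort(bic))
print(failed)
```

Whether a larger or a smaller BIC indicates the better fit depends on the sign convention the package uses, so check that before ranking the methods.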
x: A numeric matrix or data frame where rows represent observations and columns represent variables.
k: An integer specifying the number of clusters.
max_iter: An integer specifying the maximum number of iterations for the long EM run (default is 100).
run_number: An integer specifying the number of short EM runs used by the emEM and emAEM initialization methods (default is 10).
smax_iter: An integer specifying the maximum number of iterations for each short EM run (default is 3).
s_iter: An integer specifying the number of iterations for the Stochastic Expectation-Maximization (SEM) algorithm (default is 10).
c_iter: An integer specifying the number of iterations for the Classification Expectation-Maximization (CEM) algorithm (default is 10).
tol: A numeric value specifying the convergence tolerance threshold (default is 1e-6).
burn_in: An integer specifying the number of burn-in iterations for stochastic methods (default is 3).
verbose: Logical; if TRUE, prints progress messages (default is FALSE).
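As an illustration of the arguments listed above, the sketch below sets each one by name. The argument names and defaults come from the usage signature; the data, the non-default values (`max_iter = 200`, `run_number = 5`, `verbose = TRUE`), and the `exists()` guard are illustrative choices. The guard is there only so the snippet also runs when the package providing runGMM is not loaded.

```r
# Simulate two well-separated 2-D groups of 50 observations each.
set.seed(42)
x <- rbind(
  matrix(rnorm(50 * 2, mean = 0), ncol = 2),
  matrix(rnorm(50 * 2, mean = 4), ncol = 2)
)

# Call runGMM only if it is available in the current session.
if (exists("runGMM", mode = "function")) {
  results <- runGMM(
    x, k = 2,
    max_iter   = 200,   # longer final EM run than the default of 100
    run_number = 5,     # fewer short EM runs for emEM / emAEM
    smax_iter  = 3,
    s_iter     = 10,
    c_iter     = 10,
    tol        = 1e-6,
    burn_in    = 3,
    verbose    = TRUE   # print progress messages
  )
  print(names(results))
}
```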
The runGMM function applies multiple initialization strategies for fitting a Gaussian Mixture Model (GMM) using the Expectation-Maximization (EM) algorithm.
Each initialization method is evaluated separately, and the results are returned for all tested methods.
runEM, getInit
# Generate sample data
set.seed(123)
data <- matrix(rnorm(100 * 2), ncol = 2)
# Run GMM clustering with different initialization strategies
results <- runGMM(data, k = 2)
results