GMMinit (version 1.0.0)

runGMM: Run Gaussian Mixture Model (GMM) Clustering with Multiple Initialization Strategies

Description

Applies the Gaussian Mixture Model (GMM) to a dataset using multiple initialization strategies. It runs the Expectation-Maximization (EM) algorithm for each initialization method and returns results for all methods.

Usage

runGMM(x, k, max_iter = 100, run_number = 10, smax_iter = 3,
       s_iter = 10, c_iter = 10, tol = 1e-6, burn_in = 3,
       verbose = FALSE)

Value

A named list where each element corresponds to an initialization method and contains the results of the EM algorithm:

  • "Random": Results using random initialization.

  • "hierarchical.average": Results using hierarchical clustering (average linkage).

  • "hierarchical.ward": Results using hierarchical clustering (Ward’s method).

  • "kmeans": Results using K-means clustering initialization.

  • "emEM": Results using multi-start Expectation-Maximization (EM).

  • "emAEM": Results using the alternative EM initialization method.

  • "sem": Results using the Stochastic Expectation-Maximization (SEM).

  • "cem": Results using the Classification Expectation-Maximization (CEM).

  • "mclust": Results using model-based clustering from the mclust package.

Each element in the returned list contains:

  • BIC: The Bayesian Information Criterion (BIC) value for model selection.

  • param: A list with the estimated GMM parameters:

    • pi_k: Estimated mixing proportions.

    • mu: Estimated cluster means.

    • sigma: Estimated covariance matrices.

  • cluster_assignments: Cluster labels assigned to each observation.

  • Z: Posterior probability matrix of cluster memberships.

If an initialization method fails, the corresponding list element will contain an error message.
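Because a failed method leaves a non-result element in the list, it can help to separate successful fits from failures before comparing them. The following is a minimal sketch, not part of the package: the `results` object is a hypothetical stand-in shaped like the list described above, and its BIC values are made-up placeholders. This page does not state the sign convention of BIC, so the sketch only tabulates the values and does not rank them.

```r
# Hypothetical stand-in for the list returned by runGMM(); in practice this
# would come from: results <- runGMM(data, k = 2)
# The BIC values are made-up placeholders, not real output.
results <- list(
  kmeans = list(BIC = -612.4, cluster_assignments = c(1, 2, 2, 1)),
  emEM   = list(BIC = -608.9, cluster_assignments = c(1, 2, 2, 1)),
  sem    = "Error: initialization failed"  # a failed method (assumed form)
)

# Treat an element as a successful fit only if it is a list with a single
# numeric BIC; anything else (e.g. an error message) counts as a failure.
ok <- vapply(
  results,
  function(r) is.list(r) && is.numeric(r$BIC) && length(r$BIC) == 1,
  logical(1)
)

# Collect the BIC of every successful method into one table
bic <- vapply(results[ok], function(r) r$BIC, numeric(1))
bic_table <- data.frame(method = names(bic), BIC = unname(bic))
print(bic_table)

# Names of the methods that did not return a fit
failed <- names(results)[!ok]
print(failed)
```

Check how the package defines BIC before choosing a "best" method with `which.max()` or `which.min()`, since both conventions are in use across R packages.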

Arguments

x

A numeric matrix or data frame where rows represent observations and columns represent variables.

k

An integer specifying the number of clusters.

max_iter

An integer specifying the maximum number of iterations for the long EM run (default is 100).

run_number

An integer specifying the number of short EM runs for the emEM and emAEM initialization methods (default is 10).

smax_iter

An integer specifying the maximum number of iterations for short EM (default is 3).

s_iter

An integer specifying the number of iterations for the Stochastic Expectation-Maximization (SEM) algorithm (default is 10).

c_iter

An integer specifying the number of iterations for the Classification Expectation-Maximization (CEM) algorithm (default is 10).

tol

A numeric value specifying the convergence tolerance threshold (default is 1e-6).

burn_in

An integer specifying the number of burn-in iterations for stochastic methods (default is 3).

verbose

Logical; if TRUE, prints progress messages.
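As an illustration of the arguments above, the sketch below prepares a numeric matrix from the built-in `iris` data and passes non-default values for a few arguments. The argument names come from the Usage section; the chosen values and the use of `iris` with `k = 3` are assumptions made for illustration only. The call is guarded so the snippet still runs when GMMinit is not installed.

```r
# x must be a numeric matrix or data frame: keep only the numeric columns
x <- as.matrix(iris[, 1:4])
stopifnot(is.numeric(x))

# Only call runGMM() if the GMMinit package is available
if (requireNamespace("GMMinit", quietly = TRUE)) {
  results <- GMMinit::runGMM(
    x, k = 3,
    max_iter = 200,    # allow more iterations for the long EM run
    run_number = 5,    # fewer short EM runs for emEM / emAEM
    tol = 1e-8,        # tighter convergence tolerance
    verbose = TRUE     # print progress messages
  )
  # One element per initialization method
  print(names(results))
}
```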

Details

runGMM fits a Gaussian Mixture Model (GMM) with the Expectation-Maximization (EM) algorithm once for each initialization strategy. Each initialization method is evaluated separately, and results are returned for every method tested.
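To make the roles of `Z`, `pi_k`, `mu`, `tol` and `cluster_assignments` concrete, here is a minimal base-R sketch of EM for a univariate two-component mixture. It is not the package's implementation: it assumes one-dimensional data, uses standard deviations (`sd_k`, a name chosen here) in place of covariance matrices, and starts from a crude fixed initialization instead of any of the strategies listed above.

```r
set.seed(1)
# Toy univariate data drawn from two well-separated normal components
x <- c(rnorm(50, mean = 0), rnorm(50, mean = 5))
k <- 2

# Crude fixed initialization (illustrative only)
pi_k <- rep(1 / k, k)
mu   <- c(min(x), max(x))
sd_k <- rep(sd(x), k)

loglik_old <- -Inf
for (iter in seq_len(100)) {           # plays the role of max_iter
  # E-step: weighted component densities, then posterior matrix Z (n x k)
  dens <- sapply(seq_len(k), function(j) pi_k[j] * dnorm(x, mu[j], sd_k[j]))
  Z <- dens / rowSums(dens)

  # Log-likelihood under the parameters used in this E-step
  loglik <- sum(log(rowSums(dens)))

  # M-step: update mixing proportions, means and standard deviations
  n_k  <- colSums(Z)
  pi_k <- n_k / length(x)
  mu   <- colSums(Z * x) / n_k
  sd_k <- sqrt(colSums(Z * outer(x, mu, "-")^2) / n_k)

  # Stop once the log-likelihood change falls below the tolerance
  if (abs(loglik - loglik_old) < 1e-6) break
  loglik_old <- loglik
}

# Hard labels: the component with the highest posterior probability
cluster_assignments <- max.col(Z, ties.method = "first")
table(cluster_assignments)
```

The quality of the final fit depends on the starting values, which is why running EM from several initialization strategies and comparing the results can be useful.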

See Also

runEM, getInit

Examples

# Generate sample data
set.seed(123)
data <- matrix(rnorm(100 * 2), ncol = 2)

# Run GMM clustering with different initialization strategies
results <- runGMM(data, k = 2)
results
