GMMinit (version 1.0.0)

runEM: Expectation-Maximization (EM) Algorithm for Gaussian Mixture Models

Description

Implements the Expectation-Maximization (EM) algorithm for fitting Gaussian Mixture Models (GMMs). It iteratively updates the model parameters (means, covariances, and mixing proportions) until convergence.

Usage

runEM(x, param, max_iter = 100, tol = 1e-5)

Value

A list containing the following components:

  • BIC: Bayesian Information Criterion (BIC) value for model selection.

  • param: A list containing the estimated model parameters:

    • param[[1]]: Updated mixing proportions.

    • param[[2]]: Updated cluster means.

    • param[[3]]: Updated covariance matrices.

  • cluster_labels: Cluster assignments (most probable cluster for each observation).

  • Z: Posterior probability matrix (\(\gamma\)), rounded to 4 decimal places.
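The relationship between Z, cluster_labels, and BIC can be sketched in base R. This is an illustration under stated assumptions, not the package's code: the matrix `Z` below is made up, `gmm_bic` is a hypothetical helper, the parameter count assumes unconstrained (full) covariance matrices, and the sign convention shown (lower is better) may differ from the one runEM uses.

```r
# Hypothetical posterior matrix: 4 observations, 2 components
Z <- matrix(c(0.90, 0.10,
              0.20, 0.80,
              0.60, 0.40,
              0.05, 0.95), ncol = 2, byrow = TRUE)

# Most probable component for each observation (row-wise argmax)
labels <- max.col(Z, ties.method = "first")

# BIC for a k-component GMM in d dimensions with full covariances.
# Assumed convention: -2 * logLik + (number of free parameters) * log(n)
gmm_bic <- function(loglik, n, d, k) {
  n_par <- (k - 1) +             # mixing proportions (sum to 1)
    k * d +                      # means
    k * d * (d + 1) / 2          # symmetric covariance matrices
  -2 * loglik + n_par * log(n)
}

bic <- gmm_bic(loglik = -100, n = 100, d = 2, k = 2)
```

Because Z is rounded to 4 decimal places, rows may not sum to exactly 1, and near-ties between components can be affected by the rounding.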

Arguments

x

A numeric matrix or data frame where rows represent observations and columns represent variables.

param

A list containing the initial parameters for the EM algorithm:

  • param[[1]]: Mixing proportions (\(\pi_k\)).

  • param[[2]]: Cluster means (\(\mu_k\)).

  • param[[3]]: Covariance matrices (\(\Sigma_k\)).
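When param is built by hand rather than with getInit, a quick sanity check can catch invalid starting values. The sketch below is only illustrative: `check_param` is a hypothetical helper, and it assumes the means are stored as a k x d matrix and the covariances as a list of k matrices of size d x d. This page does not state the layout runEM expects, so adjust the checks to match what getInit returns.

```r
# Hypothetical validator; the layout of param[[2]] and param[[3]] is an assumption
check_param <- function(param) {
  pi_k  <- param[[1]]
  mu    <- param[[2]]
  sigma <- param[[3]]
  k <- length(pi_k)

  stopifnot(
    all(pi_k >= 0),
    abs(sum(pi_k) - 1) < 1e-8,       # proportions must sum to 1
    is.matrix(mu), nrow(mu) == k,    # one row of means per component
    is.list(sigma), length(sigma) == k
  )

  # Each covariance matrix must be symmetric and positive definite
  for (s in sigma) {
    ev <- eigen(s, symmetric = TRUE, only.values = TRUE)$values
    stopifnot(isSymmetric(s), all(ev > 0))
  }
  invisible(TRUE)
}

param <- list(c(0.5, 0.5),
              rbind(c(0, 0), c(3, 3)),
              list(diag(2), diag(2)))
check_param(param)
```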

max_iter

An integer specifying the maximum number of EM iterations (default is 100).

tol

A numeric value specifying the convergence tolerance threshold (default is 1e-5). The algorithm stops when the relative change in log-likelihood is below this value.
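A relative-change stopping rule of this kind can be sketched as follows. The helper names `rel_change` and `has_converged` are hypothetical, and the exact formula runEM uses (for example, which log-likelihood value appears in the denominator) is an assumption here.

```r
# Relative change between two successive log-likelihood values
# (assumed form: absolute difference scaled by the previous value)
rel_change <- function(ll_new, ll_old) {
  abs(ll_new - ll_old) / abs(ll_old)
}

# TRUE when the relative change falls below the tolerance
has_converged <- function(ll_new, ll_old, tol = 1e-5) {
  rel_change(ll_new, ll_old) < tol
}

has_converged(-250.0001, -250.0002)  # tiny change: TRUE
has_converged(-250, -260)            # large change: FALSE
```

A relative criterion makes tol independent of the scale of the log-likelihood, which grows in magnitude with the number of observations.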

Details

The EM algorithm iteratively refines the estimates of the Gaussian Mixture Model (GMM) parameters by alternating between two steps:

  • E-step: Computes the posterior probabilities (responsibilities) of cluster membership for each observation.

  • M-step: Updates the parameters (means, covariances, and mixing proportions) based on the computed responsibilities.

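Convergence is assessed using the log-likelihood: iteration stops when the relative change in log-likelihood falls below tol, or when max_iter iterations have been run.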
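The two steps can be illustrated with a minimal base R sketch of one EM iteration for a full-covariance GMM. This is not the package's implementation: `dmvn` and `em_step` are hypothetical helpers, and the means are assumed to be a k x d matrix with covariances in a list of d x d matrices.

```r
# Multivariate normal density for each row of x (base R only)
dmvn <- function(x, mu, sigma) {
  d <- ncol(x)
  centered <- sweep(x, 2, mu)
  # Squared Mahalanobis distance of each row from mu
  maha <- rowSums((centered %*% solve(sigma)) * centered)
  exp(-0.5 * maha) / sqrt((2 * pi)^d * det(sigma))
}

em_step <- function(x, pi_k, mu, sigma) {
  n <- nrow(x)
  k <- length(pi_k)

  # E-step: weighted component densities, normalized row-wise
  dens <- sapply(seq_len(k), function(j) pi_k[j] * dmvn(x, mu[j, ], sigma[[j]]))
  loglik <- sum(log(rowSums(dens)))  # log-likelihood under the input parameters
  gamma <- dens / rowSums(dens)      # responsibilities (n x k)

  # M-step: responsibility-weighted updates
  n_k <- colSums(gamma)
  pi_new <- n_k / n
  mu_new <- t(gamma) %*% x / n_k
  sigma_new <- lapply(seq_len(k), function(j) {
    centered <- sweep(x, 2, mu_new[j, ])
    crossprod(centered * gamma[, j], centered) / n_k[j]
  })

  list(pi_k = pi_new, mu = mu_new, sigma = sigma_new,
       gamma = gamma, loglik = loglik)
}

# Two well-separated synthetic clusters
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))

state <- list(pi_k = c(0.5, 0.5),
              mu = rbind(c(0, 0), c(4, 4)),
              sigma = list(diag(2), diag(2)))

ll <- numeric(0)
for (i in 1:20) {
  state <- em_step(x, state$pi_k, state$mu, state$sigma)
  ll <- c(ll, state$loglik)
}
```

The vector `ll` records the log-likelihood before each update; for EM it should be non-decreasing, which is a useful check when debugging. A production implementation would typically work with log densities to avoid underflow and would guard against singular covariance matrices.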

See Also

getInit

Examples

# Generate synthetic data
set.seed(123)
data <- matrix(rnorm(100 * 2), ncol = 2)

# Initialize parameters using k-means
init_params <- getInit(data, k = 2, method = "kmeans")

# Run the EM algorithm
em_results <- runEM(data, param = init_params, max_iter = 100, tol = 1e-5)

# Print results
print(em_results$BIC)
print(em_results$cluster_labels)