getInit: Initialize Parameters for the EM Algorithm in Gaussian Mixture Models

Description

Generates starting parameters for the Expectation-Maximization (EM) algorithm in Gaussian Mixture Models (GMMs), using the initialization method chosen by the user.

Usage

getInit(x, k, method, run_number = 10, max_iter = 3, tol = 1e-6, burn_in = 3)

Value

A list containing the initialized parameters for the EM algorithm, including:

mu: Cluster means.
sigma: Covariance matrices.
pi: Mixing proportions.
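The Value section names the three elements but not their shapes. The following sketch checks a starting-value list for basic consistency. It assumes that mu is a k x p matrix (one row per component), sigma is a p x p x k array and pi is a length-k vector; these layouts, and the helper check_init, are hypothetical, so inspect the object actually returned by getInit (for example with str()) before relying on them.

```r
# Hypothetical validator for GMM starting values stored as list(mu, sigma, pi).
# ASSUMED layout: mu is k x p, sigma is p x p x k, pi has length k.
check_init <- function(init, k, p, tol = 1e-8) {
  mu_ok <- identical(dim(as.matrix(init$mu)), as.integer(c(k, p)))
  sigma_ok <- identical(dim(init$sigma), as.integer(c(p, p, k))) &&
    all(vapply(seq_len(k), function(j) {
      s <- matrix(init$sigma[, , j], p, p)
      # Each covariance matrix must be symmetric positive definite
      isSymmetric(s) &&
        all(eigen(s, symmetric = TRUE, only.values = TRUE)$values > 0)
    }, logical(1)))
  # Mixing proportions must be non-negative and sum to one
  pi_ok <- length(init$pi) == k && all(init$pi >= 0) &&
    abs(sum(init$pi) - 1) < tol
  c(mu = mu_ok, sigma = sigma_ok, pi = pi_ok)
}

# Hand-made starting values for k = 2 components in p = 2 dimensions
init <- list(mu = matrix(c(0, 0, 3, 3), nrow = 2, byrow = TRUE),
             sigma = array(diag(2), dim = c(2, 2, 2)),
             pi = c(0.5, 0.5))
check_init(init, k = 2, p = 2)
```

For this hand-made list all three checks should be TRUE; changing pi to c(0.7, 0.7) makes the pi check fail because the proportions no longer sum to one.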

Arguments

x

A numeric matrix or data frame where rows represent observations and columns represent variables.

k

An integer specifying the number of clusters.

method

A character string specifying the initialization method to use. Options include:

"Random": Randomly generates cluster centers.
"emEM": Multi-start EM initialization: performs several short EM runs from different starting points and keeps the one with the highest log-likelihood.
"emAEM": Alternative Expectation-Maximization initialization.
"hierarchical average": Uses hierarchical clustering (average linkage).
"hierarchical ward": Uses hierarchical clustering (Ward’s method).
"kmeans": Uses k-means clustering to initialize parameters.
"mclust": Uses model-based clustering from the mclust package.
"cem": Classification Expectation-Maximization (CEM).
"sem": Stochastic Expectation-Maximization (SEM).
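The partition-based options ("kmeans", "hierarchical average", "hierarchical ward") share one idea: cluster the data into k groups, then summarize each group by its mean, covariance and relative size. The base R sketch below illustrates that idea. The helper params_from_labels, the k x p / p x p x k layout, the singleton fallback to the pooled covariance and the choice of "ward.D2" for Ward's method are all assumptions; GMMinit's own implementation may differ in these details.

```r
# Hypothetical helper: turn a hard partition into GMM starting values.
# ASSUMED layout: mu is k x p, sigma is p x p x k, pi has length k.
params_from_labels <- function(x, labels) {
  x <- as.matrix(x)
  k <- max(labels)
  p <- ncol(x)
  mu <- matrix(0, k, p)
  sigma <- array(0, c(p, p, k))
  for (j in seq_len(k)) {
    xj <- x[labels == j, , drop = FALSE]
    mu[j, ] <- colMeans(xj)
    # A singleton cluster has no sample covariance: fall back to the pooled one
    s <- if (nrow(xj) > 1) cov(xj) else cov(x)
    sigma[, , j] <- s + diag(1e-6, p)  # small ridge keeps sigma positive definite
  }
  list(mu = mu, sigma = sigma, pi = tabulate(labels, nbins = k) / length(labels))
}

set.seed(123)
x <- matrix(rnorm(100 * 2), ncol = 2)
k <- 2

# "kmeans"-style start: partition with k-means, then summarize each cluster
init_kmeans <- params_from_labels(x, kmeans(x, centers = k, nstart = 10)$cluster)

# "hierarchical"-style starts: build the tree, cut it into k groups
d <- dist(x)
init_avg  <- params_from_labels(x, cutree(hclust(d, method = "average"), k = k))
init_ward <- params_from_labels(x, cutree(hclust(d, method = "ward.D2"), k = k))
```

Because every method ends in the same summarizing step, the methods differ only in the partition they produce; the EM algorithm then refines these values.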

run_number

An integer specifying the number of times the initialization process should be repeated for emEM and emAEM (default is 10).

max_iter

An integer specifying the maximum number of iterations for emEM and emAEM initialization (default is 3).

tol

A numeric value specifying the convergence tolerance for the EM algorithm (default is 1e-6).

burn_in

An integer specifying the number of burn-in iterations for stochastic methods such as "sem" (default is 3).
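To illustrate what burn_in typically controls, the sketch below runs a stochastic EM (SEM) chain: each iteration draws every label from its posterior membership probabilities, refits the parameters on the sampled partition, and, once the first burn_in iterations have passed, keeps the iterate. The kept iterates are averaged, which is a common SEM convention. The names sem_init and n_iter, the random start, the skipping of degenerate draws and the averaging rule are assumptions for illustration, not GMMinit's documented behaviour.

```r
# Stochastic EM ("sem") sketch with burn-in, base R only. Hypothetical names.

# Log density of N(mu, sigma) at each row of x, via a Cholesky factor
log_dmvn <- function(x, mu, sigma) {
  R <- chol(sigma)                                # sigma = t(R) %*% R
  z <- backsolve(R, t(x) - mu, transpose = TRUE)  # whitened residuals (p x n)
  -0.5 * colSums(z^2) - sum(log(diag(R))) - 0.5 * ncol(x) * log(2 * base::pi)
}

# Posterior membership probabilities (n x k), normalized row by row
posteriors <- function(x, params) {
  k <- length(params$pi)
  logp <- sapply(seq_len(k), function(j)
    log(params$pi[j]) + log_dmvn(x, params$mu[j, ], params$sigma[, , j]))
  w <- exp(logp - apply(logp, 1, max))
  w / rowSums(w)
}

# M-step on a hard partition: means, covariances (+ small ridge), proportions
params_from_labels <- function(x, labels, k) {
  p <- ncol(x)
  mu <- matrix(0, k, p)
  sigma <- array(0, c(p, p, k))
  for (j in seq_len(k)) {
    xj <- x[labels == j, , drop = FALSE]
    mu[j, ] <- colMeans(xj)
    sigma[, , j] <- cov(xj) + diag(1e-6, p)
  }
  list(mu = mu, sigma = sigma, pi = tabulate(labels, nbins = k) / length(labels))
}

sem_init <- function(x, k, n_iter = 20, burn_in = 3) {
  x <- as.matrix(x)
  p <- ncol(x)
  # ASSUMED start: k random observations as means, pooled covariance
  params <- list(mu = x[sample(nrow(x), k), , drop = FALSE],
                 sigma = array(cov(x), c(p, p, k)),
                 pi = rep(1 / k, k))
  kept <- list()
  for (it in seq_len(n_iter)) {
    z <- posteriors(x, params)
    # S-step: sample each label from its posterior instead of taking the argmax
    labels <- apply(z, 1, function(pr) sample.int(k, 1, prob = pr))
    if (any(tabulate(labels, nbins = k) < 2)) next  # degenerate draw: skip it
    params <- params_from_labels(x, labels, k)
    if (it > burn_in) kept[[length(kept) + 1]] <- params  # discard burn-in
  }
  if (length(kept) == 0) return(params)
  # Average the post-burn-in iterates (assumes no label switching)
  avg <- function(name) Reduce(`+`, lapply(kept, `[[`, name)) / length(kept)
  list(mu = avg("mu"), sigma = avg("sigma"), pi = avg("pi"))
}

set.seed(7)
x <- rbind(matrix(rnorm(100), ncol = 2), matrix(rnorm(100, mean = 4), ncol = 2))
init_sem <- sem_init(x, k = 2, burn_in = 3)
```

Replacing the sampled labels with max.col(z), which assigns each observation to its most probable component, turns this into the classification (CEM) variant.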

Details

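This function runs the initialization method named by method and returns starting values for the EM algorithm. Because EM converges only to a local maximum of the likelihood, the starting values affect both the quality of the final fit and the number of iterations needed; a good initialization reduces the risk of converging to a poor local optimum and can speed up convergence in Gaussian mixture models.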
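The multi-start idea behind "emEM" can be sketched in base R: run run_number short EM runs of at most max_iter iterations from random starts, stop a run early when the log-likelihood gain drops below tol, and keep the run with the highest log-likelihood. The helper em_em_init, the random-start rule (k distinct observations as means, pooled covariance, equal proportions) and the small covariance ridge are assumptions; this is an illustration of the strategy, not GMMinit's code.

```r
# Multi-start "emEM" sketch, base R only. Hypothetical names throughout.

# Log density of N(mu, sigma) at each row of x, via a Cholesky factor
log_dmvn <- function(x, mu, sigma) {
  R <- chol(sigma)                                # sigma = t(R) %*% R
  z <- backsolve(R, t(x) - mu, transpose = TRUE)  # whitened residuals (p x n)
  -0.5 * colSums(z^2) - sum(log(diag(R))) - 0.5 * ncol(x) * log(2 * base::pi)
}

# E-step: posterior probabilities (n x k) and the observed-data log-likelihood
e_step <- function(x, params) {
  k <- length(params$pi)
  logp <- sapply(seq_len(k), function(j)
    log(params$pi[j]) + log_dmvn(x, params$mu[j, ], params$sigma[, , j]))
  m <- apply(logp, 1, max)
  lse <- m + log(rowSums(exp(logp - m)))          # log-sum-exp per observation
  list(z = exp(logp - lse), loglik = sum(lse))
}

# M-step: weighted means, covariances (+ small ridge) and mixing proportions
m_step <- function(x, z) {
  p <- ncol(x); k <- ncol(z); nk <- colSums(z)
  mu <- crossprod(z, x) / nk                      # k x p
  sigma <- array(0, c(p, p, k))
  for (j in seq_len(k)) {
    xc <- sweep(x, 2, mu[j, ])
    sigma[, , j] <- crossprod(xc * z[, j], xc) / nk[j] + diag(1e-6, p)
  }
  list(mu = mu, sigma = sigma, pi = nk / nrow(x))
}

em_em_init <- function(x, k, run_number = 10, max_iter = 3, tol = 1e-6) {
  x <- as.matrix(x)
  p <- ncol(x)
  best <- NULL
  for (r in seq_len(run_number)) {
    # ASSUMED random start: k distinct observations as means
    params <- list(mu = x[sample(nrow(x), k), , drop = FALSE],
                   sigma = array(cov(x), c(p, p, k)),
                   pi = rep(1 / k, k))
    old <- -Inf
    for (it in seq_len(max_iter)) {
      e <- e_step(x, params)
      if (abs(e$loglik - old) < tol) break        # short run has converged
      old <- e$loglik
      params <- m_step(x, e$z)
    }
    ll <- e_step(x, params)$loglik
    # Keep the short run with the highest log-likelihood
    if (is.null(best) || ll > best$loglik) best <- c(params, loglik = ll)
  }
  best
}

set.seed(42)
x <- rbind(matrix(rnorm(100), ncol = 2), matrix(rnorm(100, mean = 4), ncol = 2))
init_em <- em_em_init(x, k = 2, run_number = 10, max_iter = 3)
```

Each short run is cheap, so spending the budget on many starts explores more of the likelihood surface than one long run from a single start would.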

Examples

library(GMMinit)

# Generate sample data: 100 observations of 2 variables
set.seed(123)
data <- matrix(rnorm(100 * 2), ncol = 2)

# Compute starting values for a 2-component GMM using k-means
init_params <- getInit(data, k = 2, method = "kmeans")
print(init_params)