GMMinit (version 1.0.0)

getBestInit: Select the Best Initialization Method for a Gaussian Mixture Model (GMM)

Description

Runs multiple GMM initialization strategies, computes the log-likelihood for each initial parameter set, and selects the best initialization method based on the highest log-likelihood value.
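The selection criterion can be illustrated in base R. The sketch below is not the package's code: gmm_loglik() and the parameter layout (weights, means, covs) are hypothetical names chosen for this example. It evaluates the mixture log-likelihood for two hand-written candidate parameter sets and keeps the one with the higher value.

```r
# Hypothetical helper (not part of GMMinit): log-likelihood of the rows of x
# under a Gaussian mixture with the given weights, means (k x d matrix),
# and covariances (list of k d x d matrices).
gmm_loglik <- function(x, weights, means, covs) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- ncol(x)
  k <- length(weights)
  log_dens <- matrix(NA_real_, n, k)
  for (j in seq_len(k)) {
    R <- chol(covs[[j]])                               # t(R) %*% R equals the covariance
    centered <- sweep(x, 2, means[j, ], "-")
    z <- backsolve(R, t(centered), transpose = TRUE)   # solves t(R) %*% z = t(centered)
    maha <- colSums(z^2)                               # squared Mahalanobis distances
    log_det <- 2 * sum(log(diag(R)))
    log_dens[, j] <- log(weights[j]) - 0.5 * (d * log(2 * pi) + log_det + maha)
  }
  # Log-sum-exp over components for numerical stability
  m <- apply(log_dens, 1, max)
  sum(m + log(rowSums(exp(log_dens - m))))
}

set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))

# Two hand-written candidate initializations (illustrative only)
candidates <- list(
  good = list(weights = c(0.5, 0.5),
              means   = rbind(c(0, 0), c(4, 4)),
              covs    = list(diag(2), diag(2))),
  poor = list(weights = c(0.5, 0.5),
              means   = rbind(c(10, 10), c(-10, -10)),
              covs    = list(diag(2), diag(2)))
)

logliks <- sapply(candidates, function(p) gmm_loglik(x, p$weights, p$means, p$covs))
best <- names(which.max(logliks))
```

Here best names the candidate whose parameters explain the data better before any EM refinement, which is the same comparison the function description refers to.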

Usage

getBestInit(
  x, 
  k,
  init_methods = c("Random", "emEM", "emAEM",
                   "hierarchical average", "hierarchical ward",
                   "kmeans", "mclust", "cem", "sem"),
  run_number = 10,
  max_iter = 3,
  tol = 1e-6,
  burn_in = 3,
  verbose = FALSE
)

Value

A list summarizing the best initialization method and associated results:

  • best_method: The name of the initialization method that achieved the highest log-likelihood.

  • best_result: The initial parameter set (as returned by getInit()) corresponding to the best method.

  • loglik_table: A data frame with one row per initialization method and the following columns:

    • method: Method name

    • loglik: Computed log-likelihood
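As a rough illustration of working with a return value of this shape, the sketch below builds a placeholder list with the documented field names and ranks the methods by log-likelihood. The numbers are invented for the example and are not output from GMMinit.

```r
# Placeholder object mimicking the documented structure.
# The log-likelihood values are invented, not real package output.
result <- list(
  best_method  = "kmeans",
  best_result  = NULL,  # would hold the parameter set returned by getInit()
  loglik_table = data.frame(
    method = c("Random", "kmeans", "mclust"),
    loglik = c(-412.7, -398.2, -401.5),
    stringsAsFactors = FALSE
  )
)

# Rank methods from highest to lowest log-likelihood
tab <- result$loglik_table
ranked <- tab[order(tab$loglik, decreasing = TRUE), ]
top <- ranked$method[1]
```

With a real result, the first row of ranked should correspond to best_method.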

Arguments

x

A numeric matrix or data frame containing the data to be clustered. Each row is an observation, and each column is a feature.

k

The number of mixture components (clusters).

init_methods

A character vector of initialization strategies. Each method is passed to getInit() to generate initial GMM parameters. The default includes:

  • "Random"

  • "emEM"

  • "emAEM"

  • "hierarchical average"

  • "hierarchical ward"

  • "kmeans"

  • "mclust"

  • "cem"

  • "sem"
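To give a sense of what an initializer produces, here is a minimal base R sketch of a k-means based initialization. It is an assumption about the general technique, not the code behind getInit(): kmeans_init() and the field names weights, means, and covs are hypothetical.

```r
# Hypothetical k-means initializer (not GMMinit's implementation).
# Cluster proportions become mixing weights, cluster centers become means,
# and within-cluster sample covariances become component covariances.
kmeans_init <- function(x, k, nstart = 5) {
  x <- as.matrix(x)
  km <- stats::kmeans(x, centers = k, nstart = nstart)
  list(
    weights = km$size / nrow(x),
    means   = km$centers,
    # Note: a cluster with a single point would yield an NA covariance
    covs    = lapply(seq_len(k), function(j) {
      stats::cov(x[km$cluster == j, , drop = FALSE])
    })
  )
}

set.seed(123)
x <- matrix(rnorm(200), ncol = 2)
init <- kmeans_init(x, k = 2)
```

The other strategies differ in how they arrive at a starting partition or parameter set, but each would yield the same kind of triple: mixing weights, component means, and component covariances.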

run_number

Number of short EM runs for EM-based initializers. Passed to getInit().

max_iter

Maximum number of iterations for short EM algorithms.

tol

Convergence tolerance for short EM runs.

burn_in

Number of burn-in iterations for SEM-based initialization ("sem").

verbose

Logical; if TRUE, prints progress messages.

See Also

getInit, runGMM, runEM

Examples

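library(GMMinit)

# Simulated data: 100 observations, 2 features
set.seed(123)
x <- matrix(rnorm(200), ncol = 2)

# Run selection of best initialization
result <- getBestInit(x, k = 2)

# Name of the winning method and the per-method log-likelihoods
result$best_method
result$loglik_table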