Michael, S., & Melnykov, V. (2016). An effective strategy for initializing the EM algorithm in finite mixture models.
Advances in Data Analysis and Classification, 10(4), 563–583.
Fraley, C., & Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation.
Journal of the American Statistical Association, 97(458), 611–631. [mclust]
Hartigan, J. A., & Wong, M. A. (1979). Algorithm AS 136: A k-means clustering algorithm.
Journal of the Royal Statistical Society: Series C (Applied Statistics), 28(1), 100–108. [k-means]
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm.
Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 1–22. [EM Algorithm]
Celeux, G., & Govaert, G. (1992). A classification EM algorithm for clustering and two stochastic versions.
Computational Statistics & Data Analysis, 14(3), 315–332. [CEM & SEM]
McLachlan, G., & Peel, D. (2000). Finite Mixture Models. John Wiley & Sons. [General EM & GMM]
Biernacki, C., Celeux, G., & Govaert, G. (2003). Choosing starting values for the EM algorithm in Gaussian mixture models.
Computational Statistics & Data Analysis, 41(3–4), 561–575. [Alternative EM Initializations]