
Missingness-Aware Gaussian Mixture Models

Zachary McCaw Updated: 2026-02-26

This package performs estimation and inference for Gaussian Mixture Models (GMMs) when the input data may contain missing values. Rather than imputing missing values before fitting the GMM, the package uses an extended EM algorithm to obtain the true maximum likelihood estimates of all model parameters given the observed data. In particular, MGMM performs the following tasks:

  • Maximum likelihood estimation of cluster means, covariances, and proportions.
  • Calculation of cluster membership probabilities and maximum a posteriori classification of the input vectors.
  • Deterministic completion of the input data, by imputing missing elements to their posterior means, and stochastic completion of the input data, by drawing missing elements from the fitted GMM.

The method is detailed in "Fitting Gaussian mixture models on incomplete data".
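To make the idea of deterministic completion concrete, here is a minimal base-R sketch of posterior-mean imputation for a two-component bivariate mixture in which the second coordinate is missing. It is an illustration under stated assumptions, not the package's internal code: the parameters are treated as known, the mixing proportions are assumed equal, and `impute_y2` is a hypothetical helper name.

```r
# Illustrative parameters, treated as known (in practice they are estimated).
means <- list(c(1, 1), c(-1, -1))
covs <- list(
  matrix(c(1, -0.5, -0.5, 1), nrow = 2),
  matrix(c(1, 0.5, 0.5, 1), nrow = 2)
)
props <- c(0.5, 0.5)  # Assumption: equal mixing proportions.

# Hypothetical helper: posterior mean of a missing y2 given an observed y1.
impute_y2 <- function(y1, means, covs, props) {
  k <- length(means)
  # Marginal density of the observed coordinate under each component.
  dens <- vapply(seq_len(k), function(j) {
    dnorm(y1, mean = means[[j]][1], sd = sqrt(covs[[j]][1, 1]))
  }, numeric(1))
  # Membership probabilities based on the observed coordinate only.
  resp <- props * dens / sum(props * dens)
  # Conditional mean of y2 given y1 within each component.
  cond <- vapply(seq_len(k), function(j) {
    means[[j]][2] + covs[[j]][2, 1] / covs[[j]][1, 1] * (y1 - means[[j]][1])
  }, numeric(1))
  # The imputed value averages the conditional means over the components.
  list(resp = resp, cond = cond, imputed = sum(resp * cond))
}

out <- impute_y2(y1 = 0, means = means, covs = covs, props = props)
```

With `y1 = 0` both components are equally likely, the within-component conditional means are 1.5 and -0.5, and the imputed value is their weighted average. A stochastic completion would instead sample a component from `resp` and then draw from the corresponding conditional distribution.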

Main Functions

  • FitGMM estimates model parameters and performs classification and imputation.
  • rGMM simulates observations from a GMM, potentially with missingness.
  • ChooseK provides guidance on choosing the number of clusters.
  • GenImputation performs stochastic imputation for multiple imputation-based inference.
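Multiple imputation-based inference typically pools estimates across several stochastically completed data sets using Rubin's rules. The base-R sketch below shows that pooling step for a vector-valued estimate. `pool_rubin` is a hypothetical helper written for illustration, and the input values are toy numbers; it is not the package's own combining function, whose interface is not shown here.

```r
# Hypothetical helper: pool point estimates and their covariance matrices
# across m imputations using Rubin's rules.
pool_rubin <- function(points, covs) {
  m <- length(points)
  stopifnot(m >= 2, length(covs) == m)
  est <- do.call(rbind, points)        # m x p matrix of point estimates.
  pooled <- colMeans(est)              # Pooled point estimate.
  within <- Reduce(`+`, covs) / m      # Average within-imputation covariance.
  between <- stats::cov(est)           # Between-imputation covariance (m - 1 denominator).
  total <- within + (1 + 1 / m) * between
  list(point = pooled, cov = total)
}

# Toy inputs standing in for estimates computed on three imputed data sets.
points <- list(c(1.0, 2.0), c(1.2, 1.8), c(0.8, 2.2))
covs <- list(diag(2) * 0.01, diag(2) * 0.01, diag(2) * 0.01)
pooled <- pool_rubin(points, covs)
```

In practice each element of `points` and `covs` would come from analysing one data set returned by `GenImputation(fit)`. The package also lists a `CombineMIs` function ("Combine Multiple Imputations") that may be preferable to a hand-rolled helper.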

Compact Example

set.seed(101)
library(MGMM)

# Parameter settings.
mean_list <- list(
  c(1, 1),
  c(-1, -1)
)
cov_list <- list(
  matrix(c(1, -0.5, -0.5, 1), nrow = 2),
  matrix(c(1, 0.5, 0.5, 1), nrow = 2)
)

# Generate data.
data <- rGMM(
  n = 1e3, 
  d = 2, 
  k = 2, 
  miss = 0.1, 
  means = mean_list, 
  covs = cov_list
)

# Original data.
head(data)
##           y1          y2
## 1  1.6512855  2.60621938
## 2 -0.5721069          NA
## 2 -2.0045376 -2.31888263
## 2 -0.6229388 -1.51543968
## 1  2.0258413  0.06921658
## 2 -1.3476380 -1.51915826
# Choose cluster number.
choose_k <- ChooseK(
  data,
  k0 = 2,
  k1 = 4,
  boot = 10,
  maxit = 10,
  eps = 1e-4,
  report = TRUE
)
## Cluster size 2 complete. 11 fit(s) succeeded.
## Cluster size 3 complete. 11 fit(s) succeeded.
## Cluster size 4 complete. 11 fit(s) succeeded.
# Cluster number recommendations. 
show(choose_k$Choices)
##   Metric k_opt   Metric_opt k_1se   Metric_1se
## 1    BIC     4 2285.4269195     2 2340.4441587
## 2    CHI     4    4.5081110     4    4.5081110
## 3    DBI     2    0.7818011     2    0.7818011
## 4    SIL     2    0.4785951     2    0.4785951
# Estimation.
fit <- FitGMM(
  data,
  k = 2,
  maxit = 10
)
## Objective increment:  11.1 
## Objective increment:  2.39 
## Objective increment:  1.57 
## Objective increment:  1.08 
## Objective increment:  0.91 
## Objective increment:  0.745 
## Objective increment:  0.617 
## Objective increment:  0.507 
## Objective increment:  0.416 
## Objective increment:  0.34 
## 10 update(s) performed without reaching tolerance limit.
# Estimated means. 
show(fit@Means)
## [[1]]
##        y1        y2 
## -1.056071 -1.070412 
## 
## [[2]]
##        y1        y2 
## 0.9437473 0.9513797
# Estimated covariances. 
show(fit@Covariances)
## [[1]]
##           y1        y2
## y1 0.9447484 0.5201638
## y2 0.5201638 0.9611714
## 
## [[2]]
##            y1         y2
## y1  0.9973684 -0.4489898
## y2 -0.4489898  0.9728258
# Cluster assignments. 
head(fit@Assignments)
##   Assignments      Entropy
## 1           2 7.957793e-02
## 2           1 8.426200e-01
## 2           1 8.629837e-07
## 2           1 7.790894e-03
## 1           2 8.451183e-02
## 2           1 4.521627e-04
# Deterministic imputation.
head(fit@Completed)
##           y1          y2
## 1  1.6512855  2.60621938
## 2 -0.5721069 -0.14379539
## 2 -2.0045376 -2.31888263
## 2 -0.6229388 -1.51543968
## 1  2.0258413  0.06921658
## 2 -1.3476380 -1.51915826
# Stochastic imputation.
imp <- GenImputation(fit)
head(imp)
##           y1          y2
## 1  1.6512855  2.60621938
## 2 -0.5721069  0.88613853
## 2 -2.0045376 -2.31888263
## 2 -0.6229388 -1.51543968
## 1  2.0258413  0.06921658
## 2 -1.3476380 -1.51915826
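The cluster assignments above are maximum a posteriori classifications. As a rough illustration of how such an assignment arises for a fully observed point, the base-R sketch below computes cluster membership probabilities from the generating parameters of the example. The helpers `dmvn` and `responsibilities` are hypothetical names, equal mixing proportions are an assumption, and this is not the package's implementation.

```r
# Hypothetical helper: multivariate normal density at a single point y.
dmvn <- function(y, mu, sigma) {
  d <- length(mu)
  dev <- y - mu
  quad <- drop(t(dev) %*% solve(sigma, dev))
  exp(-0.5 * quad) / sqrt((2 * pi)^d * det(sigma))
}

# Hypothetical helper: membership probabilities for a fully observed point.
responsibilities <- function(y, means, covs, props) {
  w <- props * mapply(function(mu, s) dmvn(y, mu, s), means, covs)
  w / sum(w)
}

# Generating parameters from the example; equal proportions are assumed.
means <- list(c(1, 1), c(-1, -1))
covs <- list(
  matrix(c(1, -0.5, -0.5, 1), nrow = 2),
  matrix(c(1, 0.5, 0.5, 1), nrow = 2)
)
props <- c(0.5, 0.5)

resp <- responsibilities(c(2, 2), means, covs, props)
map_cluster <- which.max(resp)  # Maximum a posteriori assignment.
```

For the point `c(2, 2)` the first component dominates, so the point is assigned to cluster 1 under this labeling. The fitted model may order its components differently, as the estimated means above show.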

Documentation

A detailed write-up with derivations and examples (LaTeX-compiled PDF) is here.

Install

install.packages('MGMM')

Monthly Downloads: 231
Version: 1.0.1.3
License: GPL-3
Maintainer: Zachary McCaw
Last Published: February 26th, 2026

Functions in MGMM (1.0.1.3)

  • ReconstituteData: Reconstitute Data
  • logLik.mix: Log-Likelihood for Fitted GMM
  • MixUpdateMeans: Mean Update for Mixture of MVNs with Missingness
  • print.mvn: Print Fitted MVN Model
  • rGMM: Generate Data from Gaussian Mixture Models
  • PartitionData: Partition Data by Missingness Pattern
  • vcov.mvn: Covariance for Fitted MVN Model
  • vcov.mix: Covariance for Fitted GMM
  • FitMVN: Fit Multivariate Normal Distribution
  • GenImputation: Generate Stochastic Imputation
  • FitGMM: Estimate Multivariate Normal Mixture
  • FitMix: Fit Multivariate Mixture Distribution
  • DavBou: Davies-Bouldin Index
  • mix-class: Mixture Model Class
  • mean.mvn: Mean for Fitted MVN Model
  • show,mix-method: Show for Fitted Mixture Models
  • mvn-class: Multivariate Normal Model Class
  • CalHar: Calinski-Harabasz Index
  • ChooseK: Cluster Number Selection
  • print.mix: Print Fitted GMM
  • mean.mix: Mean for Fitted GMM
  • logLik.mvn: Log-Likelihood for Fitted MVN Model
  • show,mvn-method: Show for Multivariate Normal Models
  • ClustQual: Cluster Quality
  • CombineMIs: Combine Multiple Imputations
  • MGMM-package: Missingness-Aware Gaussian Mixture Models