FitGMM: Estimate Multivariate Normal Mixture

Description

Given an \(n \times d\) data matrix whose rows are \(d\)-dimensional random vectors, estimates the parameters of a Gaussian Mixture Model (GMM). Accommodates arbitrary patterns of missingness in the input vectors, provided the elements are missing at random (MAR).
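
For reference, the model being fit can be written as the standard density of a \(k\)-component GMM. The symbols \(\pi_j\), \(\mu_j\), \(\Sigma_j\), and \(\phi\) are introduced here for illustration and are not taken from the package itself; they correspond to the cluster proportions, means, and covariances named under Value.

\[
f(x) = \sum_{j=1}^{k} \pi_j \, \phi(x; \mu_j, \Sigma_j),
\qquad \pi_j \ge 0, \qquad \sum_{j=1}^{k} \pi_j = 1,
\]

where \(\phi(x; \mu, \Sigma)\) denotes the \(d\)-variate normal density with mean \(\mu\) and covariance \(\Sigma\). With \(k = 1\) this reduces to a single multivariate normal.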

Usage

FitGMM(
  data,
  k = 1,
  init_means = NULL,
  fix_means = FALSE,
  init_covs = NULL,
  lambda = 0,
  init_props = NULL,
  maxit = 100,
  eps = 1e-06,
  report = TRUE
)

Value

For a single component, an object of class mvn, containing the estimated mean and covariance, the final objective function, and the imputed data.
For a multicomponent model \(k>1\), an object of class mix, containing the estimated means, covariances, cluster proportions, cluster responsibilities, and observation assignments.

Arguments

data: Numeric data matrix.
k: Number of mixture components. Defaults to 1.
init_means: Optional list of initial mean vectors.
fix_means: Fix the means at their starting values? If TRUE, init_means must be provided.
init_covs: Optional list of initial covariance matrices.
lambda: Optional ridge term added to each component covariance matrix to ensure positive definiteness.
init_props: Optional vector of initial cluster proportions.
maxit: Maximum number of EM iterations.
eps: Minimum acceptable increment in the EM objective.
report: Report fitting progress?
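
The effect of the lambda argument can be illustrated in base R. The sketch below assumes the ridge takes the common form of adding lambda to every diagonal element of a covariance matrix; the exact form used inside FitGMM is not stated here, and the variable names are hypothetical.

```r
# Illustration in base R only; this is not the package's internal code.
# Build a rank-deficient covariance: the second column duplicates the first.
x <- c(-1.2, 0.4, 0.9, 1.7, -0.3)
singular_cov <- cov(cbind(x, x))
min_eig_before <- min(eigen(singular_cov, symmetric = TRUE)$values)

# Assumed form of the ridge: add lambda to each diagonal element.
lambda <- 0.1
ridged_cov <- singular_cov + lambda * diag(2)
min_eig_after <- min(eigen(ridged_cov, symmetric = TRUE)$values)

# The smallest eigenvalue is shifted up by lambda, so the ridged
# matrix is positive definite and its Cholesky factor exists.
chol_factor <- chol(ridged_cov)
```

Adding lambda times the identity shifts every eigenvalue up by lambda, which is why a small positive value is enough to repair a singular estimate, at the cost of slightly inflating the component variances.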

Details

Initial values for the cluster means, covariances, and proportions are specified using init_means, init_covs, and init_props, respectively. If the data contain complete observations (rows with no missing elements), FitGMM will attempt to initialize these parameters internally using K-means. If there are no complete observations, initial values are required for init_means, init_covs, and init_props.
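
As a rough sketch of the case with no complete observations, the code below builds starting values by hand from column-wise summaries. The offsets used to separate the two starting means are an arbitrary, hypothetical choice, not a recommendation from the package, and the final call runs only if FitGMM has been loaded into the session.

```r
# Toy data in which every row is missing exactly one element,
# so there are no complete observations available for K-means.
set.seed(101)
n <- 30
d <- 3
data <- matrix(rnorm(n * d), nrow = n, ncol = d)
data[cbind(seq_len(n), (seq_len(n) - 1) %% d + 1)] <- NA
stopifnot(!any(complete.cases(data)))

# Column-wise summaries computed from the observed elements only.
k <- 2
col_means <- colMeans(data, na.rm = TRUE)
col_vars <- apply(data, 2, var, na.rm = TRUE)

# Crude starting values (an assumption for illustration):
# means offset by one standard deviation in each direction,
# a shared diagonal covariance, and equal proportions.
init_means <- list(col_means - sqrt(col_vars), col_means + sqrt(col_vars))
init_covs <- rep(list(diag(col_vars)), k)
init_props <- rep(1 / k, k)

# Fit only when FitGMM is available in the current session.
if (exists("FitGMM", mode = "function")) {
  fit <- FitGMM(
    data,
    k = k,
    init_means = init_means,
    init_covs = init_covs,
    init_props = init_props,
    report = FALSE
  )
}
```

Any reasonable starting values can be substituted; the point is only that all three of init_means, init_covs, and init_props are supplied as lists or vectors of length k.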

Examples

# \donttest{
# Single component without missingness
# Bivariate normal observations
sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
data <- rGMM(n = 1e3, d = 2, k = 1, means = c(2, 2), covs = sigma)
fit <- FitGMM(data, k = 1)

# Single component with missingness
# Trivariate normal observations
sigma <- matrix(c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3)
data <- rGMM(n = 1e3, d = 3, k = 1, miss = 0.1, means = c(2, 2, 2), covs = sigma)
fit <- FitGMM(data, k = 1)

# Two components without missingness
# Trivariate normal observations
mean_list <- list(c(-2, -2, -2), c(2, 2, 2))
sigma <- matrix(c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3)
data <- rGMM(n = 1e3, d = 3, k = 2, means = mean_list, covs = sigma)
fit <- FitGMM(data, k = 2)

# Four components with missingness
# Bivariate normal observations
# Note: Fitting is slow.
mean_list <- list(c(2, 2), c(2, -2), c(-2, 2), c(-2, -2))
sigma <- 0.5 * diag(2)
data <- rGMM(
  n = 1000,
  d = 2,
  k = 4,
  pi = c(0.35, 0.15, 0.15, 0.35),
  miss = 0.1,
  means = mean_list,
  covs = sigma
)
fit <- FitGMM(data, k = 4)
# }