MGMM (version 1.0.1.3)

FitGMM: Estimate Multivariate Normal Mixture

Description

Given an \(n \times d\) matrix of random vectors, estimates the parameters of a Gaussian Mixture Model (GMM). Accommodates arbitrary patterns of missingness in the input vectors, provided the elements are missing at random (MAR).

Usage

FitGMM(
  data,
  k = 1,
  init_means = NULL,
  fix_means = FALSE,
  init_covs = NULL,
  lambda = 0,
  init_props = NULL,
  maxit = 100,
  eps = 1e-06,
  report = TRUE
)

Value

  • For a single component, an object of class mvn, containing the estimated mean and covariance, the final objective function, and the imputed data.

  • For a multicomponent model (\(k > 1\)), an object of class mix, containing the estimated means, covariances, cluster proportions, cluster responsibilities, and observation assignments.

Arguments

data

Numeric data matrix.

k

Number of mixture components. Defaults to 1.

init_means

Optional list of initial mean vectors.

fix_means

Fix the means at their starting values? If TRUE, init_means must be provided. Defaults to FALSE.

init_covs

Optional list of initial covariance matrices.

lambda

Optional ridge term added to each component covariance matrix to ensure positive definiteness.

init_props

Optional vector of initial cluster proportions.

maxit

Maximum number of EM iterations.

eps

Minimum acceptable increment in the EM objective. Defaults to 1e-06.

report

Report fitting progress?
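
The effect of a ridge term such as lambda can be illustrated in base R without calling FitGMM. The sketch below assumes the ridge is added as lambda times the identity matrix; the exact form used inside FitGMM is not stated here, and the names x_mat, sigma_ridge, and chol_ok are illustrative only.

```r
# Two perfectly collinear columns give a singular sample covariance.
x <- c(1, 2, 3, 4)
x_mat <- cbind(x, 2 * x)
sigma <- cov(x_mat)

# The smallest eigenvalue is numerically zero, so sigma is not positive definite.
min_eig <- min(eigen(sigma, symmetric = TRUE)$values)

# Assumption: the ridge takes the form lambda * I.
lambda <- 0.1
sigma_ridge <- sigma + lambda * diag(ncol(sigma))

# Every eigenvalue is shifted up by lambda, so the smallest is now about 0.1.
min_eig_ridge <- min(eigen(sigma_ridge, symmetric = TRUE)$values)

# chol() succeeds only for positive definite matrices.
chol_ok <- !inherits(try(chol(sigma_ridge), silent = TRUE), "try-error")
```

A small positive lambda is therefore enough to make a degenerate covariance estimate invertible, at the cost of slightly inflating the variances.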

Details

Initial values for the cluster means, covariances, and proportions are specified using init_means, init_covs, and init_props, respectively. If the data contain complete observations (rows with no missing elements), FitGMM will attempt to initialize these parameters internally using K-means. If there are no complete observations, initial values are required for init_means, init_covs, and init_props.
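
As a minimal sketch of the case with no complete observations, the code below removes one coordinate from every row and then supplies all three sets of initial values. The seed, sample size, and starting values are arbitrary choices for illustration, and miss_col is a hypothetical helper name; the snippet assumes the MGMM package is installed.

```r
library(MGMM)
set.seed(101)

# Two well-separated trivariate components with a shared covariance.
mean_list <- list(c(-2, -2, -2), c(2, 2, 2))
sigma <- diag(3)
data <- rGMM(n = 200, d = 3, k = 2, means = mean_list, covs = sigma)

# Set one randomly chosen coordinate per row to NA,
# so that no row is a complete observation.
miss_col <- sample(1:3, size = nrow(data), replace = TRUE)
data[cbind(seq_len(nrow(data)), miss_col)] <- NA
stopifnot(!any(complete.cases(data)))

# With no complete rows, initial values are supplied explicitly.
fit <- FitGMM(
  data,
  k = 2,
  init_means = list(c(-1, -1, -1), c(1, 1, 1)),
  init_covs = list(diag(3), diag(3)),
  init_props = c(0.5, 0.5),
  report = FALSE
)
```

The starting means here are deliberately rough; in practice they might come from prior knowledge or from a fit on a coarser summary of the data.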

See Also

See rGMM for data generation, and ChooseK for selecting the number of clusters.

Examples

# \donttest{
# Single component without missingness
# Bivariate normal observations
sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
data <- rGMM(n = 1e3, d = 2, k = 1, means = c(2, 2), covs = sigma)
fit <- FitGMM(data, k = 1)

# Single component with missingness
# Trivariate normal observations
sigma <- matrix(c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3)
data <- rGMM(n = 1e3, d = 3, k = 1, miss = 0.1, means = c(2, 2, 2), covs = sigma)
fit <- FitGMM(data, k = 1)

# Two components without missingness
# Trivariate normal observations
mean_list <- list(c(-2, -2, -2), c(2, 2, 2))
sigma <- matrix(c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3)
data <- rGMM(n = 1e3, d = 3, k = 2, means = mean_list, covs = sigma)
fit <- FitGMM(data, k = 2)

# Four components with missingness
# Bivariate normal observations
# Note: Fitting is slow.
mean_list <- list(c(2, 2), c(2, -2), c(-2, 2), c(-2, -2))
sigma <- 0.5 * diag(2)
data <- rGMM(
  n = 1000,
  d = 2,
  k = 4,
  pi = c(0.35, 0.15, 0.15, 0.35),
  miss = 0.1,
  means = mean_list,
  covs = sigma
)
fit <- FitGMM(data, k = 4)
# }