Runs conditional mixture modeling and model-based clustering with the EM (Expectation-Maximization) algorithm for a prespecified conditioning order of the variables. Runs a variable selection procedure (forward, backward or stepwise) to obtain a parsimonious mixture model.
cmb.em(x, order = NULL, l, K, method = "stepwise", id0 = NULL, n.em = 200, em.iter = 5,
       EM.iter = 200, nk.min = NULL, max.spur = 5, tol = 1e-06, silent = FALSE,
       Parallel = FALSE, n.cores = 4)
Components of the returned object:
- the input dataset
- estimated regression models for each cluster (K x p matrix)
- id: vector of estimated cluster memberships (length n)
- estimated log-likelihood
- BIC: Bayesian Information Criterion
- vector of estimated mixing proportions (length K)
- matrix of estimated posterior probabilities (n x K)
- matrix of estimated regression parameters (K x (p + p(p - 1)l/2))
- matrix of estimated variances (K x p)
- applied conditioning order (length p)
- number of model parameters
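The column count of the regression parameter matrix can be checked by counting coefficients per component. As a sketch, assuming the j-th variable in the conditioning order is regressed on an intercept plus powers 1 to l of each of the j - 1 preceding variables (an assumption chosen because it reproduces the stated dimension, not a formula quoted from the package), one component carries the first expression below. The second line is a hypothetical total for the full model, assuming no coefficient is removed by variable selection and counting K - 1 free mixing proportions; the number of model parameters actually returned may differ, for example after stepwise selection.

```latex
\[
\sum_{j=1}^{p}\bigl(1 + l\,(j-1)\bigr) = p + l\sum_{j=1}^{p}(j-1) = p + \frac{l\,p\,(p-1)}{2},
\qquad
M_{\text{full}} = K\Bigl(p + \frac{l\,p\,(p-1)}{2}\Bigr) + K\,p + (K-1)
\]
```

For the iris example (p = 4, l = 2, K = 3), each row of the regression parameter matrix has 4 + 2 * 4 * 3 / 2 = 16 entries, and the full-model total under these assumptions would be 3 * 16 + 3 * 4 + 2 = 62.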
Arguments:
- x: dataset matrix (n x p)
- order: customized conditioning order of the variables (length p)
- l: order of the polynomial regression model
- K: number of clusters
- method: variable selection method ("stepwise", "forward", "backward" or "none")
- id0: initial membership vector (length n)
- n.em: number of short EM runs in the emEM procedure
- em.iter: maximum number of iterations of each short EM run in the emEM procedure
- EM.iter: maximum number of EM iterations
- nk.min: spurious output control
- max.spur: number of trials (maximum number of reruns when a spurious solution is obtained)
- tol: tolerance level
- silent: output control (TRUE/FALSE)
- Parallel: parallel computing (TRUE/FALSE)
- n.cores: number of cores used in parallel computing
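To make the roles of order, l and nk.min concrete, here is a small sketch. The helpers conditional_design() and default_nk_min() are hypothetical (they are not functions of the package). The design matrix assumes that the j-th variable in the conditioning order is regressed on an intercept and on powers 1 to l of every preceding variable, without interaction terms; default_nk_min() only encodes the documented default (l * (p - 1) + 1) * 2.

```r
# Hypothetical helper, not part of the package: response and design matrix for
# the j-th conditional in the conditioning order, ASSUMING an intercept plus
# powers 1..l of each preceding variable (no interaction terms).
conditional_design <- function(x, order, j, l) {
  xo <- x[, order, drop = FALSE]            # columns rearranged by the order
  y <- xo[, j]                               # j-th variable in the order
  X <- matrix(1, nrow(xo), 1)                # intercept column
  if (j > 1) {
    preds <- xo[, seq_len(j - 1), drop = FALSE]
    powers <- do.call(cbind, lapply(seq_len(l), function(d) preds^d))
    X <- cbind(X, powers)                    # 1 + l * (j - 1) columns in total
  }
  list(y = y, X = X)
}

# Documented default for nk.min: (l * (p - 1) + 1) * 2. Under the assumption
# above this is twice the coefficient count of the last conditional regression.
default_nk_min <- function(p, l) (l * (p - 1) + 1) * 2

x <- as.matrix(iris[, -5])
ord <- c(1, 3, 2, 4)
d4 <- conditional_design(x, ord, j = 4, l = 2)
ncol(d4$X)                                   # 7 = 1 + 2 * 3
sum(sapply(1:4, function(j) ncol(conditional_design(x, ord, j, 2)$X)))  # 16
default_nk_min(p = 4, l = 2)                 # 14
```

Summing the column counts over all four conditionals gives 16, the per-component width of the regression parameter matrix for p = 4 and l = 2.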
In conditional mixture modeling, each mixture component is modeled as a product of conditional distributions whose means are polynomial regression functions of other variables (those preceding the modeled variable in the conditioning order). The polynomial order l and the number of clusters K are prespecified by the user.

The model can be initialized by passing a group membership vector to the argument id0; otherwise (the default setting) initial values are obtained inside the function with the emEM algorithm (Biernacki et al., 2003). Two arguments control the emEM procedure: n.em, the number of short EM runs, and em.iter, the maximum number of iterations of each short run. By default, n.em = 200 and em.iter = 5.

The variable selection method is specified as method = "stepwise", "forward", "backward" or "none", where method = "none" means that no parsimonious procedure is conducted. The EM algorithm is applied multiple times during the model fitting and variable selection phases; the arguments EM.iter and tol are the stopping criteria of each EM run.

The spurious output control argument nk.min can be set by the user; by default, nk.min = (l * (p - 1) + 1) * 2. When a spurious solution is obtained, cmb.em is rerun, at most max.spur times.
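The emEM initialization described above can be illustrated on a much simpler model. The sketch below swaps in a univariate Gaussian mixture for the conditional mixture fitted by cmb.em, so it is not the package's internal algorithm. The function names are hypothetical, using random observations as starting means is an assumed start rule, and n_em, em_iter, EM_iter and tol only mirror the roles of n.em, em.iter, EM.iter and tol.

```r
# Illustrative emEM for a univariate K-component Gaussian mixture (hypothetical
# helpers; a simpler model is swapped in for the conditional mixture).

# n x K matrix of weighted densities pi_k * N(y_i | mu_k, sigma_k^2)
weighted_dens <- function(y, par) {
  vapply(seq_along(par$pi),
         function(k) par$pi[k] * dnorm(y, par$mu[k], par$sigma[k]),
         numeric(length(y)))
}

mix_loglik <- function(y, par) sum(log(rowSums(weighted_dens(y, par))))

# One EM iteration: E-step posteriors, then closed-form M-step updates
em_step <- function(y, par) {
  dens <- weighted_dens(y, par)
  z <- dens / rowSums(dens)                   # posterior probabilities (n x K)
  nk <- colSums(z)                            # effective cluster sizes
  mu <- colSums(z * y) / nk
  sigma <- sqrt(colSums(z * outer(y, mu, "-")^2) / nk)
  list(pi = nk / length(y), mu = mu, sigma = sigma)
}

# Iterate EM until the log-likelihood changes by less than tol, the
# log-likelihood becomes non-finite, or max_iter iterations are reached
run_em <- function(y, par, max_iter, tol) {
  ll <- mix_loglik(y, par)
  for (it in seq_len(max_iter)) {
    par <- em_step(y, par)
    ll_new <- mix_loglik(y, par)
    done <- !is.finite(ll_new) || abs(ll_new - ll) < tol
    ll <- ll_new
    if (done) break
  }
  par$loglik <- ll
  par
}

# emEM: n_em short runs of at most em_iter iterations from random starts; the
# short run with the highest log-likelihood is then iterated up to EM_iter times
emEM <- function(y, K, n_em = 200, em_iter = 5, EM_iter = 200, tol = 1e-6) {
  best <- NULL
  for (r in seq_len(n_em)) {
    start <- list(pi = rep(1 / K, K),
                  mu = sample(y, K),          # K random observations as initial means
                  sigma = rep(sd(y), K))
    short <- run_em(y, start, em_iter, tol)
    if (is.finite(short$loglik) &&
        (is.null(best) || short$loglik > best$loglik)) best <- short
  }
  if (is.null(best)) stop("all short EM runs failed")
  run_em(y, best[c("pi", "mu", "sigma")], EM_iter, tol)
}

set.seed(1)
y <- c(rnorm(150, mean = 0), rnorm(150, mean = 5))
fit <- emEM(y, K = 2, n_em = 20)
round(fit$mu, 2)
```

Spending a few iterations on many starts and only running the full EM from the most promising one is what makes the strategy cheaper than running every start to convergence.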
Notation: n - sample size, p - number of variables, l - order of the polynomial regression model, K - number of mixture components.
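With this notation (p being the number of variables) and writing x_(j) for the j-th variable in the conditioning order, the model can be sketched as follows. The Gaussian form of the conditionals and the power-only polynomial mean without interaction terms are assumptions chosen to match the stated parameter dimensions (a K x p variance matrix and 1 + l(j - 1) regression coefficients for the j-th variable), not formulas quoted from the package.

```latex
\[
f(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{p}
  \phi\bigl(x_{(j)};\, m_{kj},\, \sigma^2_{kj}\bigr),
\qquad
m_{kj} = \beta_{kj0} + \sum_{i=1}^{j-1} \sum_{d=1}^{l} \beta_{kjid}\, x_{(i)}^{d}
\]
```

Here phi(.; m, sigma^2) denotes the univariate normal density and pi_k the mixing proportions; for j = 1 the mean reduces to the intercept beta_{k10}.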
Biernacki C., Celeux G., Govaert G. (2003). Choosing Starting Values for the EM Algorithm for Getting the Highest Likelihood in Multivariate Gaussian Mixture Models. Computational Statistics and Data Analysis, 41(3-4), pp. 561-575.
set.seed(1)
K <- 3
l <- 2
x <- as.matrix(iris[,-5])
id.true <- iris[,5]
# \donttest{
# Fit a conditional mixture model by EM with stepwise variable selection
obj <- cmb.em(x = x, order = c(1, 3, 2, 4), l = l, K = K, method = "stepwise",
              silent = FALSE, Parallel = FALSE)
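id.cmb <- obj$id          # estimated cluster memberships
table(id.true, id.cmb)    # cross-tabulate true species against estimated clusters
obj$BIC                   # BIC of the fitted model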
# }