This function selects the number of mixture components and estimates the parameters of a Gaussian mixture model using a split-and-merge EM (SMEM) algorithm. In the first iteration, the classic EM algorithm is run to update the parameters of the initial model. Each subsequent iteration then consists of splitting a component into two or merging two components, followed by re-estimating the parameters with the EM algorithm. The selected split or merge operation is the one that maximizes a scoring function (after the re-estimation process). To avoid testing all possible operations, the split and merge candidates are first ranked according to relevant criteria (Zhang et al., 2003). The top-ranked split and top-ranked merge operations are tested first. If neither of them increases the score, the second-ranked ones are considered, and so on. The SMEM algorithm stops if a given maximum rank is reached without improving the score.
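The rank-based search described above can be sketched in base R. This is only an illustration of the control flow, not the package's implementation: `pick_operation`, `split_scores` and `merge_scores` are hypothetical names, and the scores after re-estimation are assumed to be precomputed for each rank, whereas the actual algorithm would evaluate candidates only as needed.

```r
# Hypothetical helper (not part of the package): scan the ranked candidates
# and return the first operation that improves on the current score.
# split_scores[r] / merge_scores[r] are the scores obtained after
# re-estimation for the split / merge candidate of rank r (higher is better).
pick_operation <- function(current_score, split_scores, merge_scores,
                           max_rank = 1) {
  n_rank <- min(max_rank, max(length(split_scores), length(merge_scores)))
  for (rank in seq_len(n_rank)) {
    # Candidates available at this rank (NA if one list is shorter)
    cand <- c(split = unname(split_scores[rank]),
              merge = unname(merge_scores[rank]))
    cand <- cand[!is.na(cand)]
    if (length(cand) > 0 && max(cand) > current_score) {
      best <- which.max(cand)
      return(list(oper = names(cand)[best], rank = rank,
                  score = unname(cand[best])))
    }
  }
  NULL  # maximum rank reached without improving the score: stop
}

# Rank 1 brings no improvement, so the rank-2 candidates are tested
res <- pick_operation(-100, split_scores = c(-105, -98),
                      merge_scores = c(-101, -99), max_rank = 2)
```

With `max_rank = 1`, the same call returns `NULL`, which corresponds to the stopping condition of the algorithm.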
smem(
  gmm,
  data,
  y = NULL,
  score = "bic",
  split = TRUE,
  merge = TRUE,
  min_comp = 1,
  max_comp = Inf,
  space = 0.5,
  max_rank = 1,
  max_iter_smem = 10,
  verbose = FALSE,
  ...
)
A list with elements:
The final gmm object.
A numeric matrix containing the posterior probabilities for each observation.
A numeric vector containing the sequence of scores measured initially and after each iteration.
A character vector containing the sequence of split and merge operations performed at each iteration.
gmm: An initial object of class gmm.
data: A data frame or numeric matrix containing the data used in the SMEM algorithm. Its columns must explicitly be named after the variables of gmm and must not contain missing values.
y: A character vector containing the dependent variables if a conditional model is estimated (which involves maximizing a conditional score). If NULL (the default), the joint model is estimated.
score: A character string ("aic", "bic" or "loglik") corresponding to the scoring function.
split: A logical value indicating whether split operations are allowed (if FALSE, no mixture component can be split).
merge: A logical value indicating whether merge operations are allowed (if FALSE, no mixture component can be merged).
min_comp: A positive integer corresponding to the minimum number of mixture components.
max_comp: A positive integer corresponding to the maximum number of mixture components.
space: A numeric value in [0, 1) corresponding to the space between two subcomponents resulting from a split.
max_rank: A positive integer corresponding to the maximum rank for testing the split and merge candidates.
max_iter_smem: A non-negative integer corresponding to the maximum number of iterations.
verbose: A logical value indicating whether iterations in progress are displayed.
...: Additional arguments passed to function em.
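As an illustration of how these arguments combine, the sketch below estimates a conditional model with WAIST as the dependent variable, scores candidates with the AIC, and disables merge operations. It reuses the data and initial model from the examples on this page; the `library(gmgm)` call is an assumption about which package provides `smem`, and the chosen argument values are arbitrary.

```r
library(gmgm)  # assumption: the package that provides smem, add_var and data_body

data(data_body)
gmm_1 <- add_var(NULL, c("WAIST", "AGE", "FAT", "HEIGHT", "WEIGHT"))

# Conditional model of WAIST given the other variables:
# - score = "aic": candidates are compared on the (conditional) AIC
# - merge = FALSE: only split operations are tested
# - max_rank = 2: the second-ranked split is tried if the first one fails
# - max_iter_smem = 5: at most five iterations
res_cond <- smem(gmm_1, data_body, y = "WAIST", score = "aic",
                 merge = FALSE, max_comp = 3, max_rank = 2,
                 max_iter_smem = 5)
```

Since merges are disabled and `max_comp = 3`, the number of components can only grow in this run, up to three.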
Zhang, Z., Chen, C., Sun, J. and Chan, K. L. (2003). EM algorithms for Gaussian mixtures with split-and-merge operation. Pattern Recognition, 36(9):1973-1983.
em, stepwise
data(data_body)
gmm_1 <- add_var(NULL, c("WAIST", "AGE", "FAT", "HEIGHT", "WEIGHT"))
res_smem <- smem(gmm_1, data_body, max_comp = 3, verbose = TRUE)
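The returned list can be inspected with base R tools. The snippet below is a minimal sketch that repeats the example so that it runs on its own; the `library(gmgm)` call is an assumption about the providing package, and no element names are assumed for the result.

```r
library(gmgm)  # assumption: the package that provides smem, add_var and data_body

data(data_body)
gmm_1 <- add_var(NULL, c("WAIST", "AGE", "FAT", "HEIGHT", "WEIGHT"))
res_smem <- smem(gmm_1, data_body, max_comp = 3)

# One-level overview of the result: the final model, the posterior
# probabilities, the sequence of scores and the sequence of operations
str(res_smem, max.level = 1)
```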