vdp.mixt(dat, prior.alpha = 1, prior.alphaKsi = 0.01,
prior.betaKsi = 0.01, do.sort = TRUE, threshold = 1e-05,
initial.K = 1, ite = Inf, implicit.noise = 0, c.max = 10,
speedup = TRUE, min.size = 5)
ALGORITHM SUMMARY

This code implements Gaussian mixture models with diagonal covariance matrices. The following greedy, iterative approach is taken to obtain the number of mixture components and their corresponding parameters:
1. Start from one cluster, T = 1.
2. Select a number of candidate clusters according to their values of N_c = \sum_{n=1}^N q_{z_n}(z_n = c) (larger is better).
3. For each candidate cluster c:
   3a. Split c into two clusters, c1 and c2, through the bisector of its principal component. Initialise the responsibilities q_{z_n}(z_n = c_1) and q_{z_n}(z_n = c_2).
   3b. Update only the parameters of c1 and c2, using the observations that belonged to c, and determine the new value of the free energy, F_{T+1}.
   3c. Reassign cluster labels so that cluster 1 corresponds to the largest cluster, cluster 2 to the second largest, and so on.
4. Select the split that leads to the maximal reduction of the free energy, F_{T+1}.
5. Update the posterior using the newly split data.
6. If F_T - F_{T+1} < \epsilon, halt; otherwise set T := T + 1 and go to step 2.
The loop is implemented in the function greedy(...).
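Step 3a above, splitting a cluster through the bisector of its principal component, can be sketched as follows. This is a hypothetical illustration (the function split.cluster and its interface are not part of the package); it simply projects the cluster's observations onto the first principal component and divides them at the cluster mean:

```r
# Hypothetical sketch of step 3a; NOT the package's internal implementation.
# Split a cluster's observations into two groups through the bisector
# (the hyperplane orthogonal to the first principal component, passing
# through the cluster mean).
split.cluster <- function(X) {
  mu <- colMeans(X)                       # cluster centroid
  pc1 <- prcomp(X)$rotation[, 1]          # first principal direction
  # Signed projection of each centered observation onto pc1
  proj <- as.vector(scale(X, center = mu, scale = FALSE) %*% pc1)
  list(c1 = X[proj <= 0, , drop = FALSE],
       c2 = X[proj >  0, , drop = FALSE])
}
```

In the full algorithm, the responsibilities q_{z_n}(z_n = c_1) and q_{z_n}(z_n = c_2) would then be initialised from this hard split and refined by the variational updates in step 3b.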
set.seed(123)
# Generate toy data with two Gaussian components
dat <- rbind(array(rnorm(400), dim = c(200, 2)) + 5,
             array(rnorm(400), dim = c(200, 2)))
# Infinite Gaussian mixture model with
# Variational Dirichlet Process approximation
mixt <- vdp.mixt( dat )
# Centroids of the detected Gaussian components
mixt$posterior$centroids
# Hard mixture component assignments for the samples
apply(mixt$posterior$qOFz, 1, which.max)