CNmixt(X, G, model = NULL, initialization = "mixt",
alphafix = NULL, alphamin = 0.5, etafix = NULL, etamax = 1000,
seed = NULL, start.z = NULL, start.v = NULL, start = 0,
ind.label = NULL, label = NULL, iter.max = 1000, threshold = 1.0e-03,
parallel = FALSE, eps = 1e-100)"EII", "VII", "EEI", "VEI", "EVI", "VVI", "EEE", "VEE", "EVE", "mixt"(default): the initial ($n \times G$) soft classification matrix (of posterior probabilities of groups membership) arises from a preliminary run of mixtures of mullength(alphafix) != G, then the first element is replicated $G$ times.
Default value is NULL.length(alphamin) != G, then the first element is replicated $G$ times.
Default value is 0.5.length(etafix) != G, then the first element is replicated $G$ times.
Default value is NULLetafix is NULL.
If length(etamax) != G, then the first element is repliNULL, current seed is not changed.
Default value is NULL.NULL.initialization = "mixt", initialization used for the gpcm() function of the mixture:gpcm for details).ind.label, with the group of membership of the observations indicated by ind.label.1000.1.0e-03.TRUE, the package parallel is used for parallel computation.
When several models are estimated, computational time is reduced.
The number of cores to use 1e-100.ContaminatedMixt is a list with components:callmodelname: the name of the best model.npar: number of free parameters.X: matrix of data.G: number of mixture components.p: number of variables.prior: weights for the mixture components.priorgood: weights for the good observations in each of thekgroups.mu: component means.Sigma: component covariance matrices for the good observations.eta: component contamination parameters.iter.stop: final iteration of the ECM algorithm.z: matrix with posterior probabilities for the outer groups.v: matrix with posterior probabilities for the inner groups.ind.label: vector of positions (rows) of the labeled observations.label: vector, of the same dimension asind.label, with the group of membership of the observations indicated byind.label.group: vector of integers indicating the maximum a posteriori classifications for the best model.loglik: log-likelihood value of the best model.BIC: BIC valueICL:ICL valuecall: an object of classcallfor the best model.X are either clustered or classified using parsimonious mixtures of multivariate contaminated normal distributions with some or all of the 14 parsimonious models described in Punzo and McNicholas (2015).
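Each mixture component is a multivariate contaminated normal distribution, that is, a two-component scale mixture of normals with a proportion of good observations (cf. alphafix and alphamin) and an inflation parameter for the bad ones (cf. etafix and etamax). As a minimal sketch (not package code) of this density under the parameterization of Punzo and McNicholas (2015), using dmnorm() from the mnormt package loaded in the Examples:

## Sketch: density of one multivariate contaminated normal component,
## f(x) = alpha * N(x; mu, Sigma) + (1 - alpha) * N(x; mu, eta * Sigma)
library("mnormt")
dcontnorm <- function(x, mu, Sigma, alpha, eta)
  alpha * dmnorm(x, mean = mu, varcov = Sigma) +
    (1 - alpha) * dmnorm(x, mean = mu, varcov = eta * Sigma)
dcontnorm(c(0, 0), mu = c(0, 0), Sigma = diag(2), alpha = 0.95, eta = 20)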
Model specification (via the model argument) follows the nomenclature popularized by other packages for parsimonious Gaussian mixtures, such as mixture and mclust: the three letters of a model name indicate, in order, whether the volume, shape, and orientation of the component scale matrices are "V"ariable across groups, "E"qual across groups, or constrained to the "I"dentity matrix.
As an example, the string "VEI" would refer to the model where $\Sigma_g = \lambda_g \Delta$.
Note that for $G=1$, several models are equivalent (for example, "EEE" and "VVV").
Thus, for $G=1$ only one model from each set of equivalent models will be run.
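To make the nomenclature concrete, the following small sketch (hypothetical values, not package output) builds "VEI" scale matrices $\Sigma_g = \lambda_g \Delta$ from a common diagonal shape matrix $\Delta$ with unit determinant and group-specific volumes $\lambda_g$:

## "VEI": variable volume, equal diagonal shape, identity orientation
Delta  <- diag(c(2, 0.5))    # common shape matrix, det(Delta) = 1
lambda <- c(1, 4)            # group-specific volume parameters
Sigma  <- lapply(lambda, function(l) l * Delta)
Sigma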
The algorithms detailed in Celeux and Govaert (1995) are considered in the first CM-step of the ECM algorithm to update $\Sigma_g$ for all the models apart from "EVE" and "VVE".
For "EVE" and "VVE", majorization-minimization (MM) algorithms (Hunter and Lange, 2000) and accelerated line search algorithms on the Stiefel manifold (Absil, Mahony and Sepulchre, 2009 and Browne and McNicholas, 2014), which are especially preferable in higher dimensions (Browne and McNicholas, 2014), are used to update $\Sigma_g$; the same approach is also adopted in the ContaminatedMixt-package## Note that the example is extremely simplified
## in order to reduce computation time
# Artificial data from an EEI Gaussian mixture with G = 2 components
library("mnormt")
p <- 2
set.seed(12345)
X1 <- rmnorm(n = 200, mean = rep(2, p), varcov = diag(c(5, 0.5)))
X2 <- rmnorm(n = 200, mean = rep(-2, p), varcov = diag(c(5, 0.5)))
noise <- matrix(runif(n = 40, min = -20, max = 20), nrow = 20, ncol = 2)
X <- rbind(X1, X2, noise)
group <- rep(c(1, 2, 3), times = c(200, 200, 20))
plot(X, col = group, pch = c(3, 4, 16)[group], asp = 1, xlab = expression(X[1]),
ylab = expression(X[2]))
# ---------------------- #
# Model-based clustering #
# ---------------------- #
res1 <- CNmixt(X, model = c("EEI", "VVV"), G = 2, parallel = FALSE)
summary(res1)
agree(res1, givgroup = group)
plot(res1, contours = TRUE, asp = 1, xlab = expression(X[1]), ylab = expression(X[2]))
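## The fitted object is a list with the components described under Value.
## Hedged sketch of accessing them by name (names as documented above; the
## exact internal structure is an assumption here):
res1$group   # maximum a posteriori classification for the best model
res1$eta     # estimated contamination parameters
res1$BIC     # BIC value of the best model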
# -------------------------- #
# Model-based classification #
# -------------------------- #
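## label 20 of the 400 non-noise observations (groups 1 and 2 only)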
indlab <- sample(1:400, 20)
lab <- group[indlab]
res2 <- CNmixt(X, G = 2, model = "EEI", ind.label = indlab, label = lab)
agree(res2, givgroup = group)
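## Further sketch (hypothetical values): fixing the proportion of good
## observations and the degree of contamination; scalars passed to alphafix
## and etafix are replicated across the G groups (see Arguments)
res3 <- CNmixt(X, G = 2, model = "EEI", alphafix = 0.95, etafix = 20)
summary(res3)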