Fits MoEClust models: Gaussian Mixture of Experts models with GPCM/mclust-family covariance structures. In other words, performs model-based clustering via the EM/CEM algorithm where covariates are allowed to enter neither, either, or both the mixing proportions (gating network) and/or component densities (expert network) of a Gaussian Parsimonious Clustering Model, with or without an additional noise component. Additional arguments are available via the function MoE_control
, including the specification of a noise component, controls on the initialisation of the algorithm, and more.
MoE_clust(data,
G = 1:9,
modelNames = NULL,
gating = ~1,
expert = ~1,
control = MoE_control(...),
network.data = NULL,
...)# S3 method for MoEClust
print(x,
digits = 3L,
...)
# S3 method for MoEClust
summary(object,
classification = TRUE,
parameters = FALSE,
networks = FALSE,
...)
A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
An integer vector specifying the numbers of mixture components (clusters) to fit. Defaults to G=1:9
. Must be a strictly positive integer, unless a noise component is included in the estimation, in which case G=0
is allowed and included by default. (see MoE_control
).
A vector of character strings indicating the models to be fitted in the EM/CEM phase of clustering. With n
observations and d
variables, the defaults are:
for univariate data | c("E", "V") |
for multivariate data \(n > d\) | mclust.options("emModelNames") |
For single-component models these options reduce to:
for univariate data | "E" |
for multivariate data \(n > d\) | c("EII", "EEI", "EEE") |
For zero-component models with a noise component only the "E"
and "EII"
models will be fitted for univariate and multivariate data, respectively, although this is clearly for naming consistency only. The help file for mclustModelNames
further describes the available models (though the "X"
in the single-component models will be coerced to "E"
if supplied that way). For single-component models, other model names equivalent to those above can be supplied, but will be coerced to those above.
A formula
for determining the model matrix for the multinomial logistic regression in the gating network when fixed covariates enter the mixing proportions. Defaults to ~1
, i.e. no covariates. This will be ignored where G=1
. Continuous, categorical, and/or ordinal covariates are allowed. Logical covariates will be coerced to factors. Interactions, transformations, and higher order terms are permitted: the latter must be specified explicitly using the AsIs
operator (I
). The specification of the LHS of the formula is ignored. Intercept terms are included by default.
A formula
for determining the model matrix for the (multivariate) WLS in the expert network when fixed covariates are included in the component densities. Defaults to ~1
, i.e. no covariates. Continuous, categorical, and/or ordinal covariates are allowed. Logical covariates will be coerced to factors. Interactions, transformations, and higher order terms are permitted: the latter must be specified explicitly using the AsIs
operator (I
). The specification of the LHS of the formula is ignored. Intercept terms are included by default.
A list of control parameters for the EM/CEM and other aspects of the algorithm. The defaults are set by a call to MoE_control
. In particular, arguments pertaining to the inclusion of an additional noise component are documented here.
An optional data frame (or a matrix with named columns) in which to look for the covariates in the gating
&/or expert
network formulas, if any. If not found in network.data
, any supplied gating
&/or expert
covariates are taken from the environment from which MoE_clust
is called. Try to ensure the names of variables in network.data
do not match any of those in data
.
An alternative means of passing control parameters directly via the named arguments of MoE_control
. Do not pass the output from a call to MoE_control
here! This argument is only relevant for the MoE_clust
function and will be ignored for the associated print
and summary
functions.
Arguments required for the print
and summary
functions: x
and object
are objects of class "MoEClust"
resulting from a call to MoE_clust
, while digits
gives the number of decimal places to round to for printing purposes (defaults to 3). classification
, parameters
, and networks
are logicals which govern whether a table of the MAP classification of observations, the mixture component parameters, and the gating/expert network coefficients are printed, respectively.
A list (of class "MoEClust"
) with the following named entries, mostly corresponding to the chosen optimal model (as determined by the criterion
within MoE_control
):
call
The matched call.
data
The input data, as a data.frame
.
modelName
A character string denoting the GPCM/mclust model type at which the optimal criterion
occurs.
n
The number of observations in the data
.
d
The dimension of the data
.
G
The optimal number of mixture components according to criterion
.
BIC
A matrix of all BIC values with length{G}
rows and length(modelNames)
columns. May include missing entries: NA
represents models which were not visited, -Inf
represents models which were terminated due to error, for which a log-likelihood could not be estimated. Inherits the classes "MoECriterion"
and "mclustBIC"
, for which dedicated printing and plotting functions exist, respectively.
ICL
A matrix of all ICL values with length{G}
rows and length(modelNames)
columns. May include missing entries: NA
represents models which were not visited, -Inf
represents models which were terminated due to error, for which a log-likelihood could not be estimated. Inherits the classes "MoECriterion"
and "mclustICL"
, for which dedicated printing and plotting functions exist, respectively.
AIC
A matrix of all AIC values with length{G}
rows and length(modelNames)
columns. May include missing entries: NA
represents models which were not visited, -Inf
represents models which were terminated due to error, for which a log-likelihood could not be estimated. Inherits the classes "MoECriterion"
and "mclustAIC"
, for which dedicated printing and plotting functions exist, respectively.
bic
The BIC value corresponding to the optimal model. May not necessarily be the optimal BIC.
icl
The ICL value corresponding to the optimal model. May not necessarily be the optimal ICL.
aic
The AIC value corresponding to the optimal model. May not necessarily be the optimal AIC.
gating
An object of class "MoE_gating"
(for which dedicated print
, summary
, and predict
methods exist) and either "multinom"
or "glm"
(only for single-component models or noise-only models) giving the multinom
regression coefficients of the gating
network. If gating
covariates were NOT supplied (or the best model has just one component), this corresponds to a RHS of ~1
, otherwise the supplied gating
formula. As such, a fitted gating
network is always returned even in the absence of supplied covariates or clusters. The number of parameters to penalise by for MoE_crit
is given by length(coef(gating))
, and the gating
formula used is stored here as an attribute. If there is a noise component (and the option noise.gate=TRUE
is invoked), its coefficients are those for the last component. Users are cautioned against making inferences about statistical significance from summaries of the coefficients in the gating network.
expert
An object of class "MoE_expert"
(for which dedicated print
, summary
, and predict
methods exist) and "lm"
giving the (multivariate) WLS regression coefficients of the expert
network. If expert
covariates were NOT supplied, this corresponds to a RHS of ~1
, otherwise the supplied expert
formula. As such, a fitted expert
network is always returned even in the absence of supplied covariates. The number of parameters to penalise by for MoE_crit
is given by G * length(coef(expert[[1]]))
, and the expert
formula used is stored here is an attribute. Users are cautioned against making inferences about statistical significance from summaries of the coefficients in the expert network.
LOGLIK
A matrix of all maximal log-likelihood values with length{G}
rows and length(modelNames)
columns. May include missing entries: NA
represents models which were not visited, -Inf
represents models which were terminated due to error, for which a log-likelihood could not be estimated. Inherits the classes "MoECriterion"
and "mclustLoglik"
, for which dedicated printing and plotting functions exist, respectively.
loglik
The vector of increasing log-likelihood values for every EM/CEM iteration under the optimal model. The last element of this vector is the maximum log-likelihood achieved by the parameters returned at convergence.
linf
An asymptotic estimate of the final converged maximised log-likelihood. Returned when stopping="aitken"
and G > 1
(see MoE_control
and aitken
), otherwise the last element of loglik
is returned instead.
df
The number of estimated parameters in the optimal model (i.e. the number of 'used' degrees of freedom). Subtract this number from n
to get the degrees of freedom. The number of parameters due to the gating network, expert network, and covariance matrices are also stored here as attributes of df
.
iters
The total number of EM/CEM iterations for the optimal model.
hypvol
The hypervolume parameter for the noise component if required, otherwise set to NA
(see MoE_control
).
parameters
A list with the following named components:
pro
The mixing proportions: either a vector of length G
or, if gating
covariates were supplied, a matrix with an entry for each observation (rows) and component (columns).
mean
The means of each component. If there is more than one component, this is a matrix whose k-th column is the mean of the k-th component of the mixture model.
For models with expert network covariates, this is given by the posterior mean of the fitted values, otherwise the posterior mean of the response is reported. For models with expert network covariates, the observation-specific component means can be accessed by calling predict
on the expert
object above.
variance
A list of variance parameters of each component of the model. The components of this list depend on the model type specification. See the help file for mclustVariance
for details. Also see expert_covar
for an alternative approach to summarising the variance parameters in the presence of expert network covariates.
Vinv
The inverse of the hypervolume parameter for the noise component if required, otherwise set to NULL
(see MoE_control
).
z
The final responsibility matrix whose [i,k]
-th entry is the probability that observation i belonds to the k-th component. If there is a noise component, its values are found in the last column.
classification
The vector of cluster labels for the chosen model corresponding to z
, i.e. max.col(z)
. Observations belonging to the noise component, if any, will belong to component 0
.
uncertainty
The uncertainty associated with the classification
.
net.covs
A data frame gathering the unique set of covariates used in the gating
and expert
networks, if any. Will contain zero columns in the absence of gating or expert network covariates. Supplied gating covariates will be excluded if the optimal model has only one component. May have fewer columns than covariates supplied via the network.data
argument also, as only the included covariates are gathered here.
resid.data
In the presence of expert network covariates, this is the augmented data actually used in the clustering at convergence, as a list of G
matrices of WLS residuals of dimension n * d
. Will contain zero columns in the absence of expert network covariates.
DF
A matrix giving the numbers of estimated parameters (i.e. the number of 'used' degrees of freedom) for all visited models, with length{G}
rows and length(modelNames)
columns. Subtract these numbers from n
to get the degrees of freedom. May include missing entries: NA
represents models which were not visited, -Inf
represents models which were terminated due to error, for which parameters could not be estimated. Inherits the classes "MoECriterion"
and "mclustDF"
, for which dedicated printing and plotting functions exist, respectively.
ITERS
A matrix giving the total number of EM/CEM iterations for all visited models, with length{G}
rows and length(modelNames)
columns. May include missing entries: NA
represents models which were not visited, Inf
represents models which were terminated due to singularity/error and thus would never have converged. Inherits the classes "MoECriterion"
and "mclustITERS"
, for which dedicated printing and plotting functions exist, respectively.
The function effectively allows 6 different types of Gaussian Mixture of Experts model (as well as the different models in the GPCM/mclust family, for each): i) the standard finite Gaussian mixture with no covariates, ii) fixed covariates only in the gating network, iii) fixed covariates only in the expert network, iv) the full Mixture of Experts model with fixed covariates entering both the mixing proportions and component densities. By constraining the mixing proportions to be equal (see equalPro
in MoE_control
) two extra special cases are facilitated when gating covariates are excluded.
Note that having the same covariates in both networks is allowed. So too are interactions, transformations, and higher order terms (see formula
): the latter must be specified explicitly using the AsIs
operator (I
). Covariates can be continuous, categorical, logical, or ordinal, but the response must always be continuous.
While model selection in terms of choosing the optimal number of components and the GPCM/mclust model type is performed within MoE_clust
, using one of the criterion
options within MoE_control
, choosing between multiple fits with different combinations of covariates or different initialisation settings can be done by supplying objects of class "MoEClust"
to MoE_compare
.
Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <10.1007/s11634-019-00373-8>.
Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association, 97(458): 611-631.
See MoE_stepwise
for identifying the optimal model and its covariates via greedy forward stepwise selection.
MoE_control
, MoE_compare
, plot.MoEClust
, predict.MoEClust
, predict.MoE_gating
, predict.MoE_expert
, as.Mclust
, MoE_crit
, MoE_estep
, MoE_cstep
, MoE_dens
, mclustModelNames
, mclustVariance
, expert_covar
, aitken
, I
# NOT RUN {
data(ais)
hema <- ais[,3:7]
sex <- ais$sex
BMI <- ais$BMI
# Fit a standard finite mixture model
m1 <- MoE_clust(hema, G=2:3)
# Allow covariates to enter the mixing proportions
m2 <- MoE_clust(hema, G=2:3, gating= ~ sex + BMI)
# Allow covariates to enter the component densities
m3 <- MoE_clust(hema, G=2:3, expert= ~ sex)
# Allow covariates to enter both the gating & expert network
m4 <- MoE_clust(hema, G=2:3, gating= ~ BMI, expert= ~ sex)
# Fit an equal mixing proportion model with an expert network covariate
m5 <- MoE_clust(hema, G=2:3, expert= ~ sex + BMI, equalPro=TRUE)
# Fit models with gating covariates & an additional noise component
m6 <- MoE_clust(hema, G=2:3, tau0=0.1, gating= ~ BMI, network.data=ais)
# Extract the model with highest BIC
(comp <- MoE_compare(m1, m2, m3, m4, m5, m6, criterion="bic"))
# See if a better model can be found using greedy forward stepwise selection
(step <- MoE_stepwise(ais[,3:7], ais))
(comp <- MoE_compare(comp, step, optimal.only=TRUE))
(best <- comp$optimal)
(summ <- summary(best, classification=TRUE, parameters=TRUE, networks=TRUE))
# Examine the expert network in greater detail
# (but refrain from inferring statistical significance!)
summary(best$expert)
# Visualise the results, incl. the gating network and log-likelihood
plot(best, what="gpairs")
plot(best, what="gating") # equal mixing proportions!
plot(best, what="loglik")
# Visualise the results using the 'lattice' library
require("lattice")
z <- factor(best$classification, labels=paste0("Cluster", seq_len(best$G)))
splom(~ hema | sex, groups=z)
splom(~ hema | z, groups=sex)
# }
Run the code above in your browser using DataLab