MoE_clust: MoEClust: Gaussian Parsimonious Clustering Models with Covariates and a Noise Component

Description

Fits MoEClust models: Gaussian Mixture of Experts models with GPCM/mclust-family covariance structures. In other words, performs model-based clustering via the EM/CEM algorithm where covariates are allowed to enter neither, either, or both the mixing proportions (gating network) and/or component densities (expert network) of a Gaussian Parsimonious Clustering Model, with or without an additional noise component. Additional arguments are available via the function MoE_control, including the specification of a noise component, controls on the initialisation of the algorithm, and more.

Usage

MoE_clust(data,
          G = 1:9,
          modelNames = NULL,
          gating = ~1,
          expert = ~1,
          control = MoE_control(...),
          network.data = NULL,
          ...)
# S3 method for MoEClust
print(x,
      digits = 3L,
      ...)
# S3 method for MoEClust
summary(object,
        classification = TRUE,
        parameters = FALSE,
        networks = FALSE,
        ...)

Arguments

data

A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

An integer vector specifying the numbers of mixture components (clusters) to fit. Defaults to G=1:9. Must be a strictly positive integer, unless a noise component is included in the estimation, in which case G=0 is allowed and included by default. (see MoE_control).

modelNames

A vector of character strings indicating the models to be fitted in the EM/CEM phase of clustering. With n observations and d variables, the defaults are:

for univariate data	`c("E", "V")`
for multivariate data \(n > d\)	`mclust.options("emModelNames")`

For single-component models these options reduce to:

for univariate data	`"E"`
for multivariate data \(n > d\)	`c("EII", "EEI", "EEE")`

For zero-component models with a noise component only the "E" and "EII" models will be fitted for univariate and multivariate data, respectively, although this is clearly for naming consistency only. The help file for mclustModelNames further describes the available models (though the "X" in the single-component models will be coerced to "E" if supplied that way). For single-component models, other model names equivalent to those above can be supplied, but will be coerced to those above.

gating

A formula for determining the model matrix for the multinomial logistic regression in the gating network when fixed covariates enter the mixing proportions. Defaults to ~1, i.e. no covariates. This will be ignored where G=1. Continuous, categorical, and/or ordinal covariates are allowed. Logical covariates will be coerced to factors. Interactions, transformations, and higher order terms are permitted: the latter must be specified explicitly using the AsIs operator (I). The specification of the LHS of the formula is ignored. Intercept terms are included by default.

expert

A formula for determining the model matrix for the (multivariate) WLS in the expert network when fixed covariates are included in the component densities. Defaults to ~1, i.e. no covariates. Continuous, categorical, and/or ordinal covariates are allowed. Logical covariates will be coerced to factors. Interactions, transformations, and higher order terms are permitted: the latter must be specified explicitly using the AsIs operator (I). The specification of the LHS of the formula is ignored. Intercept terms are included by default.

control

A list of control parameters for the EM/CEM and other aspects of the algorithm. The defaults are set by a call to MoE_control. In particular, arguments pertaining to the inclusion of an additional noise component are documented here.

network.data

An optional data frame (or a matrix with named columns) in which to look for the covariates in the gating &/or expert network formulas, if any. If not found in network.data, any supplied gating &/or expert covariates are taken from the environment from which MoE_clust is called. Try to ensure the names of variables in network.data do not match any of those in data.

...

An alternative means of passing control parameters directly via the named arguments of MoE_control. Do not pass the output from a call to MoE_control here! This argument is only relevant for the MoE_clust function and will be ignored for the associated print and summary functions.

x, object, digits, classification, parameters, networks

Arguments required for the print and summary functions: x and object are objects of class "MoEClust" resulting from a call to MoE_clust, while digits gives the number of decimal places to round to for printing purposes (defaults to 3). classification, parameters, and networks are logicals which govern whether a table of the MAP classification of observations, the mixture component parameters, and the gating/expert network coefficients are printed, respectively.

Value

A list (of class "MoEClust") with the following named entries, mostly corresponding to the chosen optimal model (as determined by the criterion within MoE_control):

call

The matched call.

data

The input data, as a data.frame.

modelName

A character string denoting the GPCM/mclust model type at which the optimal criterion occurs.

n

The number of observations in the data.

d

The dimension of the data.

G

The optimal number of mixture components according to criterion.

BIC

A matrix of all BIC values with length{G} rows and length(modelNames) columns. May include missing entries: NA represents models which were not visited, -Inf represents models which were terminated due to error, for which a log-likelihood could not be estimated. Inherits the classes "MoECriterion" and "mclustBIC", for which dedicated printing and plotting functions exist, respectively.

ICL

A matrix of all ICL values with length{G} rows and length(modelNames) columns. May include missing entries: NA represents models which were not visited, -Inf represents models which were terminated due to error, for which a log-likelihood could not be estimated. Inherits the classes "MoECriterion" and "mclustICL", for which dedicated printing and plotting functions exist, respectively.

AIC

A matrix of all AIC values with length{G} rows and length(modelNames) columns. May include missing entries: NA represents models which were not visited, -Inf represents models which were terminated due to error, for which a log-likelihood could not be estimated. Inherits the classes "MoECriterion" and "mclustAIC", for which dedicated printing and plotting functions exist, respectively.

bic

The BIC value corresponding to the optimal model. May not necessarily be the optimal BIC.

icl

The ICL value corresponding to the optimal model. May not necessarily be the optimal ICL.

aic

The AIC value corresponding to the optimal model. May not necessarily be the optimal AIC.

gating

An object of class "MoE_gating" (for which dedicated print, summary, and predict methods exist) and either "multinom" or "glm" (only for single-component models or noise-only models) giving the multinom regression coefficients of the gating network. If gating covariates were NOT supplied (or the best model has just one component), this corresponds to a RHS of ~1, otherwise the supplied gating formula. As such, a fitted gating network is always returned even in the absence of supplied covariates or clusters. The number of parameters to penalise by for MoE_crit is given by length(coef(gating)), and the gating formula used is stored here as an attribute. If there is a noise component (and the option noise.gate=TRUE is invoked), its coefficients are those for the last component. Users are cautioned against making inferences about statistical significance from summaries of the coefficients in the gating network.

expert

An object of class "MoE_expert" (for which dedicated print, summary, and predict methods exist) and "lm" giving the (multivariate) WLS regression coefficients of the expert network. If expert covariates were NOT supplied, this corresponds to a RHS of ~1, otherwise the supplied expert formula. As such, a fitted expert network is always returned even in the absence of supplied covariates. The number of parameters to penalise by for MoE_crit is given by G * length(coef(expert[[1]])), and the expert formula used is stored here is an attribute. Users are cautioned against making inferences about statistical significance from summaries of the coefficients in the expert network.

LOGLIK

A matrix of all maximal log-likelihood values with length{G} rows and length(modelNames) columns. May include missing entries: NA represents models which were not visited, -Inf represents models which were terminated due to error, for which a log-likelihood could not be estimated. Inherits the classes "MoECriterion" and "mclustLoglik", for which dedicated printing and plotting functions exist, respectively.

loglik

The vector of increasing log-likelihood values for every EM/CEM iteration under the optimal model. The last element of this vector is the maximum log-likelihood achieved by the parameters returned at convergence.

linf

An asymptotic estimate of the final converged maximised log-likelihood. Returned when stopping="aitken" and G > 1 (see MoE_control and aitken), otherwise the last element of loglik is returned instead.

df

The number of estimated parameters in the optimal model (i.e. the number of 'used' degrees of freedom). Subtract this number from n to get the degrees of freedom. The number of parameters due to the gating network, expert network, and covariance matrices are also stored here as attributes of df.

iters

The total number of EM/CEM iterations for the optimal model.

hypvol

The hypervolume parameter for the noise component if required, otherwise set to NA (see MoE_control).

parameters

A list with the following named components:

pro

The mixing proportions: either a vector of length G or, if gating covariates were supplied, a matrix with an entry for each observation (rows) and component (columns).

mean

The means of each component. If there is more than one component, this is a matrix whose k-th column is the mean of the k-th component of the mixture model.

For models with expert network covariates, this is given by the posterior mean of the fitted values, otherwise the posterior mean of the response is reported. For models with expert network covariates, the observation-specific component means can be accessed by calling predict on the expert object above.

variance

A list of variance parameters of each component of the model. The components of this list depend on the model type specification. See the help file for mclustVariance for details. Also see expert_covar for an alternative approach to summarising the variance parameters in the presence of expert network covariates.

Vinv

The inverse of the hypervolume parameter for the noise component if required, otherwise set to NULL (see MoE_control).

z

The final responsibility matrix whose [i,k]-th entry is the probability that observation i belonds to the k-th component. If there is a noise component, its values are found in the last column.

classification

The vector of cluster labels for the chosen model corresponding to z, i.e. max.col(z). Observations belonging to the noise component, if any, will belong to component 0.

uncertainty

The uncertainty associated with the classification.

net.covs

A data frame gathering the unique set of covariates used in the gating and expert networks, if any. Will contain zero columns in the absence of gating or expert network covariates. Supplied gating covariates will be excluded if the optimal model has only one component. May have fewer columns than covariates supplied via the network.data argument also, as only the included covariates are gathered here.

resid.data

In the presence of expert network covariates, this is the augmented data actually used in the clustering at convergence, as a list of G matrices of WLS residuals of dimension n * d. Will contain zero columns in the absence of expert network covariates.

DF

A matrix giving the numbers of estimated parameters (i.e. the number of 'used' degrees of freedom) for all visited models, with length{G} rows and length(modelNames) columns. Subtract these numbers from n to get the degrees of freedom. May include missing entries: NA represents models which were not visited, -Inf represents models which were terminated due to error, for which parameters could not be estimated. Inherits the classes "MoECriterion" and "mclustDF", for which dedicated printing and plotting functions exist, respectively.

ITERS

A matrix giving the total number of EM/CEM iterations for all visited models, with length{G} rows and length(modelNames) columns. May include missing entries: NA represents models which were not visited, Inf represents models which were terminated due to singularity/error and thus would never have converged. Inherits the classes "MoECriterion" and "mclustITERS", for which dedicated printing and plotting functions exist, respectively.

Dedicated plot, predict, print, and summary functions exist for objects of class "MoEClust". The results can be coerced to the "Mclust" class to access other functions from the mclust package via as.Mclust.

Details

The function effectively allows 6 different types of Gaussian Mixture of Experts model (as well as the different models in the GPCM/mclust family, for each): i) the standard finite Gaussian mixture with no covariates, ii) fixed covariates only in the gating network, iii) fixed covariates only in the expert network, iv) the full Mixture of Experts model with fixed covariates entering both the mixing proportions and component densities. By constraining the mixing proportions to be equal (see equalPro in MoE_control) two extra special cases are facilitated when gating covariates are excluded.

Note that having the same covariates in both networks is allowed. So too are interactions, transformations, and higher order terms (see formula): the latter must be specified explicitly using the AsIs operator (I). Covariates can be continuous, categorical, logical, or ordinal, but the response must always be continuous.

While model selection in terms of choosing the optimal number of components and the GPCM/mclust model type is performed within MoE_clust, using one of the criterion options within MoE_control, choosing between multiple fits with different combinations of covariates or different initialisation settings can be done by supplying objects of class "MoEClust" to MoE_compare.

References

Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <10.1007/s11634-019-00373-8>.

Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association, 97(458): 611-631.

Examples

Run this code

# NOT RUN {
data(ais)
hema  <- ais[,3:7]
sex   <- ais$sex
BMI   <- ais$BMI

# Fit a standard finite mixture model
m1    <- MoE_clust(hema, G=2:3)

# Allow covariates to enter the mixing proportions
m2    <- MoE_clust(hema, G=2:3, gating= ~ sex + BMI)

# Allow covariates to enter the component densities
m3    <- MoE_clust(hema, G=2:3, expert= ~ sex)

# Allow covariates to enter both the gating & expert network
m4    <- MoE_clust(hema, G=2:3, gating= ~ BMI, expert= ~ sex)

# Fit an equal mixing proportion model with an expert network covariate
m5    <- MoE_clust(hema, G=2:3, expert= ~ sex + BMI, equalPro=TRUE)

# Fit models with gating covariates & an additional noise component
m6    <- MoE_clust(hema, G=2:3, tau0=0.1, gating= ~ BMI, network.data=ais)

# Extract the model with highest BIC
(comp <- MoE_compare(m1, m2, m3, m4, m5, m6, criterion="bic"))
 
# See if a better model can be found using greedy forward stepwise selection
(step <- MoE_stepwise(ais[,3:7], ais))
(comp <- MoE_compare(comp, step, optimal.only=TRUE))
(best <- comp$optimal)
(summ <- summary(best, classification=TRUE, parameters=TRUE, networks=TRUE))

# Examine the expert network in greater detail
# (but refrain from inferring statistical significance!)
summary(best$expert)

# Visualise the results, incl. the gating network and log-likelihood
plot(best, what="gpairs")
plot(best, what="gating") # equal mixing proportions!
plot(best, what="loglik")

# Visualise the results using the 'lattice' library
require("lattice")
z     <- factor(best$classification, labels=paste0("Cluster", seq_len(best$G)))
splom(~ hema | sex, groups=z)
splom(~ hema | z, groups=sex)
# }

Run the code above in your browser using DataLab