MoE_stepwise: Stepwise model/variable selection for MoEClust models

Description

Conducts a greedy forward stepwise search to identify the optimal MoEClust model according to some criterion. Components and/or gating covariates and/or expert covariates are added to new MoE_clust fits at each step, while each step is evaluated for all valid modelNames.

Usage

MoE_stepwise(data,
             network.data = NULL,
             gating = NULL,
             expert = NULL,
             modelNames = NULL,
             fullMoE = FALSE,
             noise = FALSE,
             initialModel = NULL,
             initialG = NULL,
             stepG = TRUE,
             criterion = c("bic", "icl", "aic"),
             equalPro = c("all", "both", "yes", "no"),
             noise.gate = c("all", "both", "yes", "no"),
             verbose = interactive(),
             ...)

Value

An object of class "MoECompare" containing information on all visited models and the optimal model (accessible via x$optimal).

Arguments

data

A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

network.data

An optional matrix or data frame in which to look for the covariates specified in the gating &/or expert networks, if any. Must include column names. Columns in network.data corresponding to columns in data will be automatically removed. While a single covariate can be supplied as a vector (provided the `$' operator or `[]' subset operator are not used), it is safer to supply a named 1-column matrix or data frame in this instance.

gating

A vector giving the names of columns in network.data used to define the scope of the gating network. By default, the initial model will contain no covariates (unless initialModel is supplied with gating covariates), thereafter all variables in gating (save for those in initialModel, if any) will be considered for inclusion where appropriate.

If gating is not supplied (or set to NULL), all variables in network.data will be considered for the gating network. gating can also be supplied as NA, in which case no gating network covariates will ever be considered (save for those in initialModel, if any). Supplying gating and expert can be used to ensure different subsets of covariates enter different parts of the model.

expert

A vector giving the names of columns in network.data used to define the scope of the expert network. By default, the initial model will contain no covariates (unless initialModel is supplied with expert covariates), thereafter all variables in expert (save for those in initialModel, if any) will be considered for inclusion where appropriate.

If expert is not supplied (or set to NULL), all variables in network.data will be considered for the expert network. expert can also be supplied as NA, in which case no expert network covariates will ever be considered (save for those in initialModel, if any). Supplying expert and gating can be used to ensure different subsets of covariates enter different parts of the model.

modelNames

A character string of valid model names, to be used to restrict the size of the search space, if desired. By default, all valid model types are explored. Rather than considering the changing of the model type as an additional step, every step is evaluated over all entries in modelNames. See MoE_clust for more details.

Note that if initialModel is supplied (see below), modelNames will be augmented with initialModel$modelName if needs be.

fullMoE

A logical which, when TRUE, ensures that only models where the same covariates enter both parts of the model (the gating and expert networks) are considered. This restricts the search space to exclude models where covariates differ across networks. Thus, the search is likely to be faster, at the expense of potentially missing out on optimal models. Defaults to FALSE.

Furthermore, when TRUE, the set of candidate covariates is automatically taken to be the union of the named covariates in gating and expert, for convenience. In other words, gating=NA will only work if expert=NA also, and both should be set to NULL in order to consider all potential covariates.

In addition, caution is advised using this argument in conjunction with initialModel, which must satisfy the constraint that the same set of covariates be used in both parts of the model, for initial models where gating covariates are allowable. Finally, note that this argument does not preclude a model with only expert covariates included if the number of components is such that the inclusion of gating covariates is infeasible.

noise

A logical indicating whether to assume all models contain an additional noise component (TRUE) or not (FALSE, the default). If initialModel or initialG is not specified, the search starts from a G=0 noise-only model when noise is TRUE, otherwise the search starts from a G=1 model with no covariates when noise is FALSE. See MoE_control for more details. Note, however, that if the model specified in initialModel contains a noise component, the value of the noise argument will be overridden to TRUE; similarly, if the initialModel model does not contain a noise component, noise will be overridden to FALSE.

initialModel

An object of class "MoEClust" generated by MoE_clust or an object of class "MoECompare" generated by MoE_compare. This gives the initial model to use at the first step of the selection algorithm, to which components and/or covariates etc. can be added. Especially useful if the model is expected to have more than one component a priori (see initialG below as an alternative). The initialModel model must have been fitted to the same data in data.

If initialModel is not specified, the search starts from a G=0 noise-only model when noise is TRUE, otherwise the search starts from a G=1 model with no covariates when noise is FALSE. If initialModel is supplied and it contains a noise component, only models with a noise component will be considered thereafter (i.e. the noise argument can be overridden by the initialModel argument). If initialModel contains gating &/or expert covariates, these covariates will be included in all subsequent searches, with covariates in expert and gating still considered as candidates for additional inclusion, as normal.

However, while initialModel can include covariates not specified in gating &/or expert, the initialModel$modelName should be included in the specified modelNames; if it is not, modelNames will be forcibly augmented with initialModel$modelName (as stated above). Furthermore, it is assumed that initialModel is already optimal with respect to the model type. If it is not, the algorithm may be liable to converge to a sub-optimal model, and so a warning will be printed if the function suspects that this might be the case.

initialG

A single (positive) integer giving the number of mixture components (clusters) to initialise the stepwise search algorithm with. This is a simpler alternative to the initialModel argument, to be used when the only prior knowledge relates to the number of components, and not other features of the model (e.g. the covariates which should be included). Consequently, initialG is only relevant when initialModel is not supplied. When neither initialG nor initialModel is specified, the search starts from a G=0 noise-only model when noise is TRUE, otherwise the search starts from a G=1 model with no covariates when noise is FALSE. See stepG below for fixing the number of components at this initialG value.

stepG

A logical indicating whether the algorithm should consider incrementing the number of components at each step. Defaults to TRUE; use FALSE when searching only over configurations with the same number of components is of interest. Setting stepG to FALSE is possible with or without specifying initialModel or initialG, but is primarily intended for use when one of these arguments is supplied, otherwise the algorithm will be stuck forever with only one component.

criterion

The model selection criterion used to determine the optimal action at each step. Defaults to "bic".

equalPro

A character string indicating whether models with equal mixing proportions should be considered. "both" means models with both equal and unequal mixing proportions will be considered, "yes" means only models with equal mixing proportions will be considered, and "no" means only models with unequal mixing proportions will be considered. Notably, no setting for equalPro is enough to rule out models with gating covariates from consideration.

The default ("all") is equivalent to "both" with the addition that all possible mixing proportion constraints will be tried for the initialModel (if any, provided it doesn't contain gating covariate(s)) or initialG before adding a component or additional covariates; otherwise, this equalPro argument only governs whether mixing proportion constraints are considered as components are added.

Considering "all" (or "both") equal and unequal mixing proportion models increases the search space and the computational burden, but this argument becomes irrelevant after a model, if any, with gating network covariate(s) is considered optimal for a given step. The "all" default is strongly recommended so that viable candidate models are not missed out on, particularly when initialModel or initialG are given. However, this does not guarantee that an optimal model will not be skipped; if equalPro is restricted via "yes" or "no", a suboptimal model at one step may ultimately lead to a better final model, in some edge cases. See MoE_control for more details.

noise.gate

A character string indicating whether models where the gating network for the noise component depends on covariates are considered. "yes" means only models where this is the case will be considered, "no" means only models for which the noise component's mixing proportion is constant will be considered and "both" means both of these scenarios will be considered.

The default ("all") is equivalent to "both" with the addition that all possible gating network noise settings will be tried for the initialModel (if any, provided it contains gating covariates and a noise component) before adding a component or additional covariates; otherwise, this noise.gate argument only governs the inclusion/exclusion of this constraint as components or covariates are added.

Considering "all" (or "both") settings increases the search space and the computational burden, but this argument is only relevant when noise=TRUE and gating covariates are being considered. The "all" default is strongly recommended so that viable candidate models are not missed out on, particularly when initialModel or initialG are given. However, this does not guarantee that an optimal model will not be skipped; if noise.gate is restricted via "yes" or "no", a suboptimal model at one step may ultimately lead to a better final model, in some edge cases. See MoE_control for more details.

verbose

Logical indicating whether to print messages pertaining to progress to the screen during fitting. By default is TRUE if the session is interactive, and FALSE otherwise. If FALSE, warnings and error messages will still be printed to the screen, but everything else will be suppressed.

...

Additional arguments to MoE_control, except for those arguments of the same name which are already listed here, e.g. equalPro and noise.gate. Note that these arguments will be supplied to all candidate models for every step. For arguments specific to MoE_control (e.g. stopping, algo, etc.), it is recommended to run MoE_stepwise multiple times while toggling these arguments, if desired.

Author

Keefe Murphy - <keefe.murphy@mu.ie>

Details

The arguments modelNames, equalPro, and noise.gate are provided for computational convenience. They can be used to reduce the number of models under consideration at each stage.

The same is true of the arguments gating and expert, which can each separately (or jointly, if fullMoE is TRUE) be made to consider all variables in network.data, or a subset, or none at all.

Finally, initialModel or initialG can be used to kick-start the search algorithm by incorporating prior information in a more direct way; in the latter case, only in the form of the number of components; in the former case, a full model with a given number of components, certain included gating and expert network covariates, and a certain model type can give the model an even more informed head start. In either case, the stepG argument can be used to fix the number of components and only search over different configurations of covariates.

Without any prior information, it is best to accept the defaults at the expense of a longer run-time.

References

Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <tools:::Rd_expr_doi("10.1007/s11634-019-00373-8")>.

Examples

Run this code

# data(CO2data)
# Search over all models where the single covariate can enter either network
# (mod1  <- MoE_stepwise(CO2data$CO2, CO2data[,"GNP", drop=FALSE]))
#
# data(ais)
# Only look for EVE & EEE models with at most one expert network covariate
# Do not consider any gating covariates and only consider models with equal mixing proportions
# (mod2  <- MoE_stepwise(ais[,3:7], ais, gating=NA, expert="sex",
#                        equalPro="yes", modelNames=c("EVE", "EEE")))
#
# Look for models with noise & only those where the noise component's mixing proportion is constant
# Speed up the search with an initialModel, fix G, and restrict the covariates & model type
# init   <- MoE_clust(ais[,3:7], G=2, modelNames="EEE", 
#                     expert= ~ sex, network.data=ais, tau0=0.1)
# (mod3  <- MoE_stepwise(ais[,3:7], ais, noise=TRUE, expert="sex",
#                        gating=c("SSF", "Ht"), noise.gate="no", 
#                        initialModel=init, stepG=FALSE, modelNames="EEE"))
#
# Compare both sets of results (with & without a noise component) for the ais data
# (comp1 <- MoE_compare(mod2, mod3, optimal.only=TRUE))
# comp1$optimal
#
# Target a model for the AIS data which is optimal in terms of ICL, without any restrictions
# mod4   <- MoE_stepwise(ais[,3:7], ais, criterion="icl")
# 
# This gets stuck at a G=1 model, so specify an initial G value as a head start
# mod5   <- MoE_stepwise(ais[,3:7], ais, criterion="icl", initialG=2)
#
# Check that specifying an initial G value enables a better model to be found
# (comp2 <- MoE_compare(mod4, mod5, optimal.only=TRUE, criterion="icl"))

# Finally, restrict the search to full MoE models only
# Notice that the candidate covariates are the union of gating and expert
# Notice also that the algorithm initially traverses models with only
#   expert covariates when the inclusion of gating covariates is infeasible
# mod6   <- MoE_stepwise(ais[,3:7], ais, fullMoE=TRUE, gating="BMI", expert="Bfat")

Run the code above in your browser using DataLab