plot.MEDseq: Plot MEDseq results

Description

Produces a range of plots of the results of fitted MEDseq models.

Usage

# S3 method for MEDseq
plot(x,
     type = c("clusters", "central", "precision", "gating", 
              "bic", "icl", "aic", "dbs", "asw", "cv", 
              "nec", "LOGLIK", "dbsvals", "aswvals", "similarity",
              "uncert.bar", "uncert.profile", "loglik", 
              "d", "dH", "f", "Ht", "i", "I", "ms", "mt"), 
     seriated = c("observations", "both", "clusters", "none"), 
     soft = NULL,
     weighted = TRUE,
     SPS = NULL,
     smeth = "TSP",
     sortv = NULL,
     subset = NULL,
     quant.scale = FALSE, 
     ...)

Value

A visualisation, determined by type, of the results of a fitted MEDseq model.

Arguments

x

An object of class "MEDseq" generated by MEDseq_fit or an object of class "MEDseqCompare" generated by MEDseq_compare.

type

A character string giving the type of plot requested:

"clusters"

Visualise the data set with sequences grouped into their respective clusters. See seriated. Similar to the type="I" plot (see below). However, type="clusters" always plots the hard MAP partition and is unaffected by the soft argument below.

"central"

Visualise the central sequences (typically modal sequences, but this depends on the opti argument to MEDseq_control used during model-fitting). See seriated. The central sequence for the noise component, if any, is not shown as it doesn't contribute in any way to the likelihood. See the type="ms" option below for an alternative means of displaying the central sequences.

"precision"

Visualise the precision parameters in the form of a heatmap. Values of 0 and Inf are shown in "white" and "black" respectively (see quant.scale and seriated).

"gating"

Visualise the gating network, i.e. the mixing proportions for each observation plotted against the observation index (by default), coloured by cluster. Such plots can be produced whether or not covariates were included in the gating network during model-fitting. See seriated, but note that this argument is only relevant for models with gating network covariates, and only when x.axis is not supplied. The optional argument x.axis can be passed via the ... construct to change the x-axis against which the mixing proportions are plotted (only advisable for models with a single gating network covariate, when x.axis is a quantity related to the gating network of the fitted model).

"bic"

Plots all BIC values in a fitted MEDseq object.

"icl"

Plots all ICL values in a fitted MEDseq object.

"aic"

Plots all AIC values in a fitted MEDseq object.

"dbs"

Plots all (weighted) mean/median DBS criterion values in a fitted MEDseq object.

"asw"

Plots all (weighted) mean/median ASW criterion values in a fitted MEDseq object.

"cv"

Plots all cross-validated log-likelihood values in a fitted MEDseq object.

"nec"

Plots all NEC values in a fitted MEDseq object.

"LOGLIK"

Plots all maximal log-likelihood values in a fitted MEDseq object.

"dbsvals"

Silhouette plot using observation-specific DBS values for the optimal model (coloured by cluster). See seriated.

"aswvals"

Silhouette plot using observation-specific ASW values for the optimal model (coloured by cluster). See seriated.

"similarity"

Produces a heatmap of the similarity matrix constructed from the x$z matrix at convergence, with observations reordered via seriated for visual clarity. The (potentially seriated) similarity matrix can also be invisibly returned.

"uncert.bar"

Plot the observation-specific clustering uncertainties, if any, in the form of a bar plot.

"uncert.profile"

Plot the observation-specific clustering uncertainties, if any, in the form of a profile plot.

"loglik"

Plot the log-likelihood at every iteration of the EM/CEM algorithm used to fit the model.

Also available are the following options which act as wrappers to types of plots produced by the seqplot function in the TraMineR package. All are affected by the value of seriated and all account for the sampling weights (if any) by default (see the weighted argument and the related Note below).

Note also that all of the plot types below can be made to either work with the hard MAP partition (as per seqplot), or to use the soft cluster membership probabilities, via the soft argument below. The soft information is used by default for all but the "i" and "I" plot types, which (by default) discard this information to instead use the MAP partition: see the soft argument below for modifying this default behaviour for all of the following plot types.

"d": State distribution plots (chronograms, by cluster).
"dH": State distribution plots (chronograms, by cluster) with overlaid entropy line as per type="Ht". Note that this option is only available if version 2.2-4 or later of TraMineR is installed.
"f": Sequence frequency plots (by cluster).
"Ht": Transversal entropy plots (by cluster).
"i": Selected sequence index plots (by cluster). By default, bar widths for each observation will be proportional to their weight (if any). However, this can be overruled by specifying weighted=FALSE.
"I": Whole set index plots (by cluster). This plot contains almost exactly the same information as type="clusters" plots, and is similarly affected by the seriated argument, albeit shown on a by-cluster basis rather than stacked in one plot. Unlike type="clusters" plots, bar widths for each observation will (by default) be proportional to their weight (if any), though this can be overruled by specifying weighted=FALSE.
"ms": Modal state sequence plots (by cluster). This is an alternative way of displaying the central sequences beyond the type="central" option above. Notably, this option respects arguments passed to get_MEDseq_results via the ... construct (see below), while type="central" does not, although still nothing is shown for the noise component. Note: unlike type="central", this option always plots modal sequences, even if another opti setting was invoked during model-fitting via MEDseq_control, in which case there will be a mismatch between the visualisation and x$params$theta. Similarly, there may be a mismatch if soft and/or weighted are modified from their default values of TRUE.
"mt": Mean times plots (by cluster). This is equivalent to plotting the results of MEDseq_meantime(x, MAP=!soft, weighted=weighted, norm=TRUE, prop=FALSE, map.size=FALSE, wt.size=TRUE). Other options for norm=FALSE, prop=TRUE, map.size=TRUE, and wt.size=FALSE may be added in future versions of this package.
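As a rough illustration of the quantities behind the "uncert.bar" and "uncert.profile" types above, the following base-R sketch derives a hard MAP partition and observation-specific uncertainties from a matrix of soft cluster membership probabilities. The matrix z here is made up for illustration, and the uncertainty definition (one minus the largest membership probability) is the convention common in model-based clustering; that MEDseq computes it in exactly this way is an assumption.

```r
# Hypothetical soft membership matrix: 4 observations (rows), 3 clusters (columns).
# In a fitted model, the analogous quantity would be the x$z matrix.
z <- matrix(c(0.90, 0.05, 0.05,
              0.40, 0.35, 0.25,
              0.10, 0.80, 0.10,
              0.00, 0.00, 1.00),
            nrow = 4, byrow = TRUE)

# Hard MAP partition: the cluster with the largest membership probability
MAP    <- max.col(z, ties.method = "first")

# Assumed uncertainty definition: one minus the largest membership probability
uncert <- 1 - apply(z, 1L, max)

# Observations with larger values are less clearly assigned to their MAP cluster
data.frame(MAP = MAP, uncertainty = uncert)
```

An observation assigned with probability 1 has no uncertainty at all, which is why these plot types are described as showing the uncertainties "if any".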

seriated

Switch indicating whether seriation should be used to improve the visualisation by re-ordering the "observations" within clusters (the default), the "clusters", "both", or "none". See seriate and the smeth and sortv arguments below.

The "clusters" option (and the cluster-related part of "both") is only invoked when type is one of "clusters", "central", "precision", "gating", "dbsvals", "aswvals", "similarity", "d", "dH", "f", "Ht", "i", "I", "ms", or "mt" and the model has more than one component.

Additionally, the "observations" option (and the observation-related part of "both") is only invoked when type is one of "clusters", "gating", "similarity", "i" or "I", which are also the only options for which "both" is relevant.

Though all seriated options can be specified when type is "gating", they are only invoked and relevant when the model actually contains gating network covariates and x.axis is not supplied via the ... construct.
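The defaults for type and seriated are given in the Usage section as vectors of choices. In R, such arguments are conventionally resolved with match.arg, under which the first choice acts as the default and unambiguous abbreviations are accepted. The sketch below shows that convention in isolation; resolve_seriated is a hypothetical helper, not part of MEDseq, and whether plot.MEDseq resolves its arguments in exactly this way is an assumption.

```r
# Hypothetical helper, not part of MEDseq: resolves a choice in the usual R way
resolve_seriated <- function(seriated = c("observations", "both", "clusters", "none")) {
  match.arg(seriated)
}

resolve_seriated()        # no value supplied: the first choice is used
resolve_seriated("both")  # a full name is returned unchanged
resolve_seriated("obs")   # an unambiguous abbreviation is expanded
```

Supplying a value that matches none of the choices raises an error under this convention.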

soft

This argument is a single logical indicator which is only relevant for the "d", "dH", "f", "Ht", "i", "I", "ms", and "mt" plot types borrowed from TraMineR. When soft=TRUE (the default for all but the "i" and "I" type plots) the soft cluster membership probabilities are used in a manner akin to fuzzyseqplot. Otherwise, when FALSE (the default for "i" and "I" type plots), the soft information is discarded and the hard MAP partition is used instead.

Note that soft cluster membership probabilities will not be available if x$G=1 or the model was fitted using the algo="CEM" option to MEDseq_control. Plots may still be weighted when soft is FALSE, according to the observation-specific sampling weights, when weighted=TRUE. Note also that type="Ht" can be used in conjunction with soft=TRUE, unlike fuzzyseqplot for which type="Ht" is not permissible. Finally, be advised that plotting may be time-consuming when soft=TRUE for "i" and "I" type plots.

weighted

This argument is a single logical indicator which is only relevant for the "clusters", "central", and "precision" plot types, as well as the "d", "dH", "f", "Ht", "i", "I", "ms", and "mt" plot types borrowed from TraMineR. For plots borrowed from TraMineR, when TRUE (the default), the weights (if any) are accounted for in such plots. Note that when soft is TRUE, plots will still be weighted according to the soft cluster membership probabilities; thus weighted=TRUE and soft=TRUE allows both these and the observation-specific weights to be used simultaneously (the default behaviour for both arguments).

Additionally, for these plots and the "clusters", "central", and "precision" types, weighted is passed through to MEDseq_clustnames in the rare case where SPS=TRUE (see below) and the optional MEDseq_clustnames argument size=TRUE is invoked (again, see below).
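To make the interplay between soft and weighted more concrete, the following base-R sketch computes per-cluster state distributions at a single time point, once using both the soft memberships and the sampling weights, and once using only the hard MAP partition. All data and the helper state_dist are hypothetical; this is an illustration of the idea under stated assumptions, not the code used by MEDseq or TraMineR.

```r
# Hypothetical data: states of 5 observations at a single time point
states_t <- factor(c("EM", "EM", "FE", "SC", "FE"), levels = c("EM", "FE", "SC"))
# Hypothetical sampling weights and soft memberships for G = 2 clusters
w <- c(1, 2, 1, 0.5, 1.5)
z <- cbind(c(0.9, 0.8, 0.2, 0.6, 0.1),
           c(0.1, 0.2, 0.8, 0.4, 0.9))

# Hypothetical helper: observation i contributes w[i] * z[i, g]
# to the tally of its state in cluster g
state_dist <- function(z, w, states) {
  wz  <- z * w                          # scales row i of z by w[i]
  tab <- rowsum(wz, group = states)     # states x clusters totals
  sweep(tab, 2L, colSums(tab), "/")     # normalise each cluster's column to sum to 1
}

# Analogue of soft=TRUE, weighted=TRUE: both sources of weighting are used
soft_weighted  <- state_dist(z, w, states_t)

# Analogue of soft=FALSE, weighted=FALSE: MAP indicator matrix, unit weights
hard           <- 1 * (z == apply(z, 1L, max))
map_unweighted <- state_dist(hard, rep(1, length(w)), states_t)
```

Setting only one of the two arguments to FALSE corresponds to replacing only one of z or w in the call above.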

SPS

A logical indicating whether clusters should be labelled according to the state-permanence-sequence representation of their central sequence. See MEDseq_clustnames and seqformat. Defaults to TRUE for the plot types adapted from TraMineR, i.e. the "d", "dH", "f", "Ht", "i", "I", "ms", and "mt" type plots. The SPS argument is also relevant for the following type plots: "clusters", "central", and "precision", though SPS defaults to FALSE in those instances. Note that if SPS=TRUE for any relevant plot type, the weighted argument above is relevant if the optional MEDseq_clustnames argument size=TRUE is invoked (see below).

smeth

A character string with the name of the seriation method to be used. Defaults to "TSP". See seriate and seriation::list_seriation_methods("dist") for further details and the available methods. Only relevant when seriated != "none". When seriated == "observations" or seriated == "both", the ordering of observations can be governed by smeth or instead governed by the sortv argument below, but the ordering of clusters (when seriated="clusters" or seriated="both") is always governed by smeth.

sortv

A sorting method governing the ordering of observations for "clusters", "gating", "similarity", "i", or "I" type plots. Potential options include "dbs" and "asw", for sorting observations by their DBS or ASW values (if available), as well as "from.start" and "from.end" (only when type is "clusters", "i", or "I"), under which sequences are sorted by the elements of the alphabet at the successive positions starting from the start/end of the sequences (as per TraMineR). Only relevant if seriated is one of "observations" or "both". Note that the sortv argument overrides the setting in smeth as it pertains to the ordering of observations if sortv is supplied; otherwise sortv is NULL and smeth is invoked. Note that smeth always dictates the ordering of clusters (i.e. when seriated="clusters" or seriated="both").

Additionally, when (and only when) soft=TRUE and type="I", the additional option sortv="membership" is provided in accordance with fuzzyseqplot, on which such plots are based.
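The "from.start" and "from.end" orderings described above can be sketched in base R as a lexicographic sort over successive positions. The sequences and helper functions below are hypothetical, and states are compared here as plain character strings, whereas TraMineR works with the elements of the alphabet of a state sequence object, so this is only an approximation of the idea.

```r
# Hypothetical sequences: 4 observations (rows) over 3 time points (columns)
seqs <- matrix(c("SC", "SC", "EM",
                 "FE", "EM", "EM",
                 "SC", "FE", "EM",
                 "FE", "EM", "HE"),
               nrow = 4, byrow = TRUE)

# Order rows by the state at position 1, breaking ties with position 2, and so on
sort_from_start <- function(s) {
  do.call(order, unname(as.data.frame(s, stringsAsFactors = FALSE)))
}

# Same idea, but starting from the last position and working backwards
sort_from_end <- function(s) {
  sort_from_start(s[, ncol(s):1, drop = FALSE])
}

ord_start <- sort_from_start(seqs)
ord_end   <- sort_from_end(seqs)

seqs[ord_start, ]  # sequences sharing early states are now adjacent
seqs[ord_end, ]    # sequences sharing late states are now adjacent
```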

subset

An optional numeric vector giving the indices of the clusters to be plotted. For models with a noise component, values in 0:x$G are admissible, where 0 denotes the noise component, otherwise only values in 1:x$G. Only relevant for the TraMineR-type plots, i.e. "d", "dH", "f", "Ht", "i", "I", "ms", and "mt" type plots. Note however, that noise components are never plotted for type="ms" plots, so subset values of 0 will be ignored in this instance.

quant.scale

Logical indicating whether precision parameter heatmaps should use quantiles to determine non-linear colour break-points when type="precision". This ensures each colour represents an equal proportion of the data. The behaviour of 0 or Inf values remains unchanged; only strictly-positive finite entries are affected. Heavily imbalanced values are more likely for the "UU" and "UUN" model types, thus quant.scale defaults to TRUE in those instances and FALSE otherwise. Note that quant.scale is always FALSE for the "CC" and "CCN" model types.
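The effect of quantile-based break-points can be illustrated with a small base-R sketch. The precision values and the number of colours below are made up, and the exact way in which MEDseq constructs its breaks is not shown here; the sketch only contrasts equally spaced breaks with quantile breaks on the strictly-positive finite entries.

```r
# Hypothetical precision parameters: heavily imbalanced, with 0 and Inf entries
lambda <- c(0, 0.01, 0.02, 0.05, 0.1, 0.5, 2, 40, Inf)

# Only strictly-positive finite entries drive the colour scale
finite_pos <- lambda[is.finite(lambda) & lambda > 0]
n_cols     <- 4L

# Equally spaced break-points over the range of the values
lin_breaks <- seq(min(finite_pos), max(finite_pos), length.out = n_cols + 1L)
# Quantile-based break-points: each bin holds a roughly equal share of the values
q_breaks   <- quantile(finite_pos, probs = seq(0, 1, length.out = n_cols + 1L),
                       names = FALSE)

# Number of values falling into each colour bin under the two schemes
lin_counts <- table(cut(finite_pos, lin_breaks, include.lowest = TRUE))
q_counts   <- table(cut(finite_pos, q_breaks,   include.lowest = TRUE))
lin_counts
q_counts
```

With equally spaced breaks, a single large value stretches the scale so that nearly all entries share one colour; the quantile breaks spread the entries across the colours instead.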

...

Catches unused arguments, and allows arguments to get_MEDseq_results to be passed when type is one of "clusters", "dbsvals", "aswvals", "similarity", "uncert.bar", "uncert.profile", "d", "dH", "f", "Ht", "i", "I", "ms", or "mt", as well as the x.axis argument when type="gating". Also allows select additional arguments to the TraMineR function seqplot to be used for the relevant plot types (e.g. border, yaxis and/or ylab, serr where type="mt", and info where type="ms") and the size argument to MEDseq_clustnames, where relevant.

Author

Keefe Murphy - <keefe.murphy@mu.ie>

Details

The type options related to model selection criteria plot values for all fitted models in the "MEDseq" object x. The remaining type options plot results for the optimal model, by default. However, arguments to get_MEDseq_results can be passed via the ... construct to plot corresponding results for suboptimal models in x when type is one of "clusters", "dbsvals", "aswvals", "similarity", "uncert.bar", "uncert.profile", "d", "dH", "f", "Ht", "i", "I", "ms", or "mt". See the examples below.

References

Murphy, K., Murphy, T. B., Piccarreta, R., and Gormley, I. C. (2021). Clustering longitudinal life-course sequences using mixtures of exponential-distance models. Journal of the Royal Statistical Society: Series A (Statistics in Society), 184(4): 1414-1451. <doi:10.1111/rssa.12712>.

Studer, M. (2018). Divisive property-based and fuzzy clustering for sequence analysis. In G. Ritschard and M. Studer (Eds.), Sequence Analysis and Related Approaches: Innovative Methods and Applications, Volume 10 of Life Course Research and Social Policies, pp. 223-239. Cham, Switzerland: Springer.

Gabadinho, A., Ritschard, G., Mueller, N. S., and Studer, M. (2011). Analyzing and visualizing state sequences in R with TraMineR. Journal of Statistical Software, 40(4): 1-37.

Examples

if (FALSE) { # interactive()
# Load the MEDseq package and the MVAD data
library(MEDseq)
data(mvad)
mvad$Location <- factor(apply(mvad[,5:9], 1L, function(x) 
                 which(x == "yes")), labels = colnames(mvad[,5:9]))
mvad          <- list(covariates = mvad[c(3:4,10:14,87)],
                      sequences = mvad[,15:86], 
                      weights = mvad[,2])
mvad.cov      <- mvad$covariates

# Create a state sequence object with the first two (summer) time points removed
states        <- c("EM", "FE", "HE", "JL", "SC", "TR")
labels        <- c("Employment", "Further Education", "Higher Education", 
                   "Joblessness", "School", "Training")
mvad.seq      <- seqdef(mvad$sequences[-c(1,2)], states=states, labels=labels)

# Fit a range of exponential-distance models without clustering
mod0          <- MEDseq_fit(mvad.seq, G=1)

# Show the central sequence and precision parameters of the optimal model
plot(mod0, type="central")
plot(mod0, type="ms")
plot(mod0, type="precision")
# Fit a range of unweighted mixture models without covariates
# Only consider models with a noise component
# mod1        <- MEDseq_fit(mvad.seq, G=9:11, modtype=c("CCN", "CUN", "UCN", "UUN"))

# Plot the DBS values for all fitted models
# plot(mod1, "dbs")

# Plot the clusters of the optimal model (according to the dbs criterion)
# plot(mod1, "clusters", criterion="dbs")

# Use seriation to order the observations and the clusters
# plot(mod1, "clusters", criterion="dbs", seriated="both")

# Use a different seriation method
# seriation::list_seriation_methods("dist")
# plot(mod1, "clusters", criterion="dbs", seriated="both", smeth="Spectral")

# Use the DBS values instead to sort the observations, and label the clusters
# plot(mod1, "clusters", criterion="dbs", seriated="both", sortv="dbs", SPS=TRUE, size=TRUE)

# Plot the observation-specific ASW values of the best CCN model (according to the asw criterion)
# plot(mod1, "aswvals", modtype="CCN", criterion="asw")

# Plot the similarity matrix (as a heatmap) of the best G=9 model (according to the icl criterion)
# plot(mod1, "similarity", G=9, criterion="icl")

# Fit a model with weights and gating covariates
# mod2        <- MEDseq_fit(mvad.seq, G=10, modtype="UCN", weights=mvad$weights, 
#                           gating=~ fmpr + gcse5eq + livboth, covars=mvad.cov)

# Plot the central sequences & precision parameters of this model
# plot(mod2, "central")
# plot(mod2, "precision")

# Plot the clustering uncertainties in the form of a barplot
# plot(mod2, "uncert.bar")

# Plot the observation-specific DBS values
# plot(mod2, "dbsvals")

# Plot the transversal entropies by cluster & then the state-distributions by cluster
# Note that these plots may not display properly in the preview panel
# plot(mod2, "Ht", ylab=NA)              # suppress the y-axis labels
# plot(mod2, "d", border=TRUE)           # add borders
# plot(mod2, "dH", ylab=NA, border=TRUE) # both simultaneously (needs TraMineR >=2.2-4)

# The plots above use the soft cluster membership probabilities
# Discard this information and reproduce the per-cluster state-distributions plot
# plot(mod2, "d", soft=FALSE)

# The plots above use the observation-specific sampling weights
# Discard this information and plot the mean times per state per cluster
# plot(mod2, "mt", weighted=FALSE)

# Use type="I" and subset=0 to examine the noise component
# plot(mod2, "I", subset=0, border=TRUE, weighted=FALSE, seriated="none")
}