plot.MEDseq: Plot MEDseq results

Description

Produces a range of plots of the results of fitted MEDseq models.

Usage

# S3 method for MEDseq
plot(x,
     type = c("clusters", "mean", "precision", "gating", 
              "bic", "icl", "aic", "dbs", "asw", "cv", 
              "nec", "LOGLIK", "dbsvals", "aswvals", 
              "uncert.bar", "uncert.profile", "loglik", 
              "d", "f", "Ht", "i", "I"), 
     seriated = c("observations", "both", "clusters", "none"), 
     smeth = "TSP",
     quant.scale = FALSE, 
     ...)

Arguments

x

An object of class "MEDseq" generated by MEDseq_fit or an object of class "MEDseqCompare" generated by MEDseq_compare.

type

A character string giving the type of plot requested:

"clusters": Visualise the data set with sequences grouped into their respective clusters. See seriated. Similar to the type="I" plot (see below).
"mean": Visualise the central sequences. See seriated. The central sequence for the noise component, if any, is not shown as it does not contribute in any way to the likelihood.
"precision": Visualise the precision parameters in the form of a heatmap. Values of 0 and Inf are shown in "white" and "black" respectively (see quant.scale).
"gating": Visualise the gating network, i.e. the observation index (by default) against the mixing proportions for that observation, coloured by cluster. See seriated. The optional argument x.axis can be passed via the ... construct to change the x-axis against which mixing proportions are plotted (only advisable for models with a single gating network covariate, when x.axis is a quantity related to the gating network of the fitted model).
"bic": Plots all BIC values in a fitted MEDseq object.
"icl": Plots all ICL values in a fitted MEDseq object.
"aic": Plots all AIC values in a fitted MEDseq object.
"dbs": Plots all (weighted) mean/median DBS values in a fitted MEDseq object.
"asw": Plots all (weighted) mean/median ASW values in a fitted MEDseq object.
"cv": Plots all cross-validated log-likelihood values in a fitted MEDseq object.
"nec": Plots all NEC values in a fitted MEDseq object.
"LOGLIK": Plots all maximal log-likelihood values in a fitted MEDseq object.
"dbsvals": Silhouette plot using observation-specific DBS values for the optimal model (coloured by cluster).
"aswvals": Silhouette plot using observation-specific ASW values for the optimal model (coloured by cluster).
"uncert.bar": Plot the observation-specific clustering uncertainties in the form of a bar plot.
"uncert.profile": Plot the observation-specific clustering uncertainties in the form of a profile plot.
"loglik": Plot the log-likelihood at every iteration of the EM/CEM algorithm used to fit the model.

Also available are the following options which act as wrappers to types of plots produced by the seqplot function in the TraMineR package. All are affected by the value of seriated.

"d": State distribution plots (by cluster).
"f": Sequence frequency plots (by cluster).
"Ht": Transversal entropy plots (by cluster).
"i": Selected sequence index plots (by cluster).
"I": Whole set index plots (by cluster). This plot effectively contains the same information as type="clusters", and is similarly affected by the seriated argument, albeit shown on a by-cluster basis rather than stacked in one plot.

seriated

Switch indicating whether seriation should be used to improve the visualisation by re-ordering the "observations" within clusters (the default), the "clusters", "both", or "none". See seriate and the smeth argument below. The "clusters" option (and the cluster-related part of "both") is only invoked when type is one of "clusters", "mean", "precision", "gating", "dbsvals", "aswvals", "d", "f", "Ht", "i", or "I". Additionally, the "observations" option (and the observation-related part of "both") is only invoked when type is one of "clusters", "gating", or "I", which are also the only options for which "both" is relevant.

smeth

A character string with the name of the seriation method to be used. Defaults to "TSP". See seriate and seriation::list_seriation_methods("dist") for further details. Only relevant when seriated != "none".
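
As a rough sketch of how the seriated and smeth arguments fit together, the snippet below lists the seriation methods registered for dissimilarity input and then shows, in a comment, how one of them might be passed to the plot method. The object name mod is hypothetical and stands for any fitted "MEDseq" object; whether every listed method suits a given data set is not guaranteed here.

```r
# List the seriation methods that accept a dissimilarity ("dist") input;
# these names are the candidate values for the smeth argument
library(seriation)
dist.methods <- list_seriation_methods("dist")
print(dist.methods)

# Hypothetical usage, assuming `mod` is a fitted "MEDseq" object
# (not created here): re-order both observations and clusters using "TSP"
# plot(mod, type="clusters", seriated="both", smeth="TSP")
```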

quant.scale

Logical indicating whether precision parameter heatmaps should use quantiles to determine non-linear colour break-points when type="precision". This ensures each colour represents an equal proportion of the data. The behaviour of 0 or Inf values remains unchanged; only strictly-positive finite entries are affected. Heavily imbalanced values are more likely for the "UU" and "UUN" model types, thus quant.scale defaults to TRUE in those instances and FALSE otherwise. Note that quant.scale is always FALSE for the "CC" and "CCN" model types.
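
To illustrate the idea behind quant.scale, the following base R sketch contrasts equally spaced break-points with quantile-based ones for some simulated, heavily skewed positive values. The simulated data and all variable names are assumptions made for illustration only; this is not the package's internal code.

```r
set.seed(1)
vals  <- rexp(200)   # hypothetical skewed, strictly positive "precision" values
n.col <- 5           # number of colours in the (hypothetical) heatmap palette
probs <- seq(0, 1, length.out = n.col + 1)

# Equally spaced (linear) break-points over the range of the values
lin.breaks <- seq(min(vals), max(vals), length.out = n.col + 1)

# Quantile-based (non-linear) break-points
qnt.breaks <- quantile(vals, probs = probs)

# Count how many values fall under each colour with either scheme
lin.counts <- table(cut(vals, breaks = lin.breaks, include.lowest = TRUE))
qnt.counts <- table(cut(vals, breaks = qnt.breaks, include.lowest = TRUE))
print(lin.counts)
print(qnt.counts)
```

With quantile-based breaks each colour covers the same number of values, whereas the equally spaced breaks place most of the skewed values in the lowest bins.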

...

Catches unused arguments, and allows arguments to get_MEDseq_results to be passed when type is one of "clusters", "dbsvals", "aswvals", "uncert.bar", "uncert.profile", "d", "f", "Ht", "i", or "I", as well as the x.axis argument when type="gating". Also allows additional arguments to the TraMineR function seqplot to be used.

Value

The visualisation according to type of the results of a fitted MEDseq model.

Details

The type options related to model selection criteria plot values for all fitted models in the "MEDseq" object x. The remaining type options plot results for the optimal model, by default. However, arguments to get_MEDseq_results can be passed via the ... construct to plot corresponding results for suboptimal models in x when type is one of "clusters", "d", "f", "Ht", "i", or "I".

References

Murphy, K., Murphy, T. B., Piccarreta, R., and Gormley, I. C. (2019). Clustering longitudinal life-course sequences using mixtures of exponential-distance models. To appear. <arXiv:1908.07963>.

Gabadinho, A., Ritschard, G., Mueller, N. S., and Studer, M. (2011). Analyzing and visualizing state sequences in R with TraMineR. Journal of Statistical Software, 40(4): 1-37.

Examples

# NOT RUN {
# Load the packages (seqdef is provided by TraMineR) and the MVAD data
library(MEDseq)
library(TraMineR)
data(mvad)
mvad$Location <- factor(apply(mvad[,5:9], 1L, function(x) 
                 which(x == "yes")), labels = colnames(mvad[,5:9]))
mvad          <- list(covariates = mvad[c(3:4,10:14,87)],
                      sequences = mvad[,15:86], 
                      weights = mvad[,2])
mvad.cov      <- mvad$covariates

# Create a state sequence object with the first two (summer) time points removed
states        <- c("EM", "FE", "HE", "JL", "SC", "TR")
labels        <- c("Employment", "Further Education", "Higher Education", 
                   "Joblessness", "School", "Training")
mvad.seq      <- seqdef(mvad$sequences[-c(1,2)], states=states, labels=labels)

# Fit a range of exponential-distance models without clustering
mod0          <- MEDseq_fit(mvad.seq, G=1)

# Show the central sequence and precision parameters of the optimal model
plot(mod0, type="mean")
plot(mod0, type="precision")
# }
# NOT RUN {
# Fit a range of unweighted mixture models without covariates
# Only consider models with a noise component
# mod1        <- MEDseq_fit(mvad.seq, G=9:11, modtype=c("CCN", "CUN", "UCN", "UUN"))

# Plot the DBS values for all fitted models
# plot(mod1, "dbs")

# Plot the clusters of the optimal model (according to the dbs criterion)
# plot(mod1, "clusters", criterion="dbs")

# Plot the observation-specific ASW values of the best UUN model (according to the asw criterion)
# plot(mod1, "aswvals", modtype="UUN", criterion="asw")

# Fit a model with weights and gating covariates
# mod2        <- MEDseq_fit(mvad.seq, G=10, modtype="UCN", weights=mvad$weights, 
#                           gating=~ fmpr + gcse5eq + livboth, covars=mvad.cov)

# Plot the central sequences & precision parameters of this model
# plot(mod2, "mean")
# plot(mod2, "precision")

# Plot the clustering uncertainties in the form of a barplot
# plot(mod2, "uncert.bar")

# Plot the observation-specific DBS values and the transversal entropies by cluster
# plot(mod2, "dbsvals")
# plot(mod2, "Ht")

# Plot the state-distributions by cluster
# Note that this plot may not display properly in the preview panel
# plot(mod2, "d")
# }