This function post-processes simulations generated by mcmc_IMIFA for any of the IMIFA family of models. It can be re-run at little computational cost in order to extract different models explored by the sampler used for sims, without having to re-run the model itself. New results objects using different numbers of clusters and different numbers of factors (if visited by the model in question), or using different model selection criteria (if necessary), can be generated with ease. The function also performs post-hoc corrections for label switching, as well as post-hoc Procrustes rotation of loadings matrices and scores, to ensure sensible posterior parameter estimates, and constructs credible intervals.
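Factor loadings are only identified up to an orthogonal rotation, so sampled loadings matrices must be aligned before they can be averaged. The base R sketch below illustrates the classical orthogonal Procrustes step. `procrustes_rotate()` and the toy matrices are hypothetical, the choice of target matrix is an assumption, and this is not a reproduction of IMIFA's internal code.

```r
# Hypothetical helper (not part of IMIFA): rotate a p x q loadings matrix towards
# a p x q target using the orthogonal Procrustes solution, which minimises
# ||lambda %*% R - target||_F over orthogonal q x q matrices R.
procrustes_rotate <- function(lambda, target) {
  s   <- svd(crossprod(lambda, target))  # t(lambda) %*% target = U D V'
  rot <- s$u %*% t(s$v)                  # optimal orthogonal rotation U V'
  list(loadings = lambda %*% rot, rotation = rot)
}

# Toy check: a randomly rotated copy of a target is rotated back onto it.
set.seed(1)
target <- matrix(rnorm(12), nrow = 6, ncol = 2)
theta  <- pi / 5
R_true <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
draw   <- target %*% R_true
fit    <- procrustes_rotate(draw, target)
max(abs(fit$loadings - target))  # numerically zero
```

When a loadings draw is rotated this way, the matching scores draw (one row per observation) would be multiplied by the same `rotation` matrix, which leaves the product of scores and transposed loadings unchanged.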
get_IMIFA_results(sims = NULL, burnin = 0L, thinning = 1L, G = NULL,
Q = NULL, criterion = c("bicm", "aicm", "log.iLLH", "dic", "bic.mcmc",
"aic.mcmc"), G.meth = c("mode", "median"), Q.meth = c("mode", "median"),
dat = NULL, conf.level = 0.95, z.avgsim = FALSE, zlabels = NULL)
sims: An object of class "IMIFA" generated by mcmc_IMIFA.
burnin: Optional additional number of iterations to discard. Defaults to 0, corresponding to no burn-in.
thinning: Optional interval for extra thinning to be applied. Defaults to 1, corresponding to no thinning.
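As a rough illustration of what extra burn-in and thinning do to an already-stored chain, the sketch below computes which stored draws would be kept. It assumes the simple scheme "drop the first `burnin` stored draws, then keep every `thinning`-th remaining draw"; `keep_draws()` is a hypothetical helper, and IMIFA's internal indexing may differ.

```r
# Hypothetical helper (not part of IMIFA): indices of stored draws retained after
# discarding an extra `burnin` draws and keeping every `thinning`-th draw thereafter.
keep_draws <- function(n_stored, burnin = 0L, thinning = 1L) {
  stopifnot(burnin >= 0L, burnin < n_stored, thinning >= 1L)
  seq.int(from = burnin + 1L, to = n_stored, by = thinning)
}

keep_draws(10L)                              # 1 2 ... 10: the defaults keep everything
keep_draws(10L, burnin = 4L, thinning = 2L)  # 5 7 9
```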
G: If this argument is not specified, results will be returned with the optimal number of clusters. If different numbers of clusters were explored in sims for the "MFA" or "MIFA" methods, supplying an integer value allows pulling out a specific solution with G clusters, even if the solution is sub-optimal. Similarly, this allows retrieval of samples corresponding to a solution, if visited, with G clusters for the "OMFA", "OMIFA", "IMFA" and "IMIFA" methods.
Q: If this argument is not specified, results will be returned with the optimal number of factors. If different numbers of factors were explored in sims for the "FA", "MFA", "OMFA" or "IMFA" methods, this allows pulling out a specific solution with Q factors, even if the solution is sub-optimal. Similarly, this allows retrieval of samples corresponding to a solution, if visited, with Q factors for the "IFA", "MIFA", "OMIFA" and "IMIFA" methods.
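For the methods that infer G or Q, "retrieval of samples corresponding to a solution, if visited" can be pictured as subsetting the stored iterations at which that many clusters (or factors) were in use. The sketch below does this for a hypothetical vector `g_chain` of per-iteration cluster counts; `draws_with_G()` is a hypothetical helper, and how IMIFA counts clusters at each iteration is not reproduced here.

```r
# Hypothetical stored chain: number of clusters in use at each stored iteration.
g_chain <- c(3L, 4L, 3L, 3L, 5L, 4L, 3L)

# Hypothetical helper (not part of IMIFA): stored iterations at which exactly G
# clusters were visited; stops with an error if G was never visited.
draws_with_G <- function(g_chain, G) {
  idx <- which(g_chain == G)
  if (length(idx) == 0L) stop("a solution with G = ", G, " clusters was never visited")
  idx
}

draws_with_G(g_chain, 4L)  # 2 6
```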
criterion: The criterion to use for model selection, where model selection is only required if more than one model was run under the "FA", "MFA", "MIFA", "OMFA" or "IMFA" methods when sims was created via mcmc_IMIFA. All of these criteria are calculated regardless; this argument merely indicates which one forms the basis of the construction of the output. Note that the first three options here might exhibit bias in favour of zero-factor models for the finite factor "FA", "MFA", "OMFA" and "IMFA" methods, and might exhibit bias in favour of one-cluster models for the "MFA" and "MIFA" methods.
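When several (G, Q) pairs were run, selecting a model by a criterion amounts to locating the best cell in a table of criterion values. In the sketch below, `crit` holds made-up values for a single criterion (rows index G, columns index Q), and `select_GQ()` is a hypothetical helper. Whether a given IMIFA criterion is to be maximised or minimised is an assumption here, exposed as the `larger_is_better` flag, and should be checked against the package before relying on it.

```r
# Made-up criterion values for one criterion: rows index G, columns index Q.
crit <- matrix(c(-1520, -1490, -1505,
                 -1512, -1478, -1483),
               nrow = 2, byrow = TRUE,
               dimnames = list(G = c("2", "3"), Q = c("0", "1", "2")))

# Hypothetical helper (not part of IMIFA): the (G, Q) pair at the optimal cell.
# Assumption: larger values are preferred unless larger_is_better = FALSE.
select_GQ <- function(crit, larger_is_better = TRUE) {
  target <- if (larger_is_better) max(crit) else min(crit)
  best   <- which(crit == target, arr.ind = TRUE)[1L, ]  # first optimal cell
  c(G = as.integer(rownames(crit)[best[1L]]), Q = as.integer(colnames(crit)[best[2L]]))
}

select_GQ(crit)  # G = 3, Q = 1
```

Fixing G first (as in the G=3 example further down) would correspond to searching only the row of `crit` for that G.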
G.meth: If the object in sims arises from the "OMFA", "OMIFA", "IMFA" or "IMIFA" methods, this argument determines whether the optimal number of clusters is given by the mode or median of the posterior distribution of G. Defaults to "mode".
Q.meth: If the object in sims arises from the "IFA", "MIFA", "OMIFA" or "IMIFA" methods, this argument determines whether the optimal number of latent factors is given by the mode or median of the posterior distribution of Q. Defaults to "mode".
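The choice between "mode" and "median" matters most when the posterior of G or Q is skewed. The sketch below computes both summaries for a hypothetical vector of posterior draws; `post_mode()` and `post_median()` are hypothetical helpers, and the tie-breaking and rounding rules they use are assumptions rather than IMIFA's documented behaviour.

```r
# Hypothetical, right-skewed posterior draws of the number of clusters (or factors).
g_draws <- c(3L, 3L, 3L, 3L, 5L, 6L, 6L, 7L, 8L)

# Posterior mode: most frequent value (assumption: ties go to the smallest value).
post_mode <- function(x) {
  tab <- table(x)
  as.integer(names(tab)[which.max(tab)])
}

# Posterior median (assumption: rounded to an integer).
post_median <- function(x) as.integer(round(stats::median(x)))

post_mode(g_draws)    # 3
post_median(g_draws)  # 5
```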
dat: The actual data set on which mcmc_IMIFA was originally run. This is necessary for computing error metrics between the estimated and empirical covariance matrix/matrices. If it is not supplied, the function will attempt to find the data set in the global environment, provided it is still available there.
conf.level: The confidence level to be used throughout for credible intervals for all parameters of inferential interest. Defaults to 0.95.
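As an illustration of how a confidence level translates into an interval, the sketch below builds an equal-tailed credible interval from posterior draws of a single parameter. The equal-tailed quantile construction, the simulated `draws`, and the helper `credible_interval()` are all assumptions for illustration; IMIFA's exact construction is not reproduced here.

```r
# Hypothetical posterior draws of a scalar parameter (e.g. one uniqueness).
set.seed(42)
draws <- rgamma(5000, shape = 4, rate = 8)

# Hypothetical helper: equal-tailed interval cutting (1 - conf.level)/2 from each tail.
credible_interval <- function(x, conf.level = 0.95) {
  alpha <- 1 - conf.level
  stats::quantile(x, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

ci95 <- credible_interval(draws)                    # the default level
ci90 <- credible_interval(draws, conf.level = 0.9)  # narrower, as with conf.level=0.9
```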
z.avgsim: Logical indicating whether, in addition to the MAP estimate, the clustering should also be summarised via a call to Zsimilarity, i.e. by the clustering with minimum squared distance to the similarity matrix obtained by averaging the stored adjacency matrices. Note that the MAP clustering is computed conditional on the estimate of the number of clusters (whether that be the modal estimate or the estimate according to criterion), and other parameters are extracted conditional on this estimate of G. In contrast, the number of distinct clusters in the summarised labels obtained by z.avgsim=TRUE may not necessarily coincide with the estimate of G, but may provide a useful alternative summary of the partitions explored during the chain. Be warned that this can take considerable time to compute, and may not even be possible if the number of observations and/or the number of stored iterations is large and the resulting matrix isn't sufficiently sparse, so the default is FALSE. Otherwise, both the summarised clustering and the similarity matrix are stored: the latter can be passed to plot.Results_IMIFA.
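The idea behind the similarity-based summary can be sketched in a few lines of base R: average the adjacency matrices of the stored partitions, then pick the stored partition whose adjacency matrix is closest in squared distance to that average (a least-squares clustering). The toy allocations in `z_draws` and the helper `adjacency()` are hypothetical, the candidate set is assumed to be the stored draws, and this is not a claim about how Zsimilarity is implemented (which, per the text above, must also cope with large, sparse problems).

```r
# Hypothetical stored allocations: one row per stored iteration, one column per
# observation. Rows 1 and 3 are the same partition with permuted labels.
z_draws <- rbind(c(1, 1, 2, 2, 3),
                 c(1, 1, 2, 2, 2),
                 c(2, 2, 1, 1, 3),
                 c(1, 1, 1, 2, 3))

# Adjacency matrix of one partition: entry (i, j) is 1 if i and j share a cluster.
adjacency <- function(z) outer(z, z, FUN = "==") * 1

# Posterior similarity matrix: the average of the stored adjacency matrices.
adj_list <- lapply(seq_len(nrow(z_draws)), function(t) adjacency(z_draws[t, ]))
psm      <- Reduce(`+`, adj_list) / nrow(z_draws)

# Squared distance from each stored partition to the similarity matrix.
sq_dist <- vapply(adj_list, function(a) sum((a - psm)^2), numeric(1))
z_ls    <- z_draws[which.min(sq_dist), ]  # 1 1 2 2 3: three clusters here
```

Because adjacency matrices ignore label values, the summary is unaffected by label switching, and its number of clusters need not match any separate estimate of G.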
zlabels: For any method that performs clustering, the true labels can be supplied, if known, in order to compute clustering performance metrics. This also has the effect of ordering the MAP labels (and thus the cluster-specific parameters) to correspond as closely as possible to the true labels.
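Reordering estimated labels to "most closely correspond to the true labels" can be illustrated by searching over relabellings of the MAP clusters and keeping the one with maximal agreement. The brute-force search below is only practical for a handful of clusters and only permutes labels already present in the MAP clustering; `perms()`, `match_labels()` and the toy label vectors are hypothetical, and IMIFA's own matching procedure is not reproduced here.

```r
# Hypothetical MAP labels and known true labels for eight observations.
z_map  <- c(2, 2, 2, 1, 1, 3, 3, 3)
z_true <- c(1, 1, 1, 2, 2, 3, 3, 1)

# All permutations of a vector (fine for small numbers of clusters).
perms <- function(v) {
  if (length(v) <= 1L) return(list(v))
  out <- list()
  for (i in seq_along(v)) for (p in perms(v[-i])) out[[length(out) + 1L]] <- c(v[i], p)
  out
}

# Relabel z_map so that the number of observations agreeing with z_true is maximised.
match_labels <- function(z_map, z_true) {
  labs <- sort(unique(z_map))
  best <- NULL
  best_agree <- -1
  for (p in perms(labs)) {
    relab <- p[match(z_map, labs)]  # label labs[k] becomes p[k]
    agree <- sum(relab == z_true)
    if (agree > best_agree) { best <- relab; best_agree <- agree }
  }
  best
}

match_labels(z_map, z_true)  # 1 1 1 2 2 3 3 3
```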
An object of class "Results_IMIFA", to be passed to plot.Results_IMIFA for visualising results. Dedicated print and summary functions exist for objects of this class. The object, say x, is a list of lists, the most important components of which are:
Everything pertaining to clustering performance can be found here for all but the "FA" and "IFA" methods, in particular x$Clust$map, the MAP summary of the posterior clustering. More detail is given if known zlabels are supplied: performance is always evaluated against the MAP clustering, with additional evaluation against the alternative clustering computed if z.avgsim=TRUE.
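A MAP summary of a posterior clustering can be pictured as assigning each observation its most frequently sampled label, once the stored allocations have been corrected for label switching. The sketch below does this for a hypothetical matrix `z_draws` that is assumed to be already label-switching corrected; the tie-breaking rule is an assumption, and the code is not IMIFA's implementation.

```r
# Hypothetical label-switching-corrected allocations: rows = stored iterations,
# columns = observations.
z_draws <- rbind(c(1, 1, 2, 2, 2),
                 c(1, 1, 2, 1, 2),
                 c(1, 2, 2, 2, 2),
                 c(1, 1, 2, 2, 2))

# MAP allocation of each observation: its most frequently sampled label
# (assumption: ties are broken towards the smallest label).
map_labels <- apply(z_draws, 2, function(z) {
  tab <- table(z)
  as.numeric(names(tab)[which.max(tab)])
})

map_labels  # 1 1 2 2 2
```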
Error metrics (e.g. MSE) between the empirical and estimated covariance matrix/matrices.
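For a factor analyser, the model-implied covariance matrix is the product of the loadings with their transpose plus the diagonal matrix of uniquenesses, so an error metric such as the MSE compares that matrix with the empirical covariance of the data. The sketch below simulates data from a one-factor model with made-up `lambda` and `psi` and computes the MSE; the parameter values are hypothetical, and any scaling of the data or additional metrics that IMIFA applies are not reproduced.

```r
set.seed(7)
p <- 4; q <- 1; n <- 500
lambda <- matrix(c(0.9, 0.8, 0.7, 0.6), nrow = p, ncol = q)  # hypothetical loadings
psi    <- c(0.2, 0.3, 0.4, 0.5)                              # hypothetical uniquenesses

# Simulate data from the factor model x = lambda %*% eta + e, with e ~ N(0, diag(psi)).
eta <- matrix(rnorm(n * q), n, q)
dat <- eta %*% t(lambda) + matrix(rnorm(n * p), n, p) %*% diag(sqrt(psi))

# Model-implied covariance, Lambda Lambda' + Psi, versus the empirical covariance.
sigma_hat <- tcrossprod(lambda) + diag(psi)
sigma_emp <- stats::cov(dat)

mse <- mean((sigma_emp - sigma_hat)^2)  # small, since the model generated the data
```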
Everything pertaining to model choice can be found here, including posterior summaries for the estimated number of clusters and estimated number of factors, if applicable to the method employed. Information criterion values are also accessible here.
Posterior summaries for the means.
Posterior summaries for the factor loadings matrix/matrices. Posterior mean loadings given by x$Loadings$post.load are given the loadings class for printing purposes, and thus the manner in which they are displayed can be modified.
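Objects of class "loadings" are printed by the `print.loadings()` method in the stats package, whose `digits`, `cutoff` and `sort` arguments control the display. The sketch below assumes that x$Loadings$post.load dispatches to that method; the matrix is a hypothetical stand-in, not real IMIFA output.

```r
# A small hypothetical stand-in for x$Loadings$post.load.
post.load <- matrix(c(0.85,  0.02,
                      0.77, -0.05,
                      0.10,  0.66),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(paste0("V", 1:3), paste0("Factor", 1:2)))
class(post.load) <- "loadings"

# stats' print.loadings() blanks entries whose absolute value is below `cutoff`.
print(post.load, cutoff = 0.3)
# Show every entry to two decimal places, sorting variables by their loadings.
print(post.load, digits = 2, cutoff = 0, sort = TRUE)
```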
Posterior summaries for the latent factor scores.
Posterior summaries for the uniquenesses.
Murphy, K., Gormley, I. C. and Viroli, C. (2017) Infinite Mixtures of Infinite Factor Analysers: Nonparametric Model-Based Clustering via Latent Gaussian Models, arXiv:1701.07010.
library(IMIFA)
data(coffee)
data(olive)
# Note: these MCMC runs can take some time.
# Run an MFA model on the coffee data over a range of clusters and factors.
simMFAcoffee  <- mcmc_IMIFA(coffee, method="MFA", range.G=2:3, range.Q=0:3, n.iters=1000)
# Accept all defaults to extract the optimal model.
resMFAcoffee  <- get_IMIFA_results(simMFAcoffee)
# Instead, get results for a 3-cluster model, allowing Q to be chosen by aic.mcmc.
resMFAcoffee2 <- get_IMIFA_results(simMFAcoffee, G=3, criterion="aic.mcmc")
# Run an IMIFA model on the olive data, accepting all defaults.
simIMIFAolive <- mcmc_IMIFA(olive, method="IMIFA", n.iters=10000)
# Extract the optimal results:
# estimate G & Q by the median of their posterior distributions,
# construct 90% credible intervals and try to return the similarity matrix.
resIMIFAolive <- get_IMIFA_results(simIMIFAolive, G.meth="median", Q.meth="median",
                                   conf.level=0.9, z.avgsim=TRUE)
summary(resIMIFAolive)