calcMSCrit: Calculates Model Selection Criteria For Several (Independent) MCMC Runs And Various Numbers $H$ of Clusters

Description

Calculates and plots a set of model selection criteria for all estimated models produced by one and the same clustering method (for the sake of comparability), for various numbers $H$ of clusters/groups and for several independent MCMC runs saved in output files located in the specified directory. Depending on the underlying model, the criteria include BIC, adjusted BIC, DIC (Deviance Information Criterion), AWE (Approximate Weight of Evidence), CLC (Classification Likelihood Criterion), ICL (Integrated Classification Likelihood) and ICL-BIC. Several maximisation methods are available for this purpose. For more information about the criteria see Details, References and the references therein.

Usage

calcMSCritMCC(workDir, myLabel = "model choice for ...", H0 = 3, 
          whatToDoList = c("approxMCL", "approxML", "postMode"))
calcMSCritMCCExt(workDir, NN, myLabel = "model choice for ...", 
          ISdraws = 3, H0 = 3, 
          whatToDoList = c("approxMCL", "approxML", "postMode"))
calcMSCritDMC(workDir, myLabel = "model choice for ...", 
          myN0 = "N0 = ...", 
          whatToDoList = c("approxMCL", "approxML", "postMode"))
calcMSCritDMCExt(workDir, myLabel = "model choice for ...", 
          myN0 = "N0 = ...", 
          whatToDoList = c("approxMCL", "approxML", "postMode"))

Arguments

Value

A list containing:

postMode: the corresponding MSCritTable (see Details), only if whatToDo includes "postMode"
approxML: the corresponding MSCritTable (see Details), only if whatToDo includes "approxML"
approxMCL: the corresponding MSCritTable (see Details), only if whatToDo includes "approxMCL"
ISdraws: the number of importance sampling draws for approximating logICL (only for MCCExt)
outFileNames: a list (character vector) containing the names of the processed output files (each containing an MCMC run)

Details

For each maximisation method in whatToDoList all (available) model selection criteria are calculated iteratively. Depending on the entries in whatToDoList, the calculation of these criteria is based on the MCMC draw (iteration) corresponding to the maximum of the log classification likelihood ("approxMCL"), the log likelihood ("approxML") and/or (for the sake of completeness) the log posterior density ("postMode"). Note that the user has to decide which criteria are admissible for which maximisation method: the AWE and the logICL are based on the maximum of the (log) classification likelihood, all the others on the maximum of the (log) likelihood (see References).

Internally, the function calculates the log-likelihood and related values such as LK (observed log-likelihood), CLK (classification or complete log-likelihood), CK (classification-type log-likelihood), EK (entropy term) as well as $d_h$ (number of parameters), which are essential parts of the model selection criteria. The model prior adjusted BIC is calculated using $adjBIC = BIC - 2 H \log(H_0) + 2 \log\Gamma(H + 1) + 2 H_0$. Depending on the model type used, the following criteria are calculated: Bic, adjusted Bic, Aic, Awe, IclBic, Clc, Dic2, Dic4 and logICL (see References).

Furthermore, plots and tables of selected criteria are generated (the plots are also saved in directory workDir). To document the iteration progress, some information is recorded for each output file (containing an MCMC run), depending on the maximisation method: a running number, the maximisation method, the number of clusters/groups, BIC, adjusted BIC, AIC, AWE, CLC, IclBic, DIC2, DIC4a, ICL and additionally adj Rand (which compares the starting with the final allocation). For each entry in whatToDo a matrix MSCritTable is produced.
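The two computational ideas described above, picking the MCMC draw that maximises a traced log quantity and applying the model prior adjustment to the BIC, can be sketched in a few lines of R. This is only an illustration under stated assumptions: `adj_bic` and `select_draw` are hypothetical helper names that are not part of the package, and the trace values are made up.

```r
# Model prior adjusted BIC, transcribed from the formula in Details.
# `adj_bic` is a hypothetical helper, not a function of the package.
adj_bic <- function(bic, H, H0 = 3) {
  bic - 2 * H * log(H0) + 2 * lgamma(H + 1) + 2 * H0
}

# Index of the MCMC draw at which a traced log quantity is largest.
# `select_draw` is likewise a hypothetical helper.
select_draw <- function(log_values) which.max(log_values)

# Made-up traces over five MCMC draws, for illustration only
log_class_lik <- c(-520.4, -498.7, -501.2, -497.9, -503.0)
log_lik       <- c(-480.1, -471.3, -469.8, -470.5, -475.6)

m_mcl <- select_draw(log_class_lik)  # draw an "approxMCL"-style choice would use
m_ml  <- select_draw(log_lik)        # draw an "approxML"-style choice would use

# Adjustment applied to the same (made-up) BIC value for H = 2, 3, 4
adj_values <- sapply(2:4, function(H) adj_bic(bic = 1000, H = H, H0 = 3))
```

Note that the two traces need not peak at the same draw, which is why the choice of maximisation method matters for the criteria that are derived from it.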
Each row of an MSCritTable represents a processed output file (containing an MCMC run) and the columns contain the quantities recorded for that run. For each entry in whatToDo the corresponding MSCritTable is printed together with the current working directory and the content of the current whatToDo. Further, plots of the model selection criteria are produced and saved (as eps and pdf). If MCCExt is considered, the number of importance sampling draws ISdraws (necessary for logICL) is also printed. Additionally, after each iteration the workspace containing the model selection criteria and related objects is saved to a .RData file via save.image within directory workDir. Finally, a list containing the names of the processed output files (each containing an MCMC run) is printed.
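Once a table with one row per MCMC run is available, comparing runs and numbers of clusters is a matter of ordinary matrix indexing. The sketch below uses a small made-up matrix as a stand-in for an MSCritTable; the column names "H" and "BIC" are assumptions chosen for illustration and may differ from the names the package actually uses. It also assumes the common convention that a smaller BIC is preferred.

```r
# Hypothetical stand-in for an MSCritTable: one row per processed MCMC run.
# Column names and values are assumptions, not output of the package.
ms_crit_table <- cbind(
  H   = c(2, 3, 4, 3),
  BIC = c(1052.3, 1010.8, 1024.6, 1012.1)
)

# Row (MCMC run) with the smallest BIC and its number of clusters
best_row <- which.min(ms_crit_table[, "BIC"])
best_H   <- unname(ms_crit_table[best_row, "H"])

# Smallest BIC among the independent runs for each H
best_per_H <- tapply(ms_crit_table[, "BIC"], ms_crit_table[, "H"], min)
```

Looking at the per-H minimum across independent runs gives a quick impression of how stable the preferred number of clusters is.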

References

Jeffrey D. Banfield and Adrian E. Raftery, (1993), "Model-Based Gaussian and Non-Gaussian Clustering". Biometrics, Vol. 49, No. 3, pp. 803-821. http://www.jstor.org/stable/2532201

Sylvia Fruehwirth-Schnatter, Christoph Pamminger, Andrea Weber and Rudolf Winter-Ebmer, (2011), "Labor market entry and earnings dynamics: Bayesian inference using mixtures-of-experts Markov chain clustering". Journal of Applied Econometrics. DOI: 10.1002/jae.1249 http://onlinelibrary.wiley.com/doi/10.1002/jae.1249/abstract

Sylvia Fruehwirth-Schnatter and Saumyadipta Pyne, (2010), "Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-t distributions". Biostatistics, Vol. 11, No. 2, pp. 317-336. DOI: 10.1093/biostatistics/kxp062 http://biostatistics.oxfordjournals.org/content/11/2/317.full.pdf+html

Christoph Pamminger and Sylvia Fruehwirth-Schnatter, (2010), "Model-based Clustering of Categorical Time Series". Bayesian Analysis, Vol. 5, No. 2, pp. 345-368. DOI: 10.1214/10-BA606 http://ba.stat.cmu.edu/journal/2010/vol05/issue02/pamminger.pdf

Examples


# please run the examples in mcClust, dmClust, mcClustExtended, 
# dmClustExtended
