ClustMMDD: Clustering by Mixture Models for Discrete Data

ClustMMDD stands for "Clustering by Mixture Models for Discrete Data". This package deals with the two-fold problem of variable selection and model-based unsupervised classification in discrete settings. Variable selection and classification are solved simultaneously via a model selection procedure using penalized criteria: the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), the Integrated Completed Log-likelihood (ICL), or a general criterion whose penalty function is calibrated from the data.

In this package, K is the number of clusters and S is the subset of variables that are relevant for clustering. We assume that a clustering variable has different probability distributions in at least two clusters, whereas a non-clustering variable has the same distribution in all clusters.

We consider a general situation with data described by $P$ random variables $X^l$, $l=1,\ldots,P$, where each variable $X^l$ is an unordered set $\left\{X^{l,1},\ldots,X^{l,\mathrm{ploidy}}\right\}$ of $\mathrm{ploidy}$ categorical variables. For all $l$, the random variables $X^{l,1},\ldots,X^{l,\mathrm{ploidy}}$ take their values in the same set of levels. A typical example of such data comes from population genetics, where each genotype of a diploid individual consists of $\mathrm{ploidy} = 2$ unordered alleles.
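The two modelling assumptions above (a clustering variable differs between at least two clusters, and the alleles of a genotype are unordered) can be illustrated with a small R sketch. The helpers `is_clustering_variable` and `canonical_genotype` are hypothetical names introduced here for illustration; they are not part of ClustMMDD, and the frequency tables are toy values.

```r
# Hypothetical helper (not a ClustMMDD function).
# freq: matrix of level probabilities, one row per cluster, one column per level.
# Returns TRUE if at least two clusters have different distributions.
is_clustering_variable <- function(freq, tol = 1e-12) {
  stopifnot(is.matrix(freq), nrow(freq) >= 1)
  # Compare every cluster's distribution with that of the first cluster:
  # if all rows equal the first one, the variable is non-clustering.
  first <- matrix(freq[1, ], nrow = nrow(freq), ncol = ncol(freq), byrow = TRUE)
  any(abs(freq - first) > tol)
}

# Hypothetical helper: put the alleles in a fixed order so that the
# unordered sets {a, b} and {b, a} get the same label.
canonical_genotype <- function(alleles) {
  paste(sort(alleles), collapse = "/")
}

# Toy frequency tables for two clusters and three levels (illustrative values).
freq_clustering <- rbind(c(0.7, 0.2, 0.1),
                         c(0.1, 0.3, 0.6))
freq_noise      <- rbind(c(0.5, 0.3, 0.2),
                         c(0.5, 0.3, 0.2))

is_clustering_variable(freq_clustering)  # TRUE
is_clustering_variable(freq_noise)       # FALSE
canonical_genotype(c("T", "A")) == canonical_genotype(c("A", "T"))  # TRUE
```

Sorting the alleles is only one possible way to encode an unordered set; the package may represent genotypes differently internally.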
The two-fold problem of clustering and variable selection is treated as a model selection problem. A specific collection of competing models, associated with different values of (K, S), is defined, and the models are compared using penalized criteria of the form $$\mathrm{crit}\left(K,S\right)=\gamma_n\left(K,S\right)+\mathrm{pen}\left(K,S\right),$$ where
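# Load the example genotype data set and preview its first rows
data(genotype2)
head(genotype2)
# Load the accompanying table of explored models and preview it
data(genotype2_ExploredModels)
head(genotype2_ExploredModels)
# Calibration of the penalty function
outDimJump = dimJump.R(genotype2_ExploredModels, N = 1000, h = 5, header = TRUE)
# Use the first element of the first component of the result as the penalty constant
cte1 = outDimJump[[1]][1]
# Select a model using the calibrated constant
outSlection = model.selection.R(genotype2_ExploredModels, cte = cte1, header = TRUE)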
outSlection
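To make the penalized-criterion comparison described above more concrete, here is a minimal R sketch that does not use the package. It assumes that the first term of the criterion is the negative maximized log-likelihood, that the penalty is a constant `lambda` times the model dimension, and that smaller values are better; the excerpt above does not confirm these conventions. The table `models`, its columns, and all numbers are hypothetical and made up for illustration. The `cte` argument in the example above may play a role similar to `lambda`, but that is also an assumption.

```r
# Hypothetical table of competing models (K, S); all values are made up.
models <- data.frame(
  K      = c(1, 2, 3, 4),
  S      = c("1", "1,2", "1,2", "1,2,3"),  # illustrative variable subsets
  logLik = c(-1250, -1180, -1160, -1155),  # maximized log-likelihoods
  dim    = c(20, 35, 50, 65),              # number of free parameters
  stringsAsFactors = FALSE
)
n <- 200  # assumed sample size

# Assumed convention: crit = -logLik + lambda * dim, smaller is better.
penalized_crit <- function(logLik, dim, lambda) {
  -logLik + lambda * dim
}

# lambda = log(n) / 2 gives BIC divided by 2; lambda = 1 gives AIC divided by 2.
bic_like <- penalized_crit(models$logLik, models$dim, lambda = log(n) / 2)
aic_like <- penalized_crit(models$logLik, models$dim, lambda = 1)

models$K[which.min(bic_like)]  # 2
models$K[which.min(aic_like)]  # 3
```

With these toy numbers the heavier BIC-type penalty selects a smaller model than the AIC-type penalty, which is why the choice of the penalty constant matters.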