ClustMMDD (version 1.0.3)

model.selection.R: Selection of both the number $K$ of clusters and the subset $S$ of clustering variables.

Description

The inference on both the number $K$ of clusters and the subset $S$ of clustering variables is treated as a model selection problem. Each competing model is characterized by one value of $\left(K,S\right)$. The competing models are compared using the penalized criteria AIC, BIC, and ICL, and a more general penalized criterion with a penalty function of the form $$\mathrm{pen}\left(K,S\right)=\alpha\cdot\lambda\cdot\dim\left(K,S\right),$$ where
  • $\lambda$ is a parameter that can be calibrated using "slope heuristics" (see backward.explorer, dimJump.R),
  • and $\alpha$ is a coefficient in $[1.5, 2]$ to be given by the user.
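
As a rough illustration of how a penalty of this form ranks competing models, the sketch below applies it to a made-up table of explored models. The data values, the columns `K` and `S`, the value of `lambda`, and the helper `penalized_criterion` are all hypothetical. The convention that the retained model minimizes $-logLik + pen(K,S)$ is an assumption based on the description of `cte`; this is not the package's internal code.

```r
# Toy table of explored models (made-up values, for illustration only).
models <- data.frame(
  K      = c(1, 2, 3, 4),
  S      = c("1", "1 2", "1 2 3", "1 2 3 4"),
  logLik = c(-1250.0, -1100.0, -1080.0, -1075.0),
  dim    = c(10, 25, 45, 70),
  stringsAsFactors = FALSE
)

# Hypothetical helper: -logLik + alpha * lambda * dim for each model.
penalized_criterion <- function(logLik, dim, lambda, alpha = 2) {
  stopifnot(alpha >= 1.5, alpha <= 2)
  -logLik + alpha * lambda * dim
}

lambda <- 1.2  # assumed value; in practice calibrated by slope heuristics
models$crit <- penalized_criterion(models$logLik, models$dim, lambda, alpha = 2)

# Under the assumed convention, the smallest criterion value wins.
best <- models[which.min(models$crit), ]
print(best)
```

With these toy numbers the richer models gain too little likelihood to offset their larger dimension, so the model with $K = 2$ is retained.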

Usage

model.selection.R(fileOrData, cte = as.double(1), alpha = as.double(2.0), header = TRUE,
  lines = integer())

Arguments

fileOrData
A character string or a data frame (see backward.explorer). If fileOrData is a data frame, it must contain a column named logLik and another named dim (see details).
cte
A penalty function parameter. The associated criterion is $-log(likelihood)+cte*dim$.
alpha
A coefficient in $[1.5,2]$. The default value is $2$.
header
Indicates whether the file has a header line.
lines
A vector of integers. If it is not empty and fileOrData is the name of a file, only the models defined in lines are compared.
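
The column requirement on a data-frame input can be checked before calling the function. The sketch below uses a hypothetical helper, `has_required_columns`, which is not part of ClustMMDD; it only tests for the `logLik` and `dim` columns named in the description of fileOrData.

```r
# Hypothetical helper (not part of ClustMMDD): check that a data frame
# carries the two columns required by the fileOrData argument.
has_required_columns <- function(df) {
  is.data.frame(df) && all(c("logLik", "dim") %in% names(df))
}

ok  <- data.frame(logLik = c(-10.5, -9.8), dim = c(3, 5))
bad <- data.frame(loglik = c(-10.5, -9.8), dim = c(3, 5))  # wrong letter case

has_required_columns(ok)
has_required_columns(bad)
```

Column names in R are case-sensitive, so a column spelled `loglik` does not satisfy the check.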

Value

  • A data frame of the selected models for the proposed penalized criteria.

References

  • Dominique Bontemps and Wilson Toussile (2013). Clustering and variable selection for categorical multivariate data. Electronic Journal of Statistics, Volume 7, 2344-2371. http://projecteuclid.org/euclid.ejs/1379596773
  • Wilson Toussile and Elisabeth Gassiat (2009). Variable selection in model-based clustering using multilocus genotype data. Advances in Data Analysis and Classification, Volume 3, Number 2, 109-134. http://link.springer.com/article/10.1007%2Fs11634-009-0043-x

See Also

backward.explorer, dimJump.R.

Examples

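library(ClustMMDD)
# Load the example table of explored models
data(genotype2_ExploredModels)
# Calibrate the penalty with the slope heuristic implemented in dimJump.R
outDimJump = dimJump.R(genotype2_ExploredModels, N = 1000, h = 5, header = TRUE)
# Take the first element of the first component of the result as cte
cte1 = outDimJump[[1]][1]
# Compare the explored models under the proposed penalized criteria
outSelection = model.selection.R(genotype2_ExploredModels, cte = cte1, header = TRUE)
outSelection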
