selectK.R: Selection of the number $K$ of clusters.

Description

Select the number $K$ of clusters for a given subset $S$ of clustering variables.

Usage

selectK.R(xdata, S, Kmax, ploidy = 1, Kmin = 1,
          emOptions = list(epsi = 1e-05, nberSmallEM = 20, nberIterations = 15,
                           nberMaxIterations = 5000, typeSmallEM = 0, typeEM = 0,
                           putThreshold = FALSE),
          cte = 1, project = deparse(substitute(xdata)))

Arguments

xdata

A dataset in which each variable occupies $ploidy$ column(s).

S

A subset of clustering variables, given as a logical vector of length P, where P is the number of variables in xdata.
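
As a minimal sketch of how such a logical vector might be built in base R: the variable count P and the selected positions below are hypothetical and chosen only for illustration. The vector has one entry per variable, not per column, so with $ploidy = 2$ a dataset of P variables would have 2 * P columns while S still has length P, as in the Examples section.

```r
# Hypothetical setup: 10 variables, of which variables 1, 3 and 4 are selected.
P <- 10
S <- rep(FALSE, P)
S[c(1, 3, 4)] <- TRUE

# S must be logical and have one entry per variable.
stopifnot(is.logical(S), length(S) == P)

# Indices of the variables currently selected for clustering.
selected <- which(S)
```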

Kmax

The maximum number of clusters to be explored.

ploidy

The number of occurrences of each variable in the data. For example, $ploidy = 2$ for genotype data.

Kmin

The minimum number of clusters to be explored. The default value is set to 1.

emOptions

A list of EM options (see EmOptions and setEmOptions).

cte

A double used by the selection criterion named CteDim, whose penalty function is $pen(K,S)=cte*dim$, where $dim$ is the number of free parameters.
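
To illustrate how a penalty of the form $pen(K,S)=cte*dim$ can drive the choice of $K$, here is a small base R sketch. All numbers are hypothetical and are not output from selectK.R. The criterion is assumed here to be the negative maximized log-likelihood plus the penalty; the exact form and normalization used by the package are not stated on this page.

```r
# Hypothetical values for illustration only; not produced by selectK.R.
K      <- 1:4
logLik <- c(-1250, -1180, -1165, -1160)  # assumed maximized log-likelihoods
nFree  <- c(20, 41, 62, 83)              # assumed number of free parameters (dim)
cte    <- 1

# Penalty as described for CteDim: pen(K, S) = cte * dim.
pen <- cte * nFree

# Assumed criterion: negative log-likelihood plus penalty (smaller is better).
crit <- -logLik + pen

# Number of clusters minimizing the assumed criterion.
bestK <- K[which.min(crit)]
```

A larger cte penalizes complex models more heavily and therefore tends to favour a smaller number of clusters.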

project

The name of the project. The default value is the name of the dataset.

Value

A list of estimated parameters for each selection criterion.

References

Dominique Bontemps and Wilson Toussile (2013): Clustering and variable selection for categorical multivariate data. Electronic Journal of Statistics, Vol. 7, 2344-2371.
Wilson Toussile and Elisabeth Gassiat (2009): Variable selection in model-based clustering using multilocus genotype data. Adv Data Anal Classif, Vol. 3, No. 2, 109-134.

Examples

data(genotype1)
head(genotype1)
genotype2 = cutEachCol(genotype1[, -11], ploidy = 2)
head(genotype2)
S = c(rep(TRUE, 8), rep(FALSE, 2))
## Not run: 
# outPut = selectK.R(genotype2, S, Kmax = 6, ploidy = 2, Kmin=1)
# outPut[["BIC"]]
# file.remove("genotype2_ExploredModels.txt")
## End(Not run)