dimJump.R: Data-driven calibration of the penalty function
Description
Data-driven calibration of the penalty function using the dimension jump version of the "slope heuristics".
Usage
dimJump.R(fileOrData, h = integer(), N = integer(), header = logical())
Arguments
fileOrData
A character string naming a file, or a data frame (see Details). A data frame must contain columns named logLik and dim.
A file must have the format produced by backward.explorer.
h
An integer defining the size of the sliding window used to find the biggest jump.
N
The size of the sample data (number of rows).
header
A logical value indicating whether the file contains a header row.
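As a minimal sketch of the data-frame form of fileOrData, the snippet below builds a small table with the two required columns, logLik and dim. All numbers are hypothetical and serve only to show the expected shape; the commented call at the end uses placeholder values for N and h and assumes the package providing dimJump.R is loaded.

```r
# Hypothetical values, for illustration only: one row per explored model.
explored <- data.frame(
  logLik = c(-5230.4, -5101.7, -5040.2, -5012.9, -5001.3),  # maximized log-likelihoods
  dim    = c(12, 25, 41, 60, 84)                            # model dimensions
)

# Check that the two columns required by the documentation are present.
stopifnot(all(c("logLik", "dim") %in% names(explored)))

# With the package loaded, the call would presumably look like this
# (N = 1000 and h = 2 are placeholder values for this toy table):
# out <- dimJump.R(explored, N = 1000, h = 2)
```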
Value
Assume that the penalty function has the form $$pen\left(K,S\right) = \alpha \, \lambda \, dim\left(K,S\right)$$, where
$\lambda$ is the penalty parameter to be calibrated,
and $\alpha$ is a coefficient belonging to $[1.5, 2]$, to be given by the user in model.selection.R for the final selection.
The function returns a list containing two candidate values of $\lambda$ and their bounds. It also produces a plot that illustrates the "slope heuristics".
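To make the role of $\alpha$ and $\lambda$ concrete, the following sketch evaluates a penalty of the form above for a few model dimensions. The helper pen, the value lambda_hat and the dimensions are hypothetical; in practice $\lambda$ would be one of the candidate values returned by dimJump.R.

```r
# Penalty of the form pen(K, S) = alpha * lambda * dim(K, S).
# `pen` is a hypothetical helper, not a function of the package.
pen <- function(dim, lambda, alpha = 2) {
  stopifnot(alpha >= 1.5, alpha <= 2)  # range stated in the documentation
  alpha * lambda * dim
}

lambda_hat <- 0.5       # hypothetical calibrated value of lambda
dims <- c(10, 20, 40)   # hypothetical model dimensions

pen(dims, lambda_hat, alpha = 2)    # 10 20 40
pen(dims, lambda_hat, alpha = 1.5)  # 7.5 15.0 30.0
```

A larger $\alpha$ penalizes high-dimensional models more heavily, so the final selection tends toward smaller models.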
Details
This function implements the dimension jump version of the so-called slope heuristics for data-driven calibration of the penalty function.
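The idea behind the dimension jump can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the implementation used by dimJump.R: the criterion is assumed to be -logLik + lambda * dim (no normalization by N), the grid of $\lambda$ values is arbitrary, and the window h is read as the number of grid steps over which a drop in selected dimension is measured. The table of models is hypothetical.

```r
# Toy table of explored models (hypothetical values).
models <- data.frame(
  dim    = c(5, 10, 20, 40, 80),
  logLik = c(-1000, -900, -850, -830, -810)
)

# Dimension of the model minimizing the assumed criterion -logLik + lambda * dim.
selected_dim <- function(lambda, models) {
  crit <- -models$logLik + lambda * models$dim
  models$dim[which.min(crit)]
}

# Selected dimension along an (arbitrary) grid of penalty parameters.
lambda_grid <- seq(0, 5, by = 0.03)
dims <- vapply(lambda_grid, selected_dim, numeric(1), models = models)

# Drop in selected dimension between grid points h steps apart.
h <- 2
idx <- seq_len(length(lambda_grid) - h)
drops <- dims[idx] - dims[idx + h]

# Grid interval containing the biggest jump.
i <- which.max(drops)
lambda_bounds <- c(lambda_grid[i], lambda_grid[i + h])
lambda_bounds  # brackets the 80 -> 40 jump, which occurs at lambda = 0.5 here
```

In this toy table the most complex model (dim 80) is selected until $\lambda$ reaches 0.5, where the selected dimension falls to 40; the interval found by the sketch brackets that value.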
References
Dominique Bontemps and Wilson Toussile (2013). Clustering and variable selection for categorical multivariate data. Electronic Journal of Statistics, Volume 7, 2344-2371. http://projecteuclid.org/euclid.ejs/1379596773
Wilson Toussile and Elisabeth Gassiat (2009). Variable selection in model-based clustering using multilocus genotype data. Adv Data Anal Classif, Vol 3, number 2, 109-134. http://link.springer.com/article/10.1007%2Fs11634-009-0043-x
Examples
# genotype2_ExploredModels was obtained via backward.explorer.
data(genotype2_ExploredModels)
outDimJump = dimJump.R(genotype2_ExploredModels, N = 1000, h = 5, header = TRUE)
outDimJump[[1]]