Usage
hdda(data, cls, model = "AkjBkQkDk", graph = FALSE, d_select = "Cattell",
     threshold = 0.2, com_dim = NULL, show = TRUE, scaling = FALSE,
     cv.dim = 1:10, cv.threshold = c(0.001, 0.005, 0.05, 1:9 * 0.1),
     cv.vfold = 10, LOO = FALSE, noise.ctrl = 1e-08, d)
Arguments
data
A matrix or a data frame of observations, assuming the rows are the observations and the columns the variables. Note that NAs are not allowed.
cls
A vector giving the class of each observation. Its type can be numeric or character.
model
A character vector or an integer vector indicating the models to be used. The available models are: "AkjBkQkDk" (default), "AkBkQkDk", "ABkQkDk", "AkjBQkDk", "AkBQkDk", "ABQkDk", "AkjBkQkD", "AkBkQkD", "ABkQkD", "AkjBQkD", "AkBQkD", "ABQkD", "AjBQD", "ABQD". Model names are not case sensitive, and integers can be used instead of names; see details for more information. Several models can be given at once, in which case only the results of the model that maximizes the BIC criterion are kept. To run all models, use model="ALL".
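As a minimal sketch of passing several models in one call: the example below assumes the HDclassif package is installed and that its bundled wine dataset stores the class label in the first column (both are assumptions, not stated in this section). Only arguments listed in the Usage above are used.

```r
# Candidate models, taken from the list of available model names.
models <- c("AkjBkQkDk", "AkBkQkDk")

# Guarded so the snippet does nothing if HDclassif is not installed.
if (requireNamespace("HDclassif", quietly = TRUE)) {
  library(HDclassif)
  data(wine, package = "HDclassif")  # assumed: class label in column 1
  X <- wine[, -1]                    # variables
  y <- wine[, 1]                     # class of each observation

  # Both models are estimated; per the description of 'model', only the
  # one with the best BIC is kept in the returned object.
  fit <- hdda(X, y, model = models, show = FALSE)
}
```

Replacing the vector with model = "ALL" would, per the description above, run every available model.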
graph
Used for comparison purposes only, when several estimations are run at the same time (either when using several models, or when using cross-validation to select the best dimension/threshold). If graph = TRUE, a plot of the results of all estimations is displayed. Default is FALSE.
d_select
The method used to select the intrinsic dimensions: either "Cattell" (default), "BIC" or "CV" (cross-validation, configured through the cv.* arguments). See details for more information.
threshold
A float strictly between 0 and 1. It is the threshold used in Cattell's scree-test.
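To give an intuition for the role of threshold, here is a sketch of one common formulation of the scree test: keep the last dimension whose eigenvalue gap is at least threshold times the largest gap. This section does not spell out the exact rule hdda applies, so the helper cattell_dim is hypothetical and the rule is an assumption.

```r
# Hypothetical helper: scree-test style dimension selection (assumed rule).
cattell_dim <- function(ev, threshold = 0.2) {
  stopifnot(threshold > 0, threshold < 1, length(ev) >= 2)
  ev <- sort(ev, decreasing = TRUE)
  gaps <- abs(diff(ev))               # drop between consecutive eigenvalues
  # Last position whose gap is still "large" relative to the biggest gap.
  max(which(gaps >= threshold * max(gaps)))
}

ev <- c(5.0, 3.0, 0.4, 0.3, 0.25)     # made-up eigenvalues for illustration
d_strict <- cattell_dim(ev, threshold = 0.2)
d_loose  <- cattell_dim(ev, threshold = 0.02)
```

With these made-up eigenvalues, a lower threshold accepts smaller gaps and therefore selects a larger dimension than a higher one.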
com_dim
Used only for common dimension models. It lets the user set the common dimension directly. If used, it must be an integer. Default is NULL.
show
Use show = FALSE to suppress the information that would otherwise be printed.
scaling
Logical: whether to scale the dataset (mean = 0 and standard deviation = 1 for each variable) or not. By default the data are not scaled.
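The effect of this kind of scaling can be sketched with base R's scale(), which centers each column and divides it by its standard deviation. Whether hdda calls scale() internally is not stated here, so treat this as an illustration of the transformation only.

```r
set.seed(1)
# Two made-up variables on very different scales.
x <- cbind(a = rnorm(20, mean = 10, sd = 5),
           b = runif(20))

xs <- scale(x)              # center to mean 0, rescale to sd 1, per column

col_means <- colMeans(xs)   # numerically zero for each variable
col_sds   <- apply(xs, 2, sd)
```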
cv.dim
A vector of integers. Used only when d_select = "CV". Gives the dimensions over which the cross-validation is run. Dimensions larger than what the data allow are dropped.
cv.threshold
A vector of floats strictly between 0 and 1. Used only when d_select = "CV". Gives the thresholds over which the cross-validation is run.
cv.vfold
An integer. Used only when d_select = "CV". Gives the number of subsamples into which the dataset is split. If cv.vfold is greater than the number of observations, it is reduced to the number of observations.
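The splitting behaviour described for cv.vfold can be sketched as follows. The helper make_folds is hypothetical and is not the package's code; it only shows a v-fold assignment in which the number of folds is capped at the number of observations.

```r
# Hypothetical helper: assign each of n observations to one of vfold folds.
make_folds <- function(n, vfold = 10) {
  vfold <- min(vfold, n)                 # cap folds at the sample size
  sample(rep_len(seq_len(vfold), n))     # balanced labels, random order
}

set.seed(42)
folds_small <- make_folds(n = 7, vfold = 10)   # capped: 7 folds of size 1
folds_large <- make_folds(n = 25, vfold = 10)  # 10 folds of size 2 or 3
```

Capping at n means the extreme case reduces to one observation per fold.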
LOO
If TRUE, it returns results (classes and posterior probabilities) for leave-one-out cross-validation.
noise.ctrl
This parameter prevents the 'noise' parameter b from taking too low a value. It guarantees that the dimension selection process does not select too many dimensions (which could lead to too low a value of b). When selecting the intrinsic dimensions using Cattell's scree-test or BIC, the function ignores the eigenvalues smaller than noise.ctrl, so the selected intrinsic dimensions are always strictly lower than the rank of the first such eigenvalue.
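As a rough sketch of the bound that noise.ctrl imposes: the hypothetical helper below finds the first eigenvalue smaller than noise.ctrl and returns the largest dimension that stays strictly below its rank. It illustrates the description above and is not the package's implementation.

```r
# Hypothetical helper: largest dimension permitted by noise.ctrl.
dim_upper_bound <- function(ev, noise.ctrl = 1e-08) {
  ev <- sort(ev, decreasing = TRUE)
  small <- which(ev < noise.ctrl)       # ranks of the ignored eigenvalues
  if (length(small) == 0) {
    length(ev)                          # noise.ctrl imposes no restriction
  } else {
    small[1] - 1                        # strictly below the first small one
  }
}

ev <- c(4, 1.5, 0.2, 3e-10, 1e-12)      # made-up eigenvalues
bound_default <- dim_upper_bound(ev)                    # 4th value is tiny
bound_strict  <- dim_upper_bound(ev, noise.ctrl = 0.5)  # 3rd value ignored too
```

Raising noise.ctrl discards more eigenvalues and so lowers the largest dimension that can be selected.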
d
DEPRECATED. This parameter is kept for backward compatibility. Please use the parameter d_select instead.