The rsc_cv function performs cross-validation to estimate the
expected Frobenius loss proposed in Bickel and Levina (2008). The
original contribution of Bickel and Levina (2008), and its extension
in Serra et al. (2018), is based on a random cross-validation
algorithm in which the training/test size depends on the sample size
n. The latter is selected with cv.type = "random", together with an
appropriate number R of random train/test splits. R should be as
large as possible, but in practice this strongly impacts computing
time for high-dimensional data sets.
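As an illustrative sketch (the data argument name and the simulated
data below are assumptions, not taken from the package
documentation), the random cross-validation could be requested as
follows:

    ## Hypothetical call: random cross-validation with R = 100 train/test splits.
    ## The data matrix name "x" and its simulated content are purely illustrative.
    set.seed(1)
    x <- matrix(rnorm(100 * 50), nrow = 100, ncol = 50)
    fit_random <- rsc_cv(x, cv.type = "random", R = 100)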
Although Serra et al. (2018) showed that the random cross-validation
of Bickel and Levina (2008) works well for the RSC estimator,
subsequent experiments suggested that repeated K-fold cross-validation
produces better results on average. Repeated K-fold cross-validation
is implemented with the default cv.type = "kfold". In this case K
defines the number of folds, while R defines the number of times the
K-fold cross-validation is repeated, each time with an independent
shuffle of the original data. Selecting R = 1 and K = 10 performs the
standard 10-fold cross-validation. Ten replicates (R = 10) of the
K-fold cross-validation are generally sufficient to obtain reasonable
estimates of the underlying loss, but for extremely high-dimensional
data R may be reduced to speed up calculations.
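A minimal sketch of the two K-fold settings just described (the data
matrix x follows from the sketch above and remains an assumption):

    ## Standard 10-fold cross-validation: one shuffle, ten folds.
    fit_10fold   <- rsc_cv(x, cv.type = "kfold", K = 10, R = 1)
    ## Repeated 10-fold cross-validation: ten independent shuffles of the data.
    fit_repeated <- rsc_cv(x, cv.type = "kfold", K = 10, R = 10)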
On multi-core hardware the cross-validation can be executed in
parallel by setting ncores. The parallelism is implemented over the
total number of data splits, that is R for the random
cross-validation and R*K for the repeated K-fold cross-validation.
The software is optimized so that the total computing time generally
scales almost linearly with the number of available cores (ncores).
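For example, under the repeated K-fold setting above with R = 10 and
K = 10 there are 100 data splits to distribute across cores; a hedged
sketch of the parallel call (assuming the machine provides at least
four cores):

    ## Hypothetical parallel run: the 10 x 10 = 100 splits are distributed
    ## over 4 cores.
    fit_parallel <- rsc_cv(x, cv.type = "kfold", K = 10, R = 10, ncores = 4)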
For both the random and the K-fold cross-validation, the normalized
version of the expected squared Frobenius loss proposed in Bickel and
Levina (2008) is computed. The normalization is such that the squared
Frobenius norm of the identity matrix equals 1 whatever its dimension.
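One possible reading of this normalization (an assumption consistent
with the stated property, not the package's internal code) is to
divide the squared Frobenius norm by the matrix dimension, so that the
identity matrix always scores 1:

    ## Sketch: normalized squared Frobenius norm, assuming division by the
    ## dimension p, so that ||I_p||_F^2 / p = 1 for any p.
    norm_frob2 <- function(A) sum(A^2) / ncol(A)
    norm_frob2(diag(5))    # 1
    norm_frob2(diag(500))  # still 1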
Two optimal threshold selection types are reported with flags (see
Value section below): "minimum" and "minimum1se". The flag "minimum"
denotes the threshold value that minimizes the average loss. The flag
"minimum1se" implements the so-called 1-SE rule: this is the maximum
threshold value such that the corresponding average loss is within
one standard error of the minimum average loss (that is, the loss
achieved at the threshold corresponding to the "minimum" flag).
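The 1-SE rule can be sketched as follows on a generic grid of
candidate thresholds (variable and function names are illustrative
and not part of rsc_cv's output):

    ## Illustrative 1-SE rule: largest threshold whose average loss is within
    ## one standard error of the minimum average loss.
    one_se_threshold <- function(threshold, avg_loss, se_loss) {
      i_min  <- which.min(avg_loss)               # "minimum" flag
      cutoff <- avg_loss[i_min] + se_loss[i_min]  # minimum loss + 1 SE
      max(threshold[avg_loss <= cutoff])          # "minimum1se" flag
    }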
Since unbiased standard errors for the K-fold cross-validation are
impossible to compute (see Bengio and Grandvalet, 2004), when
cv.type = "kfold" the reported standard errors have to be regarded as
a downward-biased approximation.