Obtain several scores for a list of fitted models according to a folding scheme.
cross_validation(
models,
model_names = NULL,
scores = c("mse", "crps", "scrps", "dss"),
cv_type = c("k-fold", "loo", "lpo"),
k = 5,
percentage = 20,
number_folds = 10,
n_samples = 1000,
return_scores_folds = FALSE,
orientation_results = c("negative", "positive"),
include_best = TRUE,
train_test_indexes = NULL,
return_train_test = FALSE,
return_post_samples = FALSE,
return_true_test_values = FALSE,
parallelize_RP = FALSE,
n_cores_RP = parallel::detectCores() - 1,
true_CV = TRUE,
save_settings = FALSE,
print = TRUE,
fit_verbose = FALSE
)
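For illustration, a minimal sketch of a typical call. Here fit1 and fit2 are hypothetical models assumed to have been fitted beforehand with bru(); they are not defined on this page.

library(inlabru) # bru() is provided by inlabru

# fit1 and fit2 are hypothetical bru() fits (assumed, not defined here)
res <- cross_validation(
  models = list(fit1, fit2),
  model_names = c("Model A", "Model B"),
  scores = c("mse", "crps"),
  cv_type = "k-fold",
  k = 5
)

# res is a data.frame with one row per model (plus a row for the best
# model per score when include_best = TRUE) and one column per score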
Value: A data.frame with the fitted models and the corresponding scores. When any of the return_* arguments below is set to TRUE, the data.frame is returned inside a list together with the requested extra elements.
Arguments:

models: A fitted model obtained from calling the bru() function, or a list of models fitted with the bru() function.
model_names: A vector containing the names of the models to appear in the returned data.frame. If NULL, the names will be of the form Model 1, Model 2, and so on; by default, the names are obtained from the models list.
scores: A vector containing the scores to be computed. The options are "mse", "crps", "scrps" and "dss". By default, all scores are computed.
cv_type: The type of folding to be carried out. The options are "k-fold" for k-fold cross-validation, in which case the parameter k should be provided; "loo" for leave-one-out; and "lpo" for leave-percentage-out, in which case the parameter percentage should be given, together with number_folds, the number of folds to be done. The default is "k-fold".
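For illustration, a minimal sketch of the three folding schemes, assuming fit1 is a hypothetical model previously fitted with bru():

# 5-fold cross-validation (the default scheme)
cross_validation(list(fit1), cv_type = "k-fold", k = 5)

# leave-one-out cross-validation
cross_validation(list(fit1), cv_type = "loo")

# leave-percentage-out: train on 80% of the data, over 10 folds
cross_validation(list(fit1), cv_type = "lpo", percentage = 80, number_folds = 10)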
k: The number of folds to be used in k-fold cross-validation. Will only be used if cv_type is "k-fold".
percentage: The percentage (from 1 to 99) of the data to be used to train the model. Will only be used if cv_type is "lpo".
number_folds: Number of folds to be done if cv_type is "lpo".
n_samples: Number of samples used to compute the posterior statistics from which the scores are computed.
return_scores_folds: If TRUE, the scores for each fold will also be returned.
orientation_results: A character vector. The options are "negative" and "positive". If "negative", the smaller the scores the better; if "positive", the larger the scores the better.
include_best: Should a row indicating which model was the best for each score be included?
train_test_indexes: A list containing two entries: train, which is a list whose elements are vectors of indexes of the training data, and test, which is a list whose elements are vectors of indexes of the test data. Typically this will be the list obtained by setting the argument return_train_test to TRUE; see the sketch below.
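A sketch of scoring two runs on identical splits, assuming fit1 and fit2 are hypothetical bru() fits:

# first run: ask for the folds back
res1 <- cross_validation(list(fit1), return_train_test = TRUE)

# second run: reuse exactly the same folds for a fair comparison
res2 <- cross_validation(list(fit2), train_test_indexes = res1[["train_test"]])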
return_train_test: Logical. Should the training and test indexes be returned? If TRUE, the train and test indexes will be the train_test element of the returned list.
return_post_samples: If TRUE, the posterior samples will be included in the returned list.
return_true_test_values: If TRUE, the true test values will be included in the returned list.
parallelize_RP: Logical. Should the computation of CRPS and SCRPS (and, in some cases, DSS) be parallelized?

n_cores_RP: Number of cores to be used if parallelize_RP is TRUE.
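A sketch enabling the parallel computation of the sampling-based scores, with the default choice of leaving one core free (fit1 again hypothetical):

cross_validation(list(fit1),
  scores = c("crps", "scrps", "dss"),
  parallelize_RP = TRUE,
  n_cores_RP = parallel::detectCores() - 1
)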
true_CV: Should a true cross-validation be performed? If TRUE, the models will be fitted on the training dataset. If FALSE, the parameters will be kept fixed at the ones obtained in the result object.
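A short sketch contrasting the two modes (fit1 hypothetical):

# refit the model on every training fold (a true cross-validation)
cross_validation(list(fit1), true_CV = TRUE)

# keep the parameters fixed at the estimates stored in fit1 (faster)
cross_validation(list(fit1), true_CV = FALSE)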
save_settings: Logical. If TRUE, the settings used in the cross-validation will also be returned.
print: Should partial results be printed throughout the computation?
fit_verbose: Should INLA's runs during cross-validation be verbose?