This function compares two models based on the subset of forecasts for which
both models have made a prediction. It is called from
pairwise_comparison_one_group(), which handles the comparison of multiple
models on a single set of forecasts (i.e. there are no subsets of forecasts
to be distinguished). pairwise_comparison_one_group() in turn is called from
pairwise_comparison(), which can handle pairwise comparisons for a set of
forecasts with multiple subsets, e.g. pairwise comparisons for one set of
forecasts, but done separately for two different forecast targets.
compare_two_models(
  scores,
  name_model1,
  name_model2,
  metric,
  one_sided = FALSE,
  test_type = c("non_parametric", "permutation"),
  n_permutations = 999
)

scores: A data.table of scores as produced by score().
name_model1: character, name of the first model.
name_model2: character, name of the model to compare against.
metric: A character vector of length one with the metric to do the
comparison on. The default is "auto", meaning that either "interval_score",
"crps", or "brier_score" will be selected where available.
See available_metrics() for available metrics.
one_sided: Boolean, default is FALSE; whether to conduct a one-sided
instead of a two-sided test to determine significance in a pairwise
comparison.
test_type: character, either "non_parametric" (the default) or "permutation".
This determines which kind of test shall be conducted to determine p-values.
n_permutations: numeric, the number of permutations for a permutation test.
Default is 999.
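A minimal usage sketch of how these functions fit together is given below. It
assumes the example_quantile data shipped with scoringutils, that this data
contains a target_type column and at least two models, and that
compare_two_models() is an internal function accessed via ':::' (drop the
':::' if it is exported in your version); users would normally call
pairwise_comparison() rather than this function directly.

library(scoringutils)

# Score the example quantile forecasts; this produces the data.table of
# scores expected by the pairwise comparison functions.
scores <- score(example_quantile)

# Typical user-facing call: pairwise_comparison() compares all model pairs,
# done separately for each subset defined via `by` (here: forecast target).
pw <- pairwise_comparison(
  scores,
  by = c("model", "target_type"),
  metric = "interval_score"
)

# Direct call to the helper for a single pair of models, using two model
# names taken from the scored data (':::' assumes the function is not
# exported).
models <- unique(scores$model)
res <- scoringutils:::compare_two_models(
  scores,
  name_model1 = models[1],
  name_model2 = models[2],
  metric = "interval_score",
  test_type = "non_parametric"
)

As described above, pairwise_comparison() aggregates such two-model
comparisons across all model pairs and subsets of the forecasts.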
Johannes Bracher, johannes.bracher@kit.edu
Nikos Bosse, nikosbosse@gmail.com