This function does the pairwise comparison for one set of forecasts in which multiple models are involved. It gets called from pairwise_comparison. pairwise_comparison splits the data into arbitrary subgroups specified by the user (e.g. if the pairwise comparison should be done separately for different forecast targets) and the actual pairwise comparison for each subgroup is then handled by pairwise_comparison_one_group. To do the comparison between two models over a subset of common forecasts, it in turn calls compare_two_models.
pairwise_comparison_one_group(
scores,
metric,
test_options,
baseline,
by,
summarise_by
)
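
For orientation, here is a minimal sketch of a direct call. The input data forecast_data, the metric name, the baseline model name and the grouping columns are illustrative assumptions, not part of this documentation:

library(scoringutils)

# unsummarised scores, one row per individual forecast / observation;
# 'forecast_data' is a placeholder for a data.frame in the format
# expected by eval_forecasts
scores <- eval_forecasts(forecast_data)

# pairwise comparison for one subgroup of these scores
comparison <- pairwise_comparison_one_group(
  scores = scores,
  metric = "interval_score",           # assumed metric column
  test_options = list(),               # keep the default test settings
  baseline = "baseline_model",         # assumed baseline model name
  by = c("model", "target", "date"),   # unit of the individual observation
  summarise_by = "model"               # aggregate results per model
)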
scores: A data.frame of unsummarised scores as produced by eval_forecasts.

metric: A character vector of length one with the metric to do the comparison on.

test_options: A list with options to pass down to compare_two_models. To change only one of the default options, pass a list containing just the named element you want to change; all elements not included in the list will be set to their defaults (so passing an empty list results in the default options).

baseline: A character vector of length one that denotes the baseline model against which to compare other models.

by: A character vector of columns to group scoring by. This should be the lowest level of grouping possible, i.e. the unit of the individual observation. This is important as many functions work on individual observations. If you want a different level of aggregation, use summarise_by to aggregate the individual scores. Note also that the probability integral transform (PIT) will be computed using summarise_by instead of by.

summarise_by: A character vector of columns to group the summary by. By default, this is equal to by and no summary takes place. But sometimes you may want to summarise over categories different from the scoring (see the sketch after this list). summarise_by is also the grouping level used to compute (and possibly plot) the PIT.
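
To illustrate the distinction between by and summarise_by, the following sketch (again with hypothetical column names) scores at the level of individual observations but summarises, and computes the PIT, per model and target, i.e. aggregating over dates:

pairwise_comparison_one_group(
  scores = scores,
  metric = "interval_score",
  test_options = list(),
  baseline = "baseline_model",
  # lowest possible level of grouping: one score per model, target and date
  by = c("model", "target", "date"),
  # summarise (and compute the PIT) per model and target, aggregating over dates
  summarise_by = c("model", "target")
)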