This function compares two models based on the subset of forecasts for which
both models have made a prediction. It is called from
pairwise_comparison_one_group(), which handles the comparison of multiple
models on a single set of forecasts (i.e. there are no subsets of forecasts
to be distinguished). pairwise_comparison_one_group() in turn is called from
pairwise_comparison(), which can handle pairwise comparisons for a set of
forecasts with multiple subsets, e.g. pairwise comparisons for one set of
forecasts, but done separately for two different forecast targets.
compare_two_models(
  scores,
  name_model1,
  name_model2,
  metric,
  one_sided = FALSE,
  test_type = c("non_parametric", "permutation"),
  n_permutations = 999
)

scores: A data.table of scores as produced by score().
name_model1: character, name of the first model.
name_model2: character, name of the model to compare against.
metric: A character vector of length one with the metric to do the
comparison on. The default is "auto", meaning that either "interval_score",
"crps", or "brier_score" will be selected where available.
See available_metrics() for available metrics.
one_sided: Boolean, default is FALSE; whether to conduct a one-sided
instead of a two-sided test to determine significance in a pairwise
comparison.
test_type: character, either "non_parametric" (the default) or "permutation".
This determines which kind of test shall be conducted to determine p-values.
n_permutations: numeric, the number of permutations for a permutation test.
Default is 999.
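A minimal usage sketch of how these functions fit together is given below. It
assumes the example_quantile data shipped with scoringutils, that this data
contains a target_type column and at least two models, and that
compare_two_models() is an internal function accessed via ':::' (drop the
':::' if it is exported in your version); users would normally call
pairwise_comparison() rather than this function directly.

library(scoringutils)

# Score the example quantile forecasts; this produces the data.table of
# scores expected by the pairwise comparison functions.
scores <- score(example_quantile)

# Typical user-facing call: pairwise_comparison() compares all model pairs,
# done separately for each subset defined via `by` (here: forecast target).
pw <- pairwise_comparison(
  scores,
  by = c("model", "target_type"),
  metric = "interval_score"
)

# Direct call to the helper for a single pair of models, using two model
# names taken from the scored data (':::' assumes the function is not
# exported).
models <- unique(scores$model)
res <- scoringutils:::compare_two_models(
  scores,
  name_model1 = models[1],
  name_model2 = models[2],
  metric = "interval_score",
  test_type = "non_parametric"
)

As described above, pairwise_comparison() aggregates such two-model
comparisons across all model pairs and subsets of the forecasts.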
Johannes Bracher, johannes.bracher@kit.edu
Nikos Bosse, nikosbosse@gmail.com