
scoringutils: Utilities for Scoring and Assessing Predictions

The scoringutils package provides a collection of metrics and proper scoring rules that make it simple to score probabilistic forecasts against observed values. It offers convenient automated forecast evaluation in a data.table format (via the function score()), as well as a set of reliable lower-level scoring metrics operating on vectors and matrices that experienced users can build on in other applications. In addition, it implements a wide range of flexible plots designed to cover many use cases.

Where available, scoringutils draws on functionality from scoringRules, which provides a comprehensive collection of proper scoring rules for predictive probability distributions represented as samples or parametric distributions. For some forecast types, such as quantile forecasts, scoringutils implements additional metrics. On top of providing an interface to the proper scoring rules implemented in scoringRules and natively, scoringutils offers utilities for summarising and visualising forecasts and scores, and for obtaining relative scores between models, which may be useful for non-overlapping forecast sets and for forecasts across different scales.

Predictions can be handled in various formats: scoringutils can process probabilistic forecasts in either a sample-based or a quantile-based format. For more detail on the expected input formats, please see below. True values can be integer, continuous or binary; appropriate scores for each of these value types are selected automatically.
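
As a minimal sketch of the lower-level interface (the toy data below is simulated purely for illustration), sample-based forecasts can be scored directly with crps_sample(), which takes a vector of true values and a matrix of predictive samples:

library(scoringutils)

# toy data: 30 observed counts, each with 500 predictive samples
# (rows = observations, columns = samples)
true_values <- rpois(30, lambda = 10)
predictions <- matrix(rpois(30 * 500, lambda = 10), nrow = 30)

# returns one CRPS value per observation
crps <- crps_sample(true_values, predictions)
mean(crps)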

Installation

Install the CRAN version of this package using:

install.packages("scoringutils")

Install the stable development version of the package with:

install.packages("scoringutils", repos = "https://epiforecasts.r-universe.dev")

Install the unstable development from GitHub using the following,

remotes::install_github("epiforecasts/scoringutils", dependencies = TRUE)

Quick start

In this quick start guide we explore some of the functionality of the scoringutils package, using quantile forecasts from the ECDC forecasting hub as an example. For more detailed documentation, please see the package vignettes and the individual function documentation.

Plotting forecasts

As a first step to evaluating the forecasts, we visualise them. For this example we use make_NA() to filter the available forecasts down to a single model and forecast date, and then plot them with plot_predictions().

library(scoringutils)
library(magrittr) # for the pipe operator
library(ggplot2)

example_quantile %>%
  make_NA(what = "truth",
          target_end_date >= "2021-07-15",
          target_end_date < "2021-05-22"
  ) %>%
  make_NA(what = "forecast",
          model != "EuroCOVIDhub-ensemble",
          forecast_date != "2021-06-28"
  ) %>%
  plot_predictions(
    x = "target_end_date",
    by = c("target_type", "location")
  ) +
  facet_wrap(target_type ~ location, ncol = 4, scales = "free")

Scoring forecasts

Forecasts can be easily and quickly scored using the score() function, which returns unsummarised scores; in most cases this is not what the user wants. Here we use additional functions from scoringutils to add empirical coverage levels (add_coverage()) and scores relative to a baseline model (here the EuroCOVIDhub-ensemble). See the getting started vignette for more details. Finally, we summarise these scores by model and target type.

example_quantile %>%
  score() %>%
  add_coverage(ranges = c(50, 90), by = c("model", "target_type")) %>%
  summarise_scores(
    by = c("model", "target_type"),
    relative_skill = TRUE,
    baseline = "EuroCOVIDhub-ensemble"
  ) %>%
  summarise_scores(
    fun = signif,
    digits = 2
  ) %>%
  knitr::kable()
#> The following messages were produced when checking inputs:
#> 1.  144 values for `prediction` are NA in the data provided and the corresponding rows were removed. This may indicate a problem if unexpected.
| model | target_type | interval_score | dispersion | underprediction | overprediction | coverage_deviation | bias | ae_median | coverage_50 | coverage_90 | relative_skill | scaled_rel_skill |
|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| EuroCOVIDhub-baseline | Cases | 28000 | 4100 | 10000.0 | 14000.0 | -0.110 | 0.0980 | 38000 | 0.33 | 0.82 | 1.30 | 1.6 |
| EuroCOVIDhub-baseline | Deaths | 160 | 91 | 2.1 | 66.0 | 0.120 | 0.3400 | 230 | 0.66 | 1.00 | 2.30 | 3.8 |
| EuroCOVIDhub-ensemble | Cases | 18000 | 3700 | 4200.0 | 10000.0 | -0.098 | -0.0560 | 24000 | 0.39 | 0.80 | 0.82 | 1.0 |
| EuroCOVIDhub-ensemble | Deaths | 41 | 30 | 4.1 | 7.1 | 0.200 | 0.0730 | 53 | 0.88 | 1.00 | 0.60 | 1.0 |
| UMass-MechBayes | Deaths | 53 | 27 | 17.0 | 9.0 | -0.023 | -0.0220 | 78 | 0.46 | 0.88 | 0.75 | 1.3 |
| epiforecasts-EpiNow2 | Cases | 21000 | 5700 | 3300.0 | 12000.0 | -0.067 | -0.0790 | 28000 | 0.47 | 0.79 | 0.95 | 1.2 |
| epiforecasts-EpiNow2 | Deaths | 67 | 32 | 16.0 | 19.0 | -0.043 | -0.0051 | 100 | 0.42 | 0.91 | 0.98 | 1.6 |
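
The message above comes from the input checks run by score(). Forecast data can also be validated up front with check_forecasts(), as in this minimal sketch:

check_forecasts(example_quantile)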

scoringutils contains additional functionality to summarise these scores at different levels, to visualise them, and to explore the forecasts themselves. See the package vignettes and function documentation for more information.
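
For example, summarised scores can be displayed as a coloured table with plot_score_table(). The call below is a sketch building on the summarised scores from above; see the function documentation for the full set of arguments:

example_quantile %>%
  score() %>%
  summarise_scores(by = c("model", "target_type")) %>%
  summarise_scores(fun = signif, digits = 2) %>%
  plot_score_table(y = "model", by = "target_type") +
  facet_wrap(~ target_type, ncol = 1)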

Citation

If you use scoringutils in your work, please consider citing it using the output of citation("scoringutils"):

#> 
#> To cite scoringutils in publications use the following. If you use the
#> CRPS, DSS, or Log Score, please also cite scoringRules.
#> 
#>   Nikos I. Bosse, Hugo Gruson, Sebastian Funk, Anne Cori, Edwin van
#>   Leeuwen, and Sam Abbott (2022). Evaluating Forecasts with
#>   scoringutils in R, arXiv. DOI: 10.48550/ARXIV.2205.07090
#> 
#> To cite scoringRules in publications use:
#> 
#>   Alexander Jordan, Fabian Krueger, Sebastian Lerch (2019). Evaluating
#>   Probabilistic Forecasts with scoringRules. Journal of Statistical
#>   Software, 90(12), 1-37. DOI 10.18637/jss.v090.i12
#> 
#> To see these entries in BibTeX format, use 'print(<citation>,
#> bibtex=TRUE)', 'toBibtex(.)', or set
#> 'options(citation.bibtex.max=999)'.

How to make a bug report or feature request

Please briefly describe your problem and what output you expect in an issue. If you have a question, please don’t open an issue. Instead, ask on our Q and A page.

Contributing

We welcome contributions and new contributors! We particularly appreciate help on the priority problems listed in the issues. Please check and add to the issues, and/or open a pull request.

Code of Conduct

Please note that the scoringutils project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Version

1.1.0

License

MIT + file LICENSE

Maintainer

Nikos Bosse

Last Published

January 30th, 2023

Functions in scoringutils (1.1.0)

correlation: Correlation Between Metrics
example_continuous: Continuous Forecast Example Data
check_equal_length: Check Length
compare_two_models: Compare Two Models Based on Subset of Common Forecasts
check_predictions: Check Prediction Input For Lower-level Scoring Functions
check_forecasts: Check forecasts
example_quantile_forecasts_only: Quantile Example Data - Forecasts only
crps_sample: Ranked Probability Score
example_point: Point Forecast Example Data
check_true_values: Check Observed Value Input For Lower-level Scoring Functions
example_truth_only: Truth data only
delete_columns: Delete Columns From a Data.table
dss_sample: Dawid-Sebastiani Score
collapse_messages: Collapse several messages to one
check_metrics: Check whether the desired metrics are available in scoringutils
check_not_null: Check Variable is not NULL
check_summary_params: Check input parameters for summarise_scores()
example_integer: Integer Forecast Example Data
example_quantile: Quantile Example Data
get_forecast_unit: Get unit of a single forecast
example_binary: Binary Forecast Example Data
find_duplicates: Find duplicate forecasts
infer_rel_skill_metric: Infer metric for pairwise comparisons
interval_score: Interval Score
metrics: Summary information for selected metrics
logs_binary: Log Score for Binary outcomes
get_target_type: Get type of the target true values of a forecast
pairwise_comparison: Do Pairwise Comparisons of Scores
geom_mean_helper: Calculate Geometric Mean
get_prediction_type: Get prediction type of a forecast
mad_sample: Determine dispersion of a probabilistic forecast
logs_sample: Logarithmic score
plot_avail_forecasts: Visualise Where Forecasts Are Available
pairwise_comparison_one_group: Do Pairwise Comparison for one Set of Forecasts
plot_score_table: Plot Coloured Score Table
plot_ranges: Plot Metrics by Range of the Prediction Interval
make_NA: Make Rows NA in Data for Plotting
merge_pred_and_obs: Merge Forecast Data And Observations
plot_correlation: Plot Correlation Between Metrics
plot_interval_coverage: Plot Interval Coverage
plot_heatmap: Create a Heatmap of a Scoring Metric
plot_predictions: Plot Predictions vs True Values
sample_to_quantile: Change Data from a Sample Based Format to a Quantile Format
range_long_to_quantile: Change Data from a Range Format to a Quantile Format
quantile_score: Quantile Score
quantile_to_range_long: Change Data from a Plain Quantile Format to a Long Range Format
score: Evaluate forecasts
plot_pit: PIT Histogram
plot_wis: Plot Contributions to the Weighted Interval Score
plot_pairwise_comparison: Plot Heatmap of Pairwise Comparisons
print.scoringutils_check: Print output from check_forecasts()
pit_sample: Probability Integral Transformation (sample-based version)
sample_to_range_long: Change Data from a Sample Based Format to a Long Interval Range Format
pit: Probability Integral Transformation (data.frame Format)
permutation_test: Simple permutation test
se_mean_sample: Squared Error of the Mean (Sample-based Version)
squared_error: Squared Error
score_sample: Evaluate forecasts in a Sample-Based Format (Integer or Continuous)
scoringutils-package: scoringutils: Utilities for Scoring and Assessing Predictions
plot_quantile_coverage: Plot Quantile Coverage
summarise_scores: Summarise scores as produced by score()
theme_scoringutils: Scoringutils ggplot2 theme
score_binary: Evaluate forecasts in a Binary Format
score_quantile: Evaluate forecasts in a Quantile-Based Format
abs_error: Absolute Error
available_metrics: Available metrics in scoringutils
bias_sample: Determines bias of forecasts
bias_range: Determines Bias of Quantile Forecasts
ae_median_sample: Absolute Error of the Median (Sample-based Version)
ae_median_quantile: Absolute Error of the Median (Quantile-based Version)
avail_forecasts: Display Number of Forecasts Available
add_coverage: Add coverage of central prediction intervals
brier_score: Brier Score
bias_quantile: Determines Bias of Quantile Forecasts