opdRef: OPD Reference Points: Empirical vs Uniform Baselines

Description

Computes two reference values for the Ordinal Prediction Disagreement (OPD): (i) the expected OPD when the predicted label $\hat Y$ follows the *same* empirical distribution as $Y$; and (ii) the expected OPD when $\hat Y$ is *uniform* over the $k$ ordered categories while $Y$ retains its empirical distribution. These values are useful as dataset-specific anchors for interpreting raw OPD and for constructing normalized benchmarks.

Usage

opdRef(p)

Value

A named numeric vector of length two:

c(OPDempDist = ..., OPDur = ...).

Arguments

p: A probability vector of length $k$ giving the empirical distribution of the observed ordinal outcome $Y\in\{1,\dots,k\}$. Each entry must be nonnegative and the entries must sum to 1.
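In practice $p$ is usually estimated from the observed labels. Below is a minimal sketch that assumes the labels are stored as integers in $1,\dots,k$. `empirical_p` is a hypothetical helper, not part of the package. Calling `tabulate()` with `nbins = k` keeps categories that were never observed as explicit zeros, so the result always has length $k$.

```r
# Hypothetical helper (not part of the package): empirical distribution of Y.
empirical_p <- function(y, k) {
  # y: vector of observed ordinal labels, each an integer in 1..k
  stopifnot(all(y %in% seq_len(k)))
  counts <- tabulate(y, nbins = k)  # counts for categories 1..k, zeros included
  counts / length(y)                # relative frequencies, summing to 1
}

y <- c(1, 2, 2, 3, 3, 3, 3, 4, 4, 5)
p <- empirical_p(y, k = 5)
p  # 0.1 0.2 0.4 0.2 0.1
```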

Details

Let $p=(p_1,\dots,p_k)$ denote the empirical distribution of $Y$. The function returns two scalars:

OPDempDist: $\mathbb{E}|\,\hat Y-Y\,|$ when $\hat Y\sim p$ independently of $Y\sim p$.
OPDur: $\mathbb{E}|\,\hat Y-Y\,|$ when $\hat Y\sim \mathrm{Unif}\{1,\dots,k\}$ independently of $Y\sim p$.
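Both definitions can be evaluated directly as double sums over the $k\times k$ grid of label pairs. The sketch below is a hypothetical reference implementation (`opd_ref_direct` is not the package source). It can be useful for checking `opdRef` output on small examples.

```r
# Hypothetical reference implementation of the two definitions (not the package source).
opd_ref_direct <- function(p) {
  k <- length(p)
  stopifnot(all(p >= 0), abs(sum(p) - 1) < 1e-8)
  D <- abs(outer(seq_len(k), seq_len(k), "-"))  # D[i, j] = |i - j|
  u <- rep(1 / k, k)                            # uniform distribution on 1..k
  c(
    OPDempDist = sum(outer(p, p) * D),          # Yhat ~ p, Y ~ p, independent
    OPDur      = sum(outer(u, p) * D)           # Yhat ~ Unif{1..k}, Y ~ p, independent
  )
}

r <- opd_ref_direct(c(0.10, 0.20, 0.40, 0.20, 0.10))
r  # OPDempDist = 1.2 and OPDur = 1.44, up to floating-point rounding
```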

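Both are computed via the disagreement-level decomposition $$\mathbb{E}|\,\hat Y-Y\,| = \sum_{d=0}^{k-1} d \;\mathbb{P}(|\hat Y-Y|=d),$$ where the $d=0$ term contributes nothing. For $d\ge 1$, the event $|\hat Y-Y|=d$ occurs when $\hat Y=Y+d$ (possible only if $Y\le k-d$) or when $\hat Y=Y-d$ (possible only if $Y\ge d+1$). In the empirical case this gives $\mathbb{P}(|\hat Y-Y|=d)=2\sum_{y=1}^{k-d}p_y\,p_{y+d}$. In the uniform case each admissible value of $\hat Y$ has probability $1/k$, so $$\mathrm{OPD}_{UR}=\frac{1}{k}\sum_{d=1}^{k-1} d\Big[\mathbb{P}\{Y\le k-d\} + \mathbb{P}\{Y\ge d+1\}\Big],$$ which is the discrete-$\{1,\dots,k\}$ version of the expression shown in the manuscript.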
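The decomposition can be checked numerically by computing the full distribution of $|\hat Y-Y|$ in both cases. The sketch below uses the counting argument that, for $d\ge 1$, $\hat Y=Y+d$ requires $Y\le k-d$ and $\hat Y=Y-d$ requires $Y\ge d+1$. `opd_levels` is a hypothetical name, not the package source. Each returned vector of per-level probabilities should sum to 1.

```r
# Hypothetical sketch (not the package source): distribution of |Yhat - Y|
# at each disagreement level d = 0, ..., k-1, for both reference predictors.
opd_levels <- function(p) {
  k   <- length(p)
  d   <- 0:(k - 1)
  cdf <- cumsum(p)                      # cdf[m] = P(Y <= m)

  # Yhat ~ p: P(|Yhat - Y| = d) = 2 * sum_{y=1}^{k-d} p_y p_{y+d} for d >= 1,
  # and sum_y p_y^2 for d = 0.
  emp <- sapply(d, function(dd) {
    if (dd == 0) return(sum(p^2))
    idx <- seq_len(k - dd)              # labels y with y + d <= k
    2 * sum(p[idx] * p[idx + dd])       # ordered pairs (y, y+d) and (y+d, y)
  })

  # Yhat ~ Unif{1..k}: P(|Yhat - Y| = d) = (P(Y <= k-d) + P(Y >= d+1)) / k
  # for d >= 1, and 1/k for d = 0.
  unif <- sapply(d, function(dd) {
    if (dd == 0) return(1 / k)
    (cdf[k - dd] + (1 - cdf[dd])) / k
  })

  list(emp = emp, unif = unif,
       OPDempDist = sum(d * emp),       # E|Yhat - Y| = sum_d d * P(|Yhat - Y| = d)
       OPDur      = sum(d * unif))
}

res <- opd_levels(c(0.10, 0.20, 0.40, 0.20, 0.10))
res$unif  # approximately 0.20 0.36 0.28 0.12 0.04
```

For this $p$ the sketch gives `OPDempDist` = 1.2 and `OPDur` = 1.44.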

Examples


# Example with k = 5 categories and an empirical distribution p:
p <- c(0.10, 0.20, 0.40, 0.20, 0.10)
opdRef(p)
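For the example $p$ above, the definitions in Details give $\mathrm{OPDempDist}=1.2$ and $\mathrm{OPDur}=1.44$. Those two expectations can be checked with a quick Monte Carlo simulation in base R. The sketch below does not call `opdRef`; the sample size and seed are arbitrary choices.

```r
# Monte Carlo check of the two reference expectations (base R only).
set.seed(1)
p <- c(0.10, 0.20, 0.40, 0.20, 0.10)
k <- length(p)
n <- 1e5
y      <- sample(seq_len(k), n, replace = TRUE, prob = p)  # Y ~ p
yhat_e <- sample(seq_len(k), n, replace = TRUE, prob = p)  # Yhat ~ p, independent of Y
yhat_u <- sample(seq_len(k), n, replace = TRUE)            # Yhat ~ Unif{1, ..., k}
est <- c(OPDempDist = mean(abs(yhat_e - y)),
         OPDur      = mean(abs(yhat_u - y)))
est  # close to 1.2 and 1.44, with Monte Carlo error of order 0.005
```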