apor (version 0.1.1)

opdRef: OPD Reference Points: Empirical vs Uniform Baselines

Description

Computes two reference values for the Ordinal Prediction Disagreement (OPD): (i) the expected OPD when the predicted label \(\hat Y\) follows the *same* empirical distribution as \(Y\); and (ii) the expected OPD when \(\hat Y\) is *uniform* over the \(k\) ordered categories while \(Y\) retains its empirical distribution. These values are useful as dataset-specific anchors for interpreting raw OPD and for constructing normalized benchmarks.

Usage

opdRef(p)

Value

A named numeric vector of length two:

c(OPDempDist = ..., OPDur = ...).

Arguments

p

A probability vector of length \(k\) giving the empirical distribution of the observed ordinal outcome \(Y\in\{1,\dots,k\}\). Each entry must be nonnegative and the entries must sum to 1.
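
In practice p is usually estimated from observed labels. The following is a minimal sketch of one way to build it, assuming the outcome is stored as an ordered factor; the names `y` and `p_hat` are illustrative and not part of the package.

```r
# Hypothetical observed ordinal outcome with k = 4 ordered levels.
y <- factor(c("low", "mid", "mid", "high", "top", "mid", "low", "high"),
            levels = c("low", "mid", "high", "top"), ordered = TRUE)

# Relative frequencies in level order; a level that never occurs gets 0.
p_hat <- as.numeric(prop.table(table(y)))

# Check the conditions stated for the argument p.
stopifnot(all(p_hat >= 0), isTRUE(all.equal(sum(p_hat), 1)))
p_hat
```

The resulting vector can then be passed as `p`. Declaring all levels explicitly in `factor()` keeps the length equal to \(k\) even when some category is absent from the sample.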

Details

Let \(p=(p_1,\dots,p_k)\) denote the empirical distribution of \(Y\). The function returns two scalars:

  • OPDempDist: \(\mathbb{E}|\,\hat Y-Y\,|\) when \(\hat Y\sim p\) independently of \(Y\sim p\).

  • OPDur: \(\mathbb{E}|\,\hat Y-Y\,|\) when \(\hat Y\sim \mathrm{Unif}\{1,\dots,k\}\) independently of \(Y\sim p\).

Both are computed via the disagreement-level decomposition $$\mathbb{E}|\,\hat Y-Y\,| = \sum_{d=0}^{k-1} d \;\mathbb{P}(|\hat Y-Y|=d).$$ In the uniform case, \(\mathbb{P}(|\hat Y-Y|=d)=\frac{1}{k}\big[\mathbb{P}\{Y\le k-d\}+\mathbb{P}\{Y\ge d+1\}\big]\) for \(d\ge 1\), so $$\mathrm{OPD}_{UR}=\frac{1}{k}\sum_{d=1}^{k-1} d\Big[\mathbb{P}\{Y\le k-d\} + \mathbb{P}\{Y\ge d+1\}\Big] = \frac{1}{k}\sum_{d=1}^{k-1} d\Big[\mathbb{P}\{Y\le k-d\}-\mathbb{P}\{Y\le d\} + 1\Big],$$ which is the discrete-\(\{1,\dots,k\}\) version of the expression shown in the manuscript.
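
The two expectations defined above can be checked by brute force over all \(k\times k\) pairs of categories. The sketch below is not the package's implementation; `expected_abs_diff` is a hypothetical helper written only for illustration, and it relies solely on the independence assumptions stated in the definitions.

```r
# Illustrative helper (not from the apor package): E|Yhat - Y| for
# independent Yhat ~ q and Y ~ p on the ordered categories 1..k.
expected_abs_diff <- function(q, p) {
  k <- length(p)
  stopifnot(length(q) == k)
  d <- abs(outer(seq_len(k), seq_len(k), "-"))  # d[i, j] = |i - j|
  sum(outer(q, p) * d)                          # sum over i, j of q_i * p_j * |i - j|
}

p <- c(0.10, 0.20, 0.40, 0.20, 0.10)
k <- length(p)

ref <- c(OPDempDist = expected_abs_diff(p, p),              # Yhat ~ p
         OPDur      = expected_abs_diff(rep(1 / k, k), p))  # Yhat ~ uniform
ref
```

For this p the brute-force values are 1.20 for the empirical baseline and 1.44 for the uniform baseline, so a uniform guesser is expected to disagree more than one that matches the empirical distribution. The helper costs \(O(k^2)\), which is negligible for typical numbers of ordinal categories.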

See Also

nopa, ordPredArgmax, ordPredRandom

Examples

# Example with k = 5 categories and an empirical distribution p:
p <- c(0.10, 0.20, 0.40, 0.20, 0.10)
opdRef(p)