pintervals (version 1.0.1)

interval_miscoverage: Empirical miscoverage of prediction intervals

Description

Calculates the empirical miscoverage rate of prediction intervals, i.e., the difference between the proportion of true values that fall within their corresponding prediction intervals and the nominal coverage rate (1 - alpha).
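Conceptually, the computation can be sketched in base R as follows (an illustrative sketch only, not the package's actual implementation):

```r
# Illustrative sketch of the miscoverage computation (not the package source)
empirical_coverage <- function(truth, lower, upper) {
  mean(truth >= lower & truth <= upper)
}

truth <- c(1.2, 0.8, 2.5, 1.9)
lower <- c(1.0, 0.5, 2.0, 2.0)
upper <- c(2.0, 1.0, 3.0, 3.0)

# 3 of 4 values fall inside their intervals, so empirical coverage is 0.75.
# With alpha = 0.1, nominal coverage is 0.9, giving miscoverage of about -0.15.
empirical_coverage(truth, lower, upper) - (1 - 0.1)
```

A negative value indicates the intervals cover fewer true values than the nominal rate promises; a positive value indicates overcoverage.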

Usage

interval_miscoverage(truth, lower_bound, upper_bound, alpha, na.rm = FALSE)

Value

A single numeric value between -1 and 1 representing the empirical miscoverage rate. A value close to 0 indicates that the prediction intervals are well-calibrated.

Arguments

truth

A numeric vector of true outcome values.

lower_bound

A numeric vector of lower bounds of the prediction intervals.

upper_bound

A numeric vector of upper bounds of the prediction intervals.

alpha

The nominal miscoverage rate (e.g., 0.1 for 90% prediction intervals).

na.rm

Logical, whether to remove NA values before calculation. Default is FALSE.
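The effect of na.rm can be sketched as follows (illustrative only; the package's internals may differ). With na.rm = FALSE, any NA in the inputs propagates to an NA result; with na.rm = TRUE, incomplete pairs are dropped before averaging:

```r
# Illustrative sketch of na.rm behaviour (not the package source)
miscoverage_sketch <- function(truth, lower, upper, alpha, na.rm = FALSE) {
  covered <- truth >= lower & truth <= upper
  # mean() with na.rm = TRUE drops NA comparisons before averaging;
  # otherwise a single NA makes the whole result NA
  mean(covered, na.rm = na.rm) - (1 - alpha)
}

truth <- c(1.5, NA, 2.5)
lower <- c(1, 2, 2)
upper <- c(2, 3, 3)

miscoverage_sketch(truth, lower, upper, alpha = 0.1)                # NA
miscoverage_sketch(truth, lower, upper, alpha = 0.1, na.rm = TRUE)  # about 0.1
```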

Examples

library(pintervals)
library(dplyr)
library(tibble)

# Simulate example data
set.seed(123)
x1 <- runif(1000)
x2 <- runif(1000)
y <- rnorm(1000, mean = x1 + x2, sd = 1)
df <- tibble(x1, x2, y)

# Split into training, calibration, and test sets
df_train <- df %>% slice(1:500)
df_cal <- df %>% slice(501:750)
df_test <- df %>% slice(751:1000)

# Fit a linear model on the training data
mod <- lm(y ~ x1 + x2, data = df_train)

# Generate predictions
pred_cal <- predict(mod, newdata = df_cal)
pred_test <- predict(mod, newdata = df_test)

# Estimate normal prediction intervals from calibration data
intervals <- pinterval_parametric(
  pred = pred_test,
  calib = pred_cal,
  calib_truth = df_cal$y,
  dist = "norm",
  alpha = 0.1
)

# Calculate empirical miscoverage
interval_miscoverage(truth = df_test$y,
         lower_bound = intervals$lower_bound,
         upper_bound = intervals$upper_bound,
         alpha = 0.1)