interval_width: Mean width of prediction intervals

Description

Computes the mean width of prediction intervals, defined as the average difference between upper and lower bounds.

Usage

interval_width(
  lower_bound = NULL,
  upper_bound = NULL,
  intervals = NULL,
  return_vector = FALSE,
  na.rm = FALSE
)

Value

A single numeric value giving the mean width of the prediction intervals or, if `return_vector = TRUE`, a numeric vector of the individual interval widths.

Arguments

lower_bound: A numeric vector of lower bounds of the prediction intervals.
upper_bound: A numeric vector of upper bounds of the prediction intervals.
intervals: Alternative input for prediction intervals as a list-column, where each element is a list with components 'lower_bound' and 'upper_bound'. Useful with non-contiguous intervals, for instance those constructed using the bin conditional conformal method, which can yield multiple intervals per prediction. See Details.
return_vector: Logical, whether to return the width vector (TRUE) or the mean width (FALSE). Default is FALSE.
na.rm: Logical, whether to remove NA values before calculation. Default is FALSE.

Details

The mean width is calculated as: $$ \text{Mean Width} = \frac{1}{n} \sum_{i=1}^{n} (ub_i - lb_i) $$

where $ ub_i $ and $ lb_i $ are the upper and lower bounds of the prediction interval for observation $ i $, and $ n $ is the total number of observations.
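
As a minimal sketch of the formula above, the same quantity can be computed directly in base R. The vectors `lb` and `ub` are made-up illustration values, not output from the package, and the use of `mean(..., na.rm = TRUE)` only mimics what the `na.rm` argument is assumed to do.

```r
# Hypothetical bounds for four observations (illustration only)
lb <- c(0.5, 1.0, 1.5, NA)
ub <- c(2.5, 2.0, 4.5, 3.0)

# Per-observation widths: ub_i - lb_i
widths <- ub - lb

# Mean width; a missing bound propagates to the mean unless it is dropped
mean_width_all <- mean(widths)
mean_width_obs <- mean(widths, na.rm = TRUE)
```

Here `widths` plays the role of the vector that `return_vector = TRUE` is documented to return, and the two means show why `na.rm` matters when any bound is missing.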

If the `intervals` argument is provided, it should be a list-column where each element is a list containing 'lower_bound' and 'upper_bound' vectors. This allows the width to be calculated for non-contiguous intervals, such as those produced by conformal prediction methods like the bin conditional conformal method. In this case, the width for each observation is computed from all of the intervals specified for that observation. If some observations have contiguous intervals and others have non-contiguous intervals, both the `lower_bound` and `upper_bound` vectors can be supplied along with the `intervals` list-column. The function then computes the width for each observation from whichever information is available.
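
The list-column layout can be pictured with a small base R sketch. The data are hypothetical, and the rule used here, that an observation's width is the sum of the widths of its separate pieces, is an assumption made for illustration; the page does not state how the package combines multiple intervals.

```r
# Hypothetical list-column: one element per observation, each holding
# 'lower_bound' and 'upper_bound' vectors as described for `intervals`
intervals <- list(
  list(lower_bound = c(0, 3), upper_bound = c(1, 5)),  # two disjoint pieces
  list(lower_bound = 2,       upper_bound = 4)         # one contiguous piece
)

# ASSUMPTION: an observation's width is the sum of its pieces' widths
total_width <- function(iv) sum(iv$upper_bound - iv$lower_bound)

# One width per observation, then the mean across observations
widths <- vapply(intervals, total_width, numeric(1))
mean_width <- mean(widths)
```

Under this assumption a prediction made up of several short pieces can be narrower in total than a single interval spanning the same range, which is the main reason to track the pieces separately.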

Examples

library(dplyr)
library(tibble)
library(pintervals)  # assumed to provide pinterval_parametric() and interval_width()

# Simulate example data
set.seed(123)
x1 <- runif(1000)
x2 <- runif(1000)
y <- rnorm(1000, mean = x1 + x2, sd = 1)
df <- tibble(x1, x2, y)

# Split into training, calibration, and test sets
df_train <- df %>% slice(1:500)
df_cal <- df %>% slice(501:750)
df_test <- df %>% slice(751:1000)

# Fit a linear model on the training set
mod <- lm(y ~ x1 + x2, data = df_train)

# Generate predictions
pred_cal <- predict(mod, newdata = df_cal)
pred_test <- predict(mod, newdata = df_test)

# Estimate normal prediction intervals from calibration data
intervals <- pinterval_parametric(
  pred = pred_test,
  calib = pred_cal,
  calib_truth = df_cal$y,
  dist = "norm",
  alpha = 0.1
)

# Calculate the mean width of the prediction intervals
interval_width(lower_bound = intervals$lower_bound,
               upper_bound = intervals$upper_bound)
