pinterval_parametric: #' Parametric prediction intervals for continuous predictions

Description

This function computes parametric prediction intervals at a confidence level of \(1 - \alpha\) for a vector of continuous predictions. The intervals are based on a user-specified probability distribution and associated parameters, either estimated from calibration data or supplied directly. Supported distributions include common options like the normal, log-normal, gamma, beta, and negative binomial, as well as any user-defined distribution with a quantile function. Prediction intervals are calculated by evaluating the appropriate quantiles for each predicted value.

Usage

pinterval_parametric(
  pred,
  calib = NULL,
  calib_truth = NULL,
  dist = c("norm", "lnorm", "exp", "pois", "nbinom", "gamma", "chisq", "logis", "beta"),
  pars = list(),
  alpha = 0.1
)

Value

A tibble with the predicted values and the lower and upper bounds of the prediction intervals

Arguments

pred: Vector of predicted values
calib: A numeric vector of predicted values in the calibration partition, or a 2 column tibble or matrix with the first column being the predicted values and the second column being the truth values. If calib is a numeric vector, calib_truth must be provided.
calib_truth: A numeric vector of true values in the calibration partition. Only required if calib is a numeric vector
dist: Distribution to use for the prediction intervals. Can be a character string matching any available distribution in R or a function representing a distribution, e.g. `qnorm`, `qgamma`, or a user defined quantile function. Default options are 'norm', 'lnorm','exp, 'pois', 'nbinom', 'chisq', 'gamma', 'logis', and 'beta' for which parameters can be computed from the calibration set. If a custom function is provided, parameters need to be provided in `pars`.
pars: List of named parameters for the distribution for each prediction. Not needed if calib is provided and the distribution is one of the default options. If a custom distribution function is provided, this list should contain the parameters needed for the quantile function, with names matching the corresponding arguments for the parameter names of the distribution function. See details for more information.
alpha: The confidence level for the prediction intervals. Must be a single numeric value between 0 and 1

Details

This function supports a wide range of distributions for constructing prediction intervals. Built-in support is provided for the following distributions: `"norm"`, `"lnorm"`, `"exp"`, `"pois"`, `"nbinom"`, `"chisq"`, `"gamma"`, `"logis"`, and `"beta"`. For each of these, parameters can be automatically estimated from a calibration set if not supplied directly via the `pars` argument.

The calibration set (`calib` and `calib_truth`) is used to estimate error dispersion or shape parameters. For example: - **Normal**: standard deviation of errors - **Log-normal**: standard deviation of log-errors - **Gamma**: dispersion via `glm` - **Negative binomial**: dispersion via `glm.nb()` - **Beta**: precision estimated from error variance

If `pars` is supplied, it should be a list of named arguments corresponding to the distribution’s quantile function. Parameters may be scalars or vectors (one per prediction). When both `pars` and `calib` are provided, the values in `pars` are used.

Users may also specify a custom distribution by passing a quantile function directly (e.g., a function with the signature `function(p, ...)`) as the `dist` argument, in which case `pars` must be provided explicitly.

Examples

Run this code

library(dplyr)
library(tibble)

# Simulate example data
set.seed(123)
x1 <- runif(1000)
x2 <- runif(1000)
y <- rlnorm(1000, meanlog = x1 + x2, sdlog = 0.5)
df <- tibble(x1, x2, y)

# Split into training, calibration, and test sets
df_train <- df %>% slice(1:500)
df_cal <- df %>% slice(501:750)
df_test <- df %>% slice(751:1000)

# Fit a model on the log-scale
mod <- lm(log(y) ~ x1 + x2, data = df_train)

# Generate predictions
pred_cal <- exp(predict(mod, newdata = df_cal))
pred_test <- exp(predict(mod, newdata = df_test))

# Estimate log-normal prediction intervals from calibration data
log_resid_sd <- sqrt(mean((log(pred_cal) - log(df_cal$y))^2))
pinterval_parametric(
  pred = pred_test,
  dist = "lnorm",
  pars = list(meanlog = log(pred_test), sdlog = log_resid_sd)
)

# Alternatively, use calibration data directly to estimate parameters
pinterval_parametric(
  pred = pred_test,
  calib = pred_cal,
  calib_truth = df_cal$y,
  dist = "lnorm"
)

# Use the normal distribution with direct parameter input
norm_sd <- sqrt(mean((pred_cal - df_cal$y)^2))
pinterval_parametric(
  pred = pred_test,
  dist = "norm",
  pars = list(mean = pred_test, sd = norm_sd)
)

# Use the gamma distribution with parameters estimated from calibration data
pinterval_parametric(
  pred = pred_test,
  calib = pred_cal,
  calib_truth = df_cal$y,
  dist = "gamma"
)

Run the code above in your browser using DataLab